Ah, it’s not that complex
Let’s say you do a binary search, and find your cache to be thrashed.
Now shrink the ordered data-set, by taking every n-th value.
Perform the binary search on that smaller data-set.
You now have an index at which (after scaling up) you can expect the value to be in the larger data-set.
Now do a binary search on the larger data-set within a very small range.
Rinse and repeat, until the smallest data-set fits into the L1 cache, and the second-smallest into the L2 cache.
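In code, a minimal two-level sketch of the above (names are mine, and Arrays.binarySearch stands in for a hand-rolled search):

import java.util.Arrays;

// Take every n-th value of the sorted data-set as a small, cache-friendly index.
static long[] buildCoarseIndex(long[] data, int n) {
    long[] coarse = new long[(data.length + n - 1) / n];
    for (int i = 0; i < coarse.length; i++)
        coarse[i] = data[i * n];
    return coarse;
}

static int hierarchicalSearch(long[] data, long[] coarse, int n, long key) {
    // 1. Binary search the small index.
    int ci = Arrays.binarySearch(coarse, key);
    if (ci < 0) ci = -(ci + 1); // insertion point when not found

    // 2. Scale back up and binary search only a narrow window of the big array.
    int lo = Math.max(0, (ci - 1) * n);
    int hi = Math.min(data.length, (ci + 1) * n);
    return Arrays.binarySearch(data, lo, hi, key);
}

Stack another coarse index on top of the first one and you get the extra layers.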
I don’t want no stinkin’ medals, I want competition! :-*
Nobody’s up for a challenge? Nobody? Surely your fingers must be itching.
I do this sort of thing all day long at work, and they pay me for it!
But as you’ve got a bit of spare time, how about you give us the raw speed of the naive search algorithm (that is, scan 4 GB of sorted longs linearly in memory)?
Cas
What I wouldn’t give for such a job! I’m integrating stuff into other stuff, and mapping stuff to stuff.
Meetings, presentations, politics, getting everybody to agree. Pays big bucks, drains the soul.
norm took 12564ms (100%, 667K ops/s) // <--- naive binarySearch, finding *lowest* index of value
rec1 took 8137ms (154%, 1030K ops/s)
rec2 took 5306ms (236%, 1580K ops/s)
rec3 took 4962ms (253%, 1690K ops/s)
rec4 took 4137ms (303%, 2027K ops/s) // <--- 4 layers of indirection
That’s still a binary search time in norm, isn’t it? I mean literally this:
for (int i = 0; i < in.length; i++) if (in[i] == findThis) return i;
Cas
Bah, only got 5 GB of physical RAM here at work. Can’t figure out a combination of command-line voodoo that will allow me to allocate 4 GB.
Cas
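(For what it’s worth: a 4 GB long[] needs a 64-bit JVM and a heap a good bit larger than the array itself, so something along the lines of java -Xms4600m -Xmx4600m SearchBenchmark ought to do it - class name hypothetical. With only 5 GB of physical RAM there’s not much headroom left for the OS, though.)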
Arg… I should learn to read. But then again, it’s a mad suggestion! :point:
Are you suspecting that the reduced cache-thrashing of a linear search may be in the same ball-park as a cache-thrashing binary search? Madness, I tell you!
I just added it, and it already struggles with the ‘sanity check’ - not with correctness, but with performance. I’ll report back when my poor CPU has finished the benchmark on the 512 MB data-set; 4 GB may take a few days.
Yes, sheer madness. I wonder at what size of data-set it becomes realistically sensible to do brute-force.
Cas
[table]
[tr][td]Data-set[/td][td]Linear (ops/s)[/td][td]Naive binary search (ops/s)[/td][td]Hierarchical binary search (ops/s)[/td][/tr]
[tr][td]512 MB[/td][td]50[/td][td]1,097,000[/td][td]2,702,000[/td][/tr]
[tr][td]4096 MB[/td][td]6[/td][td]667,000[/td][td]2,027,000[/td][/tr]
[/table]
I killed the process… would have taken eons.
Science! :o
Cas
The break-even point for linear vs. binary search is:
drum roll
long[12]
norm took 3.757029ms (100%, 49906K op/s)
linr took 3.698955ms (101%, 50689K op/s)
// Returns the lowest index of key, or the usual negative
// (-(insertionPoint) - 1) encoding when the key is absent.
private static int linearSearchLo(long[] data, long key) {
    for (int i = 0; i < data.length; i++) {
        if (data[i] == key) {
            return i;
        }
        if (data[i] > key) {
            return -(i + 1);
        }
    }
    return -(data.length + 1);
}
I would hash it at sorting time and get O(1) performance XD
Big-O notation is about performance degradation over a growing data-set. O(1) can be dreadful.
Hashing is not so usable for 1:N indexes. You’d need a fairly large data-structure behind every hash-key, and how do you optimize that?
Also, you can’t traverse (in-order) a hashed data-structure.
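Concretely, a 1:N hash index would end up looking something like this (illustrative only; since the input is sorted, each key’s positions form one contiguous run):

import java.util.HashMap;

// Every distinct key gets its own array of positions - the 'fairly large
// data-structure behind every hash-key' mentioned above.
static HashMap<Long, int[]> buildHashIndex(long[] sorted) {
    HashMap<Long, int[]> map = new HashMap<>();
    for (int i = 0; i < sorted.length; ) {
        int j = i;
        while (j < sorted.length && sorted[j] == sorted[i]) j++; // end of run
        int[] positions = new int[j - i];
        for (int k = i; k < j; k++) positions[k - i] = k;
        map.put(sorted[i], positions);
        i = j;
    }
    return map;
}

(You could shrink each entry to a (start, length) pair, but then you’re designing yet another data-structure, which is rather the point.)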
Anyway, I’ll give it a shot, for direct lookups (when time permits).
That wasn’t actually meant seriously, but maybe there’s some method in the madness.
Depending on what the “important” operations are, there’s also bloom-filters.
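For the unfamiliar, a minimal sketch of the idea (bit count and hash mixing below are my own arbitrary choices):

import java.util.BitSet;

final class LongBloomFilter {
    private final BitSet bits;
    private final int numBits, numHashes;

    LongBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit position from the key with a cheap mix.
    private int index(long key, int i) {
        long h = key * 0x9E3779B97F4A7C15L + i * 0xC2B2AE3D27D4EB4FL;
        h ^= h >>> 32;
        return (int) Math.floorMod(h, (long) numBits);
    }

    void add(long key) {
        for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
    }

    // false = definitely absent; true = probably present, so only then
    // fall through to the expensive search of the full array.
    boolean mightContain(long key) {
        for (int i = 0; i < numHashes; i++)
            if (!bits.get(index(key, i))) return false;
        return true;
    }
}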
Interesting concept.
There are no important operations though, as I don’t have a use-case yet where a factor 3 performance increase in memory access is actually meaningful. Whatever accesses this read-only ‘index’ is going to be the bottleneck.
I’d be curious as to where the break-even point is between hierarchical indices and something like binary search on level-ordered storage. Might not need as many levels in the hierarchy.
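By level-ordered I mean a BFS (Eytzinger) layout; a rough sketch of my own, using 1-based indexing with tree[0] unused:

// Build the level-ordered array by an in-order walk of the implicit tree
// (children of node i live at 2i and 2i+1).
static void fillLevelOrder(long[] sorted, long[] tree, int node, int[] next) {
    if (node >= tree.length) return;
    fillLevelOrder(sorted, tree, 2 * node, next);
    tree[node] = sorted[next[0]++];
    fillLevelOrder(sorted, tree, 2 * node + 1, next);
}

// Each step descends one tree level, so the hot top levels of the array
// stay resident in cache.
static int levelOrderSearch(long[] tree, long key) {
    int i = 1, hit = -1;
    while (i < tree.length) {
        if (tree[i] >= key) {
            if (tree[i] == key) hit = i; // remember the hit, keep going left
            i = 2 * i;
        } else {
            i = 2 * i + 1;
        }
    }
    return hit; // position in the level-ordered array, or -1 if absent
}

Build with: long[] tree = new long[sorted.length + 1]; fillLevelOrder(sorted, tree, 1, new int[] { 0 });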
You know, believe it or not, no. Absolutely not.
You got this!
Have to admit that I can’t see where I would be using 4 GB of data ordered like this on the RAM side of things. I can see it with a memory-mapped file or some old-fashioned spinning disks backing things, however.
But I do find such microbenchmarks interesting. It keeps underlining one thing: cache locality is almost everything if you’re aiming for performance.