Ah, it’s not that complex
Let’s say you do a binary search, and find your cache to be thrashed.
Now shrink the ordered data-set, by taking every n-th value.
Perform the binary search on that smaller data-set.
You now have an index at which (after scaling up) you can expect the value to be in the larger data-set.
Now do a binary search on the larger data-set within a very small range.
Rinse and repeat, until the smallest data-set fits into the L1 cache, and the second-smallest into the L2 cache.
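In code, a minimal two-level sketch of the above (names are mine, and Arrays.binarySearch stands in for a hand-rolled search):

import java.util.Arrays;

// Take every n-th value of the sorted data-set as a small, cache-friendly index.
static long[] buildCoarseIndex(long[] data, int n) {
    long[] coarse = new long[(data.length + n - 1) / n];
    for (int i = 0; i < coarse.length; i++)
        coarse[i] = data[i * n];
    return coarse;
}

static int hierarchicalSearch(long[] data, long[] coarse, int n, long key) {
    // 1. Binary search the small index.
    int ci = Arrays.binarySearch(coarse, key);
    if (ci < 0) ci = -(ci + 1); // insertion point when not found

    // 2. Scale back up and binary search only a narrow window of the big array.
    int lo = Math.max(0, (ci - 1) * n);
    int hi = Math.min(data.length, (ci + 1) * n);
    return Arrays.binarySearch(data, lo, hi, key);
}

Stack another coarse index on top of the first one and you get the extra layers.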
I don’t want no stinkin’ medals, I want competition! :-*
Nobody’s up for a challenge? Nobody? Surely your fingers must be itching.
I do this sort of thing all day long at work, and they pay me for it!
But as you’ve got a bit of spare time, how about you give us the raw speed of the naive search algorithm (that is, scan 4 GB of sorted longs linearly in memory)?
Cas
What I wouldn’t give for such a job! I’m integrating stuff into other stuff, and mapping stuff to stuff.
Meetings, presentations, politics, getting everybody to agree. Pays big bucks, drains the soul.
norm took 12564ms (100%, 667K ops/s) // <--- naive binarySearch, finding *lowest* index of value
rec1 took 8137ms (154%, 1030K ops/s)
rec2 took 5306ms (236%, 1580K ops/s)
rec3 took 4962ms (253%, 1690K ops/s)
rec4 took 4137ms (303%, 2027K ops/s) // <--- 4 layers of indirection
That’s still a binary search time in norm, isn’t it? I mean literally this:
for (int i = 0; i < in.length; i++) if (in[i] == findThis) return i;
Cas
Bah, only got 5 GB of physical RAM here at work. Can’t figure out a combination of command-line voodoo that will allow me to allocate 4 GB.
Cas
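(For what it’s worth: a 4 GB long[] needs a 64-bit JVM and a heap a good bit larger than the array itself, so something along the lines of java -Xms4600m -Xmx4600m SearchBenchmark ought to do it - class name hypothetical. With only 5 GB of physical RAM there’s not much headroom left for the OS, though.)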
Arg… I should learn to read. But then again, it’s a mad suggestion! :point:
Are you suspecting that the reduced cache-thrashing of a linear search may be in the same ball-park as a cache-thrashing binary search? Madness, I tell you!
I just added it, and it already struggles with the ‘sanity check’ - not with correctness, but with performance. I’ll report back when my poor CPU has finished the benchmark on the 512 MB data-set; 4 GB may take a few days.
Yes, sheer madness. I wonder at what size of data-set it becomes realistically sensible to do brute-force.
Cas
[table]
[tr][td]Data-set[/td][td]Linear (ops/s)[/td][td]Naive binary search (ops/s)[/td][td]Hierarchical binary search (ops/s)[/td][/tr]
[tr][td]512 MB[/td][td]50[/td][td]1,097,000[/td][td]2,702,000[/td][/tr]
[tr][td]4096 MB[/td][td]6[/td][td]667,000[/td][td]2,027,000[/td][/tr]
[/table]
I killed the process… would have taken eons.
Science! :o
Cas
The break-even point for linear vs. binary search is:
drum roll
long[12]
norm took 3.757029ms (100%, 49906K op/s)
linr took 3.698955ms (101%, 50689K op/s)
// Returns the lowest index of key, or the usual negative
// (-(insertionPoint) - 1) encoding when the key is absent.
private static int linearSearchLo(long[] data, long key) {
    for (int i = 0; i < data.length; i++) {
        if (data[i] == key) {
            return i;
        }
        if (data[i] > key) {
            return -(i + 1);
        }
    }
    return -(data.length + 1);
}
I would hash it at sorting time and get O(1) performance XD
Big-O notation is about performance degradation over a growing data-set. O(1) can be dreadful.
Hashing is not so usable for 1:N indexes. You’d need a fairly large data-structure behind every hash-key, and how do you optimize that?
Also, you can’t traverse (in-order) a hashed data-structure.
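Concretely, a 1:N hash index would end up looking something like this (illustrative only; since the input is sorted, each key’s positions form one contiguous run):

import java.util.HashMap;

// Every distinct key gets its own array of positions - the 'fairly large
// data-structure behind every hash-key' mentioned above.
static HashMap<Long, int[]> buildHashIndex(long[] sorted) {
    HashMap<Long, int[]> map = new HashMap<>();
    for (int i = 0; i < sorted.length; ) {
        int j = i;
        while (j < sorted.length && sorted[j] == sorted[i]) j++; // end of run
        int[] positions = new int[j - i];
        for (int k = i; k < j; k++) positions[k - i] = k;
        map.put(sorted[i], positions);
        i = j;
    }
    return map;
}

(You could shrink each entry to a (start, length) pair, but then you’re designing yet another data-structure, which is rather the point.)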
Anyway, I’ll give it a shot, for direct lookups (when time permits).
That wasn’t actually meant seriously, but maybe there’s some method in the madness.
Depending on what the “important” operations are, there’s also bloom-filters.
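For the unfamiliar, a minimal sketch of the idea (bit count and hash mixing below are my own arbitrary choices):

import java.util.BitSet;

final class LongBloomFilter {
    private final BitSet bits;
    private final int numBits, numHashes;

    LongBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit position from the key with a cheap mix.
    private int index(long key, int i) {
        long h = key * 0x9E3779B97F4A7C15L + i * 0xC2B2AE3D27D4EB4FL;
        h ^= h >>> 32;
        return (int) Math.floorMod(h, (long) numBits);
    }

    void add(long key) {
        for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
    }

    // false = definitely absent; true = probably present, so only then
    // fall through to the expensive search of the full array.
    boolean mightContain(long key) {
        for (int i = 0; i < numHashes; i++)
            if (!bits.get(index(key, i))) return false;
        return true;
    }
}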
Interesting concept.
There are no important operations though, as I don’t have a use-case yet where a factor 3 performance increase in memory access is actually meaningful. Whatever accesses this read-only ‘index’ is going to be the bottleneck.
I’d be curious as to where the break-even point is between hierarchical indices and something like binary search on level-ordered storage. Might not need as many levels in the hierarchy.
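By level-ordered I mean a BFS (Eytzinger) layout; a rough sketch of my own, using 1-based indexing with tree[0] unused:

// Build the level-ordered array by an in-order walk of the implicit tree
// (children of node i live at 2i and 2i+1).
static void fillLevelOrder(long[] sorted, long[] tree, int node, int[] next) {
    if (node >= tree.length) return;
    fillLevelOrder(sorted, tree, 2 * node, next);
    tree[node] = sorted[next[0]++];
    fillLevelOrder(sorted, tree, 2 * node + 1, next);
}

// Each step descends one tree level, so the hot top levels of the array
// stay resident in cache.
static int levelOrderSearch(long[] tree, long key) {
    int i = 1, hit = -1;
    while (i < tree.length) {
        if (tree[i] >= key) {
            if (tree[i] == key) hit = i; // remember the hit, keep going left
            i = 2 * i;
        } else {
            i = 2 * i + 1;
        }
    }
    return hit; // position in the level-ordered array, or -1 if absent
}

Build with: long[] tree = new long[sorted.length + 1]; fillLevelOrder(sorted, tree, 1, new int[] { 0 });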
You know, believe it or not, no. Absolutely not.
You got this!
Have to admit that I can’t see where I would be using 4 GB of data ordered like this on the RAM side of things. I can see it with a memory-mapped file or some old-fashioned spinning disks backing things, however.
But I do find such microbenchmarks interesting. It keeps underlining one thing: cache locality is almost everything if you’re aiming for performance.