Any multithreading people out there? Multithreaded loading of a huge array.

@Agent

Reading this I will concede (I had already seen the evidence earlier in the thread) that reading many small chunks can be sped up with multithreading, since it gives the drive controller more information about the pending requests. However, looking at some benchmarks, sequential operation is about two orders of magnitude faster than random 4K operation, high queue depth or not. Assuming I read the numbers correctly, that makes sense to me: sequential, contiguous access is already optimal, and reordering an even semi-random request queue can only approach sequential-and-contiguous, except in the edge case where the random requests cover the whole region without overlap and the queue is perfectly reordered. This also assumes the consumer-grade drive has a sufficiently advanced controller, which is not guaranteed (although I expect it's common by now).

I’ve also never stated that SSDs have this “vulnerability”; I hope I didn’t come across as implying that.

If you’re still reading this Ray, just reconsider whether any of this is worthwhile, and if so, go for compression first. IIRC Snappy and LZO have Java implementations. I expect the largest and easiest gains are to be had there if you can’t reduce the bottom-line data quantity.
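To make the compression suggestion concrete, here is a minimal sketch of compressing one map layer. It uses the JDK's built-in `Deflater` as a stand-in, since Snappy and LZO need third-party jars; the `LayerCompression` class name, the serialization layout, and the small 64x64 layer are my own assumptions for illustration, not anything from Ray's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class LayerCompression {

    // Serialize a single layer (a 2D int array) to bytes, then deflate it.
    static byte[] compress(int[][] layer) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(layer.length * layer[0].length * 4);
        for (int[] row : layer)
            for (int v : row)
                buf.putInt(v);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // BEST_SPEED trades ratio for speed, closer in spirit to Snappy/LZO.
        Deflater fast = new Deflater(Deflater.BEST_SPEED);
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out, fast)) {
            dos.write(buf.array());
        }
        return out.toByteArray();
    }

    // Inflate and deserialize back into a w-by-h layer.
    static int[][] decompress(byte[] data, int w, int h) throws IOException {
        try (InflaterInputStream iis =
                new InflaterInputStream(new ByteArrayInputStream(data))) {
            ByteBuffer buf = ByteBuffer.wrap(iis.readAllBytes());
            int[][] layer = new int[w][h];
            for (int x = 0; x < w; x++)
                for (int y = 0; y < h; y++)
                    layer[x][y] = buf.getInt();
            return layer;
        }
    }

    public static void main(String[] args) throws IOException {
        int[][] layer = new int[64][64]; // small stand-in for a 1024x1024 layer
        for (int x = 0; x < 64; x++)
            for (int y = 0; y < 64; y++)
                layer[x][y] = (x * y) % 7; // repetitive tile data compresses well
        byte[] packed = compress(layer);
        int[][] back = decompress(packed, 64, 64);
        System.out.println("raw=" + (64 * 64 * 4) + " compressed=" + packed.length
                + " roundtrip=" + java.util.Arrays.deepEquals(layer, back));
    }
}
```

A real Snappy or LZO binding would slot into the same two methods; only the stream classes change.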

Yep, still here. Sucking in all the various information in every direction. :stuck_out_tongue:

I plan on trying many of the options and figuring out what works best. I did discover something odd: I broke my maps down into 8 files, one for each layer. So instead of saving one int[1024][1024][8] array, I’m saving 8 int[1024][1024] arrays. An odd thing happened when I did this: the total file size went from 3.2 MB to 1.7 MB for the exact same amount of data, compressed the exact same way, just in 8 files instead of 1. Thanks to that, load time decreased.

I attempted to multithread loading the 8 files with mixed success, mainly due to trying to juggle multiple Runnable instances in the same class. It was choppy code, and I know how to fix it; I just haven’t quite yet. But when it did load correctly, it loaded much faster.
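One way to avoid hand-rolled Runnable juggling is an ExecutorService with one task per layer file. Below is a minimal sketch under my own assumptions: the `layerN.dat` names, the flat stream-of-ints file format, and the small 32x32 size are all stand-ins (the main method even writes its own temp files so the example runs on its own); Ray's real format will differ.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLayerLoader {
    static final int SIZE = 32; // stand-in for 1024

    // Load one layer file: assumed to be a flat stream of SIZE*SIZE ints.
    static int[][] loadLayer(Path file) throws IOException {
        int[][] layer = new int[SIZE][SIZE];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            for (int x = 0; x < SIZE; x++)
                for (int y = 0; y < SIZE; y++)
                    layer[x][y] = in.readInt();
        }
        return layer;
    }

    static int[][][] loadAll(List<Path> files) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(files.size());
        try {
            List<Future<int[][]>> futures = new ArrayList<>();
            for (Path f : files)
                futures.add(pool.submit(() -> loadLayer(f))); // one Callable per file
            int[][][] map = new int[files.size()][][];
            for (int i = 0; i < futures.size(); i++)
                map[i] = futures.get(i).get(); // blocks until that layer is done
            return map;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Write 8 small stand-in layer files to a temp dir.
        Path dir = Files.createTempDirectory("layers");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Path f = dir.resolve("layer" + i + ".dat");
            try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(f))) {
                for (int j = 0; j < SIZE * SIZE; j++)
                    out.writeInt(i);
            }
            files.add(f);
        }
        int[][][] map = loadAll(files);
        System.out.println("layers=" + map.length + " sample=" + map[5][0][0]);
        // layers=8 sample=5
    }
}
```

The futures keep each layer's result tied to its index, so the loaded map assembles deterministically no matter which thread finishes first.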

Either way though, I’ve dropped map loading down to ~1200 ms on my desktop (from ~1800 ms) thanks to the decreased file size. I couldn’t do any reliable timed tests with the hacky multithreading code, but from the few times it managed to load correctly by dumb luck, I suspect it’ll load roughly twice as fast.

I’m probably going to try Snappy or LZO anyway. Even though my total load time is under 3 seconds on my desktop, it’s still a good 10-15 seconds on my laptop… and that seems a bit high.

Backing up BurntPizza’s reply: for spinning disks (and to a lesser extent SSDs), pack your files, regardless of whether you throw compression into the mix (think *.tar). Read the entire file into memory and extract your resources from there. You’ll be reading your data at about 80 MB/s on a laptop HDD, which will blow away any performance gain you’d get from multithreading random access (the worst-case scenario) on spinning hardware, which might bump I/O bandwidth from 0.5 to 2 MB/s.
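As a sketch of the pack-then-slurp idea: the toy format below (a count, then length-prefixed entries) is my own invention to keep the example short, not real tar, and `PackFile` is a hypothetical name. The point is the single `Files.readAllBytes` call: one big sequential read, then all extraction happens in memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PackFile {

    // Toy pack format: [count][len0][bytes0][len1][bytes1]...
    static void pack(Path out, List<byte[]> resources) throws IOException {
        int total = 4;
        for (byte[] r : resources) total += 4 + r.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(resources.size());
        for (byte[] r : resources) {
            buf.putInt(r.length);
            buf.put(r);
        }
        Files.write(out, buf.array());
    }

    // One big sequential read, then slice resources out of memory.
    static List<byte[]> unpack(Path in) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(in));
        int count = buf.getInt();
        List<byte[]> resources = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] r = new byte[buf.getInt()];
            buf.get(r);
            resources.add(r);
        }
        return resources;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("map", ".pack");
        pack(f, List.of("layer0".getBytes(), "layer1".getBytes()));
        List<byte[]> back = unpack(f);
        System.out.println(back.size() + " " + new String(back.get(1)));
        // 2 layer1
    }
}
```

In a real project you would likely just use a zip or tar library for the container; the performance win comes from the access pattern, not the format.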

Once the data is in memory, you can unleash as many threads as you want on it to potentially boost performance; whether the gain is significant depends on the nature of the data. If you use compression, compress at the resource level as opposed to the file level. That way you can multithread the decompression by handing different resources to different threads. It might increase the file size by a few percent, but that’s a proper tradeoff.
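A minimal sketch of that resource-level approach, again using the JDK's `Deflater`/`Inflater` streams as a stand-in for Snappy or LZO: because each resource is compressed independently, each can be inflated on its own thread (here via a parallel stream). The class and method names are mine, for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class PerResourceCompression {

    // Compress one resource on its own (no shared dictionary across resources).
    static byte[] deflate(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static byte[] inflate(byte[] packed) {
        try (InflaterInputStream iis =
                new InflaterInputStream(new ByteArrayInputStream(packed))) {
            return iis.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Independently compressed resources decompress in parallel for free.
    static List<byte[]> inflateAll(List<byte[]> packed) {
        return packed.parallelStream()
                .map(PerResourceCompression::inflate)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<byte[]> raw = List.of(new byte[4096], new byte[4096], new byte[4096]);
        List<byte[]> packed = raw.stream()
                .map(PerResourceCompression::deflate)
                .collect(Collectors.toList());
        List<byte[]> back = inflateAll(packed);
        System.out.println("resources=" + back.size() + " bytes=" + back.get(0).length);
        // resources=3 bytes=4096
    }
}
```

Compressing the whole pack as one stream would forbid this, since a single deflate stream has to be inflated sequentially; that is the few-percent size cost being traded away.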