Good to know you switched to caching (at two different abstraction levels) as a way to increase performance. Back in the day that wasn’t quite an option, as the sprite engine had to manage anything that was thrown in its general direction.
Merge sort does indeed scale pretty badly: when merging multiple sorted datasets you thrash the cache, repeatedly, until you have merged the last two sets. This overhead quickly outweighs the benefit of having your initial dataset split up and sorted by multiple threads.
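To make the two phases concrete, here is a minimal sketch of the pattern I mean; it is an illustration only, not how either engine actually does it. The name `chunked_merge_sort` and the `chunks` parameter are made up for this example, and the merges run serially on the calling thread to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: split `data` into `chunks` slices, sort each slice on
// its own thread, then merge neighbouring sorted runs until one run is left.
void chunked_merge_sort(std::vector<int>& data, std::size_t chunks) {
    assert(chunks >= 1);
    const std::size_t n = data.size();

    // Slice i covers the index range [bounds[i], bounds[i + 1]).
    std::vector<std::size_t> bounds(chunks + 1);
    for (std::size_t i = 0; i <= chunks; ++i) bounds[i] = n * i / chunks;

    // Iterator to the start of slice k.
    auto at = [&](std::size_t k) {
        return data.begin() + static_cast<std::ptrdiff_t>(bounds[k]);
    };

    // Phase 1: independent sorts, one thread per slice. No slice overlaps
    // another, so the threads never touch the same elements.
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < chunks; ++i) {
        workers.emplace_back([&at, i] { std::sort(at(i), at(i + 1)); });
    }
    for (auto& w : workers) w.join();

    // Phase 2: merge passes. Every pass walks the array again, and the final
    // pass is a single merge that touches all n elements.
    for (std::size_t width = 1; width < chunks; width *= 2) {
        for (std::size_t i = 0; i + width < chunks; i += 2 * width) {
            const std::size_t hi = std::min(i + 2 * width, chunks);
            std::inplace_merge(at(i), at(i + width), at(hi));
        }
    }
}
```

Calling `chunked_merge_sort(values, 4)` sorts `values` in place. Phase 1 is the part that benefits from more threads; phase 2 is where the repeated passes over the data come in, and adding threads does nothing for that last big merge.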
I gather that the persistent sprite engine, featuring ‘unlimited’ decals, currently also resides in the eternal bit fields?