The slow bits in Java - which are the CPU bottlenecks - are the bits where you go traversing hundreds of thousands of little objects scattered all over memory in order to pack a simple ByteBuffer with, say, 30,000 vertices. In C++ those hundreds of thousands of little objects would generally fit easily into a design using value classes embedded in each other, probably with various unions, and the whole thing would basically be one contiguous block of memory that you’d traverse from start to finish, achieving pretty phenomenal performance.
In Java land, you will generally have followed the path of least resistance and allocated a bunch of little objects all referencing each other in a nice, easy-to-understand OOP design. After a very short while those objects get scattered all over memory and are effectively accessed in an almost completely random order, leading to almost pathologically crap cache usage and slowing the CPU down to about a tenth of its potential speed. No amount of threading saves you from the horrors of the knackered cache - in fact it makes it worse.
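To make that concrete, here is a rough sketch of the path-of-least-resistance design. The `Vec2` and `Sprite` classes and the four-corner vertex layout are made up for illustration, not taken from any real engine. Every `s.position.x` is a pointer chase to a separately allocated object, and once the heap has been churned for a while those objects need not sit anywhere near each other.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class ScatteredSprites {
    // Hypothetical little objects; each instance is its own heap allocation.
    static final class Vec2 {
        float x, y;
        Vec2(float x, float y) { this.x = x; this.y = y; }
    }

    static final class Sprite {
        final Vec2 position;
        final Vec2 size;
        Sprite(Vec2 position, Vec2 size) { this.position = position; this.size = size; }
    }

    static final int VERTICES_PER_SPRITE = 4;
    static final int FLOATS_PER_VERTEX = 2;

    // Walks the object graph and writes 4 corner vertices (x, y) per sprite.
    // Each sprite costs three dereferences: Sprite, its position, its size.
    static ByteBuffer pack(List<Sprite> sprites) {
        ByteBuffer buf = ByteBuffer
                .allocateDirect(sprites.size() * VERTICES_PER_SPRITE * FLOATS_PER_VERTEX * Float.BYTES)
                .order(ByteOrder.nativeOrder());
        for (Sprite s : sprites) {
            float x0 = s.position.x, y0 = s.position.y;
            float x1 = x0 + s.size.x, y1 = y0 + s.size.y;
            buf.putFloat(x0).putFloat(y0);
            buf.putFloat(x1).putFloat(y0);
            buf.putFloat(x1).putFloat(y1);
            buf.putFloat(x0).putFloat(y1);
        }
        buf.flip(); // ready for reading, e.g. handing to the graphics API
        return buf;
    }

    public static void main(String[] args) {
        List<Sprite> sprites = new ArrayList<>();
        for (int i = 0; i < 4000; i++) {
            sprites.add(new Sprite(new Vec2(i, i), new Vec2(16, 16)));
        }
        ByteBuffer packed = pack(sprites);
        System.out.println(packed.remaining() + " bytes packed");
    }
}
```

The output buffer is written perfectly linearly; it's the reads feeding it that jump around the heap.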
And thusly, I conclude that despite only drawing 4000 sprites, I am actually CPU limited unless I rewrite everything from the ground up to use mapped objects in ByteBuffers. YMMV, but you need to know this right at the start of your design. Unfortunately my stuff evolved from code written just over 10 years ago, when the graphics card was indeed the bottleneck and none of this made any difference.
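For anyone wondering what "mapped objects in ByteBuffers" might look like, here is a minimal hand-rolled sketch. It is not the API of any particular mapped-object library; the `MappedSprites` class, the field offsets and the 16-byte record layout are all assumptions for illustration. The idea is that all sprite data lives in one contiguous direct buffer, and a single reusable view object is slid over it instead of allocating one object per sprite.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MappedSprites {
    // Assumed record layout: x, y, w, h as 4 floats = 16 bytes per sprite.
    static final int STRIDE = 4 * Float.BYTES;
    static final int X = 0, Y = 4, W = 8, H = 12;

    private final ByteBuffer data;
    private int base; // byte offset of the sprite this view currently points at

    MappedSprites(int capacity) {
        data = ByteBuffer.allocateDirect(capacity * STRIDE).order(ByteOrder.nativeOrder());
    }

    int count() { return data.capacity() / STRIDE; }

    // Re-point the view at sprite i; no per-sprite allocation happens here.
    MappedSprites at(int i) { base = i * STRIDE; return this; }

    float x() { return data.getFloat(base + X); }
    float y() { return data.getFloat(base + Y); }
    float w() { return data.getFloat(base + W); }
    float h() { return data.getFloat(base + H); }

    MappedSprites set(float x, float y, float w, float h) {
        data.putFloat(base + X, x);
        data.putFloat(base + Y, y);
        data.putFloat(base + W, w);
        data.putFloat(base + H, h);
        return this;
    }

    // One linear pass from start to finish over contiguous memory.
    void translateAll(float dx, float dy) {
        for (int off = 0; off < data.capacity(); off += STRIDE) {
            data.putFloat(off + X, data.getFloat(off + X) + dx);
            data.putFloat(off + Y, data.getFloat(off + Y) + dy);
        }
    }

    public static void main(String[] args) {
        MappedSprites sprites = new MappedSprites(4000);
        for (int i = 0; i < sprites.count(); i++) {
            sprites.at(i).set(i, i, 16, 16);
        }
        sprites.translateAll(1f, 2f);
        System.out.println(sprites.at(10).x() + ", " + sprites.at(10).y());
    }
}
```

The price is that you lose the nice OOP design: no references between sprites, no polymorphism, and you manage offsets by hand. Whether the cache behaviour is worth that is something you'd have to measure on your own workload.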
Cas 