I have downloaded the original C NVIDIA demo and done some comparisons.
JOGL version - 8 and 6 fps (VAR and non-VAR)
NVIDIA version - 30 and 15 fps (VAR and non-VAR)
Removing rendering from the JOGL version does not help much - I’m at 10-12 fps max. It seems the performance killer is the 6 FloatBuffer puts inside the innermost loop. Commenting them out sends performance through the roof, into the 40-50 fps range, and that is with DrawElements and the rest of the loop still in place. (Nothing is rendered in that case, but it is the same nothing as with DrawElements commented out, which gives only 10-12 fps.)
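For anyone who wants to check this outside the demo, here is a minimal sketch of the kind of comparison I mean. It is not the demo's code: the class name, the 6-floats-per-vertex layout and the values written are all made up for illustration. It fills a direct FloatBuffer once with six separate puts per vertex, and once by computing into a plain float[] and copying it over with a single bulk put. Whether the second variant is actually faster will depend on the VM, so treat the timing as a rough probe, not a benchmark.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class BufferFillSketch {
    // Hypothetical layout: 6 floats per vertex (an assumption, not taken from the demo).
    static final int FLOATS_PER_VERTEX = 6;

    // Allocate a direct, native-order FloatBuffer holding the given number of floats.
    static FloatBuffer newDirectFloatBuffer(int floats) {
        return ByteBuffer.allocateDirect(floats * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Variant 1: six individual relative puts per vertex, straight into the direct buffer.
    static void fillPerElement(FloatBuffer buf, int vertices) {
        buf.clear();
        for (int i = 0; i < vertices; i++) {
            float v = i; // placeholder data
            buf.put(v);
            buf.put(v + 1);
            buf.put(v + 2);
            buf.put(v + 3);
            buf.put(v + 4);
            buf.put(v + 5);
        }
        buf.flip();
    }

    // Variant 2: write the same data into a heap array, then copy with one bulk put.
    static void fillBulk(FloatBuffer buf, float[] scratch, int vertices) {
        int n = 0;
        for (int i = 0; i < vertices; i++) {
            float v = i; // same placeholder data as above
            scratch[n++] = v;
            scratch[n++] = v + 1;
            scratch[n++] = v + 2;
            scratch[n++] = v + 3;
            scratch[n++] = v + 4;
            scratch[n++] = v + 5;
        }
        buf.clear();
        buf.put(scratch, 0, n);
        buf.flip();
    }

    public static void main(String[] args) {
        int vertices = 100000;
        int rounds = 200;
        FloatBuffer buf = newDirectFloatBuffer(vertices * FLOATS_PER_VERTEX);
        float[] scratch = new float[vertices * FLOATS_PER_VERTEX];

        // Warm up both paths so the JIT has a chance to compile them.
        for (int r = 0; r < 20; r++) {
            fillPerElement(buf, vertices);
            fillBulk(buf, scratch, vertices);
        }

        long t0 = System.nanoTime();
        for (int r = 0; r < rounds; r++) fillPerElement(buf, vertices);
        long t1 = System.nanoTime();
        for (int r = 0; r < rounds; r++) fillBulk(buf, scratch, vertices);
        long t2 = System.nanoTime();

        System.out.println("per-element puts: " + (t1 - t0) / 1000000 + " ms");
        System.out.println("bulk put:         " + (t2 - t1) / 1000000 + " ms");
    }
}
```

Both variants leave identical contents in the buffer, so if the bulk version turns out faster on your VM it can be swapped in without changing what gets handed to the GL calls.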
There was a discussion about this some time ago, and it turned out that HotSpot is not smart enough to discover that a given buffer is mapped directly to memory, so it goes through some major hoops to access it.
Anyway, this is not really a JOGL problem but a Java/HotSpot problem. But I’m afraid that a 3-4 times slowdown on any code which uses memory arrays directly (for speed…) is not acceptable.
I vaguely remember there was some trick to fix it - something like making DirectByteBuffer public and casting all buffers to it. Does anybody remember anything more, or have a link to that discussion?
I’m posting this here, not on the performance board, because it is most critical for JOGL and shows up in the JOGL ‘performance’ example. And it is JOGL which will have major problems if it is not fixed…