Terrible VertexArrayRange performance

I have downloaded the original NVIDIA C demo and done some comparisons.

JOGL version - 8 fps (VAR) and 6 fps (non-VAR)
NVIDIA version - 30 fps (VAR) and 15 fps (non-VAR)

Removing the rendering from the JOGL version does not help much: I'm at 10-12 fps max. The performance killer seems to be the six FloatBuffer puts inside the innermost loop. Commenting them out sends performance through the roof, into the 40-50 fps range, and that is with DrawElements and the rest of the loop still in place. Nothing is rendered then, but it is the same "nothing" as when commenting out DrawElements instead, which gives only 10-12 fps.
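For readers without the demo at hand, the kind of hot loop described above looks roughly like the sketch below. This is a hypothetical reconstruction, not the demo's actual code: the class and method names, the per-vertex layout (3 position + 3 normal floats, giving six puts per vertex) and the wave function are all assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class VarFillSketch {
    // Hypothetical layout: 3 position floats + 3 normal floats per vertex,
    // i.e. six put() calls per vertex in the inner loop.
    static final int FLOATS_PER_VERTEX = 6;

    // Direct, native-order buffer, like the memory a VAR demo writes into.
    static FloatBuffer allocateDirect(int vertices) {
        return ByteBuffer.allocateDirect(vertices * FLOATS_PER_VERTEX * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Innermost loop: six individual relative puts per vertex.
    static void fillStrip(FloatBuffer buf, int width, float z, float time) {
        for (int i = 0; i < width; i++) {
            float x = (float) i / width;
            float y = (float) Math.sin(x * 10.0 + time) * 0.1f; // assumed wave function
            buf.put(x).put(y).put(z);          // position
            buf.put(0.0f).put(1.0f).put(0.0f); // normal (placeholder values)
        }
    }

    public static void main(String[] args) {
        int width = 256;
        FloatBuffer buf = allocateDirect(width);
        fillStrip(buf, width, 0.0f, 0.0f);
        System.out.println("floats written: " + buf.position());
    }
}
```

Every one of those `put` calls goes through the buffer's bounds and position bookkeeping, which is exactly the per-element cost the VM has to optimize away.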

There was a discussion about this some time ago, and it turned out that HotSpot is not smart enough to discover that a given buffer is directly mapped to memory, so it goes through some major hops to access it.

Anyway, this is not really a JOGL problem but a Java/HotSpot problem. Still, I'm afraid that a 3-4x slowdown on any code which accesses memory directly through buffers (for speed…) is not acceptable.

I vaguely remember there was some trick to fix it, something like making DirectByteBuffer public and casting all buffers to it. Does anybody remember anything more, or have a link to that discussion?
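One workaround that does not depend on VM internals is to fill an ordinary float[] in the hot loop and copy it into the direct buffer with a single bulk `FloatBuffer.put(float[], int, int)` call. This is not the DirectByteBuffer cast trick asked about above; it is a sketch that assumes the extra copy costs less than the per-element buffer calls on the VM in question, which should be measured. The class name and vertex layout are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class BulkPutSketch {
    static final int FLOATS_PER_VERTEX = 6; // assumed: position + normal

    // Fill a plain Java array (cheap for HotSpot), then copy it into
    // the direct buffer with one bulk put per strip.
    static void fillStrip(FloatBuffer dst, float[] staging, int width, float z) {
        int n = 0;
        for (int i = 0; i < width; i++) {
            float x = (float) i / width;
            staging[n++] = x;    // position
            staging[n++] = 0.0f;
            staging[n++] = z;
            staging[n++] = 0.0f; // normal (placeholder values)
            staging[n++] = 1.0f;
            staging[n++] = 0.0f;
        }
        dst.put(staging, 0, n); // single bulk copy instead of n relative puts
    }

    public static void main(String[] args) {
        int width = 256;
        FloatBuffer dst = ByteBuffer.allocateDirect(width * FLOATS_PER_VERTEX * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        float[] staging = new float[width * FLOATS_PER_VERTEX];
        fillStrip(dst, staging, width, 1.0f);
        System.out.println("floats written: " + dst.position());
    }
}
```

The staging array is allocated once and reused, so the only extra per-frame cost is the bulk copy itself.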

I'm posting this here rather than on the performance board because it is most critical for JOGL and shows up in the JOGL ‘performance’ example. And it is JOGL that will have major problems if it is not fixed…

Where's the original code?
I'd like to test this with LWJGL…

Oops, I was going to paste the code here, but the forum complains that the message is too long.

Anyway, the file is VertexArrayRange.java, in the jogl/demos/vertexArrayRange dir of the CVS: http://jogl.dev.java.net/source/browse/jogl/demos/vertexArrayRange/

Maybe this thread is of some interest? (old, but the only one I found :-/)

http://www.java-gaming.org/discus/messages/27/1400.html?1025552520

Anders, thanks for the pointer. This helped me discover what is happening.

In short: Java 1.4.2b1 sucks.

Longer story: when I run VertexArrayRange with Java 1.4.0_something, I get 30 fps. It might be something like 29.5 compared to 30.5 with the NVIDIA version, but the difference is negligible. With -client instead of -server, it is 26-27 fps, still acceptable.

It seems that buffer access in Java 1.4.2b1 has suffered a serious regression.

Raw data for you (time per unit of work, lower is better):

[tr][td]JVM[/td][td]Float array[/td][td]Direct buffer[/td][/tr]
[tr][td]1.4.0_01, client[/td][td]81.1[/td][td]175.2[/td][/tr]
[tr][td]1.4.2b1, client[/td][td]69.1[/td][td]848.2[/td][/tr]
[tr][td]1.4.0_01, server[/td][td]13.0[/td][td]13.0[/td][/tr]
[tr][td]1.4.2b1, server[/td][td]8.1[/td][td]880.2[/td][/tr]

While plain array performance has improved considerably, direct buffer performance was killed. Look especially at server: from 13 to 880!!!
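Dividing the 1.4.2b1 timings by the 1.4.0_01 timings from the table makes the size of the regression explicit. The values are copied from the table; since they are times, a ratio above 1 means 1.4.2b1 is slower. The class name is just for illustration.

```java
import java.util.Locale;

public class RegressionRatios {
    // Ratio > 1 means the new VM is slower (table values are times).
    static double ratio(double oldTime, double newTime) {
        return newTime / oldTime;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator everywhere.
        System.out.printf(Locale.ROOT, "client, float array:   %.2f%n", ratio(81.1, 69.1));
        System.out.printf(Locale.ROOT, "client, direct buffer: %.2f%n", ratio(175.2, 848.2));
        System.out.printf(Locale.ROOT, "server, float array:   %.2f%n", ratio(13.0, 8.1));
        System.out.printf(Locale.ROOT, "server, direct buffer: %.2f%n", ratio(13.0, 880.2));
    }
}
```

This prints 0.85, 4.84, 0.62 and 67.71: direct buffer access became roughly 4.8x slower on client and roughly 68x slower on server, while float array access took about 15% and 38% less time.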

You can get the test program from the link provided by Anders; use my corrected version to get the results above. The program claims the unit is µs, but I'm not sure about that. Anyway, it is some kind of time per unit of work. What it is in absolute terms does not matter; what matters is that it is computed the same way in both cases, so the numbers can be compared relative to each other.
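The idea behind such a test, timing the same work on a float[] and on a direct FloatBuffer and looking only at the ratio, can be sketched as follows. This is not the actual test program: the class name, array size, iteration counts and the read-modify-write workload are assumptions, and `System.nanoTime()` requires Java 5 or later, so this sketch would not run on the 1.4 VMs discussed here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class BufferMarkSketch {
    static final int SIZE = 1 << 16;

    // Workload on a plain array: read each element, scale it, write it back.
    static float touchArray(float[] a) {
        float sum = 0;
        for (int i = 0; i < a.length; i++) {
            a[i] = a[i] * 0.5f + 1.0f;
            sum += a[i];
        }
        return sum;
    }

    // The same workload through absolute get/put on a buffer.
    static float touchBuffer(FloatBuffer b) {
        float sum = 0;
        for (int i = 0; i < b.capacity(); i++) {
            b.put(i, b.get(i) * 0.5f + 1.0f);
            sum += b.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] array = new float[SIZE];
        FloatBuffer buffer = ByteBuffer.allocateDirect(SIZE * 4)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        int iterations = 200;
        float sink = 0; // printed below so the JIT cannot drop the loops

        // Warm-up so both paths are compiled before timing.
        for (int i = 0; i < 20; i++) {
            sink += touchArray(array) + touchBuffer(buffer);
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += touchArray(array);
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += touchBuffer(buffer);
        long t2 = System.nanoTime();

        // Only the ratio is meaningful; absolute times depend on the machine.
        double ratio = (double) (t2 - t1) / (t1 - t0);
        System.out.println("buffer/array time ratio: " + ratio + " (sink " + sink + ")");
    }
}
```

Running it once with -client and once with -server gives the same kind of relative comparison as the table, without caring what the absolute unit is.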

I can look at the Bug Parade and file a performance regression bug, but then we will be lucky if it gets fixed by 1.5… Maybe some of the Sun insiders here could check what is really happening and bug one of the HotSpot engineers about it?

Edit:
Cas, would it be possible for you to test this code with JET? Of course, you would need to rerun it with normal Java to get a baseline on your computer, but I wonder how well JET is able to optimize this stuff.

Edit:
http://nwn-j3d.sourceforge.net/misc/BufferMark.java
Download and run. I have tried to pick a number of iterations that is still meaningful under 1.4 and bearable to wait for under 1.4.2b1.

http://developer.java.sun.com/developer/bugParade/bugs/4851535.html

It describes exactly our vertex test problem.

The evaluation claims a factor of 5-6 performance slowdown; I see an 80x slowdown, but the rest seems OK. They say they have already fixed it, so I guess we can be happy now.

The fix will show up in the next beta of 1.4.2.

After downloading JDK 1.4.2, I have rechecked both the demo and the benchmark. 1.4.2 is quite a bit faster than 1.4.0 at array access and a bit slower at buffer access, but we are talking about 5-10% slower here, not 80x. In the VertexArrayRange demo the difference is about 0.5 fps (1.4.0 is better), still within range of the C++ version.

I guess this bug got fixed indeed - thanks.