Performance of JNI with lots of OpenGL calls

Recently there was a small benchmark on http://graphcomp.com/opengl/bench.html
I thought it might be fun to do a quick conversion to Java, but had I known the results beforehand, I might not have.
The C version prints something like:

FBO Texture Rendering FPS: 239.646497
Teapot Shader FPS: 570.048602
Frame overhead secs/frame: 0.000044
OS/GLUT overhead secs/frame: 0.000011
Overall FPS: 167.167671

The JOGL (JOGL 1.1 rc 3, JDK 6) version returns:
FBO Texture Rendering FPS: 208.672087
Teapot Shader FPS: 153.128346
Frame overhead secs/frame: 0.000035
OS/GLUT overhead secs/frame: 0.000584
Overall FPS: 83.737661

Obviously, using GLUT functions and immediate mode isn’t the best-performing way of working with OpenGL today, but I think the benchmark may still be useful for showing the overhead of JNI. Is it expected that JNI adds this much overhead? (I had expected something like 10-20%.)
Would you have expected such a large difference?
If you want to run the benchmark or check my conversion (after all, maybe there’s a bug somewhere), see the attached file.
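To put the gap in absolute terms, here is a small Java sketch that converts the two "Overall FPS" figures quoted above into milliseconds per frame: about 5.98 ms for C versus about 11.94 ms for JOGL, i.e. a frame time roughly 99.6% longer, far beyond a 10-20% overhead. It is plain arithmetic on the reported numbers (no OpenGL involved), and the class and method names are illustrative only.

```java
public class FrameTimeComparison {
    // Converts frames per second into milliseconds per frame.
    static double msPerFrame(double fps) {
        return 1000.0 / fps;
    }

    // How much longer the slower run's frame time is, as a percentage
    // of the faster run's frame time.
    static double percentSlower(double slowFps, double fastFps) {
        return 100.0 * (msPerFrame(slowFps) / msPerFrame(fastFps) - 1.0);
    }

    public static void main(String[] args) {
        // "Overall FPS" figures from the C and JOGL runs quoted above.
        double cFps = 167.167671;
        double joglFps = 83.737661;

        System.out.printf("C:    %.2f ms/frame%n", msPerFrame(cFps));
        System.out.printf("JOGL: %.2f ms/frame%n", msPerFrame(joglFps));
        System.out.printf("JOGL frame time is %.1f%% longer%n", percentSlower(joglFps, cFps));
    }
}
```

Working in milliseconds per frame rather than FPS makes it easier to ask how much absolute time per frame would have to be attributed to JNI for the explanation to hold.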

If you are at all serious about performance, you make so few calls to OpenGL each frame that the JNI overhead becomes insignificant.

VertexArrays beat the crap out of immediate mode
VertexBufferObjects beat the crap out of VertexArrays
Cache-optimized, indexed, interleaved VBOs beat the crap out of basically anything.
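The call-count argument can be made concrete with a rough Java sketch. In JOGL every GL method is a separate Java-to-native (JNI) call, so the per-call overhead scales with the number of calls. The sketch assumes immediate mode issues one glNormal3f and one glVertex3f per vertex, assumes a fixed 7-call sequence for a client-side vertex-array draw, and uses a hypothetical 100,000-vertex mesh and a hypothetical 20 ns per call; none of these figures were measured in this thread.

```java
public class GlCallCount {
    // Immediate mode, assuming one glNormal3f and one glVertex3f per vertex,
    // wrapped in a single glBegin/glEnd pair.
    static long immediateModeCalls(long vertices) {
        return 2 + 2 * vertices;
    }

    // Client-side vertex arrays, assuming this fixed sequence per draw:
    // 2x glEnableClientState, glVertexPointer, glNormalPointer,
    // glDrawArrays, 2x glDisableClientState. Independent of vertex count.
    static long vertexArrayCalls() {
        return 7;
    }

    // Per-frame cost of the calls alone, in milliseconds, for an assumed
    // fixed overhead per Java-to-native call given in nanoseconds.
    static double boundaryCostMs(long calls, double nsPerCall) {
        return calls * nsPerCall / 1_000_000.0;
    }

    public static void main(String[] args) {
        long vertices = 100_000;  // hypothetical mesh size
        double nsPerCall = 20.0;  // hypothetical per-call JNI overhead, not measured

        long immediate = immediateModeCalls(vertices);
        long arrays = vertexArrayCalls();

        System.out.printf("Immediate mode: %d calls, ~%.3f ms/frame of call overhead%n",
                immediate, boundaryCostMs(immediate, nsPerCall));
        System.out.printf("Vertex arrays:  %d calls, ~%.6f ms/frame of call overhead%n",
                arrays, boundaryCostMs(arrays, nsPerCall));
    }
}
```

Under these assumptions immediate mode makes 200,002 calls per draw (about 4 ms of pure call overhead per frame), while the vertex-array path stays at a handful of calls regardless of mesh size. VBOs keep a similarly constant call count and additionally let the driver keep the vertex data in server-side memory.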

So what’s the use of immediate mode these days?
A bit of debugging, and getting a quick result when you’re too lazy to care about performance.

We did fairly extensive experiments a couple of years ago with the Jake2 Quake II port to Java and were not able to isolate JNI overhead as being a culprit on any modern CPU. See this thread for a discussion. My guess would be either that inefficiencies were introduced during your translation of the benchmark, or that it’s a microbenchmark not representative of any real-world application. We have had good success in past years in translating several moderately-sized C++ demos and animation engines to Java and have uniformly achieved 90-100% of the speed of C, and sometimes even better than 100% because HotSpot generates machine code specific to the processor you’re currently running on while many C binaries are compiled for a least-common-denominator processor.

Ken, you were absolutely right. I took the Java GLUT code, ported it back to C and got different results:

Java (1st iteration)
FBO Texture Rendering FPS: 237.485172
Teapot Shader FPS: 164.152181
Frame overhead secs/frame: 0.000020
OS/GLUT overhead secs/frame: 0.000587
Overall FPS: 91.658273

Java (2nd iteration)
FBO Texture Rendering FPS: 240.625000
Teapot Shader FPS: 165.317919
Frame overhead secs/frame: 0.000024
OS/GLUT overhead secs/frame: 0.000356
Overall FPS: 94.478528

C
FBO Texture Rendering FPS: 244.326587
Teapot Shader FPS: 162.869442
Frame overhead secs/frame: 0.000023
OS/GLUT overhead secs/frame: 0.000012
Overall FPS: 97.392490
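Expressed as a fraction of the C results, the second Java iteration lands at roughly 98% (FBO texture rendering), 101% (teapot shader) and 97% (overall), consistent with the 90-100% range reported earlier for ported demos. A minimal Java sketch of that arithmetic, with the figures copied from the results above (the class name is hypothetical):

```java
public class RelativeSpeed {
    // Java throughput as a percentage of C throughput (higher FPS is better,
    // so values above 100 mean Java was faster on that metric).
    static double percentOfC(double javaFps, double cFps) {
        return 100.0 * javaFps / cFps;
    }

    public static void main(String[] args) {
        String[] metrics = {"FBO Texture Rendering", "Teapot Shader", "Overall"};
        // FPS figures from the C run and the second Java iteration above.
        double[] cFps    = {244.326587, 162.869442, 97.392490};
        double[] javaFps = {240.625000, 165.317919, 94.478528};

        for (int i = 0; i < metrics.length; i++) {
            System.out.printf("%-22s %6.1f%% of C%n",
                    metrics[i], percentOfC(javaFps[i], cFps[i]));
        }
    }
}
```

The one metric that still differs noticeably is the "OS/GLUT overhead" line (0.000356 s vs 0.000012 s per frame), which is a per-frame time rather than an FPS figure and so is not included in this ratio.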

Actually, these results are quite impressive! Although the code makes an enormous number of OpenGL calls, it’s very close to the C version.
It’s strange, though, that the teapot function made the difference; I even compared your GLUT version with a version I found on Google Code. Apparently my installed version of GLUT uses a simpler method to render the teapot.