or even LWJGL.
I don’t know the performance difference between JOGL and native OpenGL.
Thanks
The JavaOne slides from 2002, 2003 and 2004 on the JOGL home page discuss this issue. The Grand Canyon demo also discusses high-performance 3D graphics in Java using OpenGL.
Thanks, I don’t know if there are any benchmarks.
And about Grand Canyon, I think it’s a little old, though we can learn a lot from it.
“FastJOGL” seems to perform quite a bit faster than the normal JOGL. Any idea what optimizations they made to the source?
The “fastjogl” renderer in Jake2 replaces immediate-mode calls to glVertex3f with uses of vertex arrays. Both use an unmodified JOGL library for rendering.
They had improved it by around 40% the last time I asked the same question in these forums. It’s f* impressive that in some benchmarks they were able to beat the C version. However, this doesn’t prove that JOGL can compete with a native application, at least not to the general public. Quake 2 is a very old game that uses low-poly models and low-resolution textures.
It would be interesting to see how a modern game would hold up running with high-res textures, normal maps, shadows and everything else a modern game offers. Unfortunately, there is no easily available modern native OpenGL game that could be taken as a reference and remade in Java.
There is of course the Doom 3 demo, thanks to the usual goodwill of JC:
And the game formats used in Doom 3 are all well known by the community.
The Quake3 Squareheads thing pretty much knocks it on the head.
Bottom line: LWJGL and JOGL are so close to being as fast as native code that it’s quite hard to actually measure the difference. Any slowness is really a result of the JVM not being quite as fast as pure native code.
Cas
Vertex arrays are still immediate mode; for maximum speed one should use Vertex Buffer Objects.
Due to the high overhead of JNI calls, glVertex calls will be the bottleneck, especially at high polygon counts. For example, I can render a 1-million-triangle Happy Buddha model at 14-21 fps (no triangle strips used) in VBO mode, while with glVertex I can only render it at 1.2 fps on my 5700LE.
However, in my experience with various benchmarks, JOGL only reaches about 60% of the speed of the corresponding C++ code.
For example, C++ code can render a 176 million triangle model at 55 fps on a 9700 Pro, while JOGL only reaches 34 fps (though not with the same model, and with no triangle strips used in either case).
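To make the JNI call-count argument above concrete, here is a minimal sketch in plain Java. It makes no JOGL or LWJGL calls, and the class and method names are hypothetical. It assumes one glVertex3f call per vertex in immediate mode, and it packs the same coordinates into a direct, native-order FloatBuffer, which is the kind of buffer one would hand to the GL binding in a single call when using vertex arrays or VBOs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper for illustration; not part of JOGL or LWJGL.
final class VertexPacking {
    static final int FLOATS_PER_VERTEX = 3;      // x, y, z
    static final int VERTICES_PER_TRIANGLE = 3;  // plain triangle list, no strips

    /** Assumed immediate-mode cost: one glVertex3f call (one JNI crossing) per vertex. */
    static long immediateModeCalls(long triangles) {
        return triangles * VERTICES_PER_TRIANGLE;
    }

    /** Packs xyz coordinates into a direct buffer in the platform's native byte order. */
    static FloatBuffer pack(float[] xyz) {
        if (xyz.length % FLOATS_PER_VERTEX != 0) {
            throw new IllegalArgumentException("expected a multiple of 3 floats");
        }
        FloatBuffer buf = ByteBuffer.allocateDirect(xyz.length * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        buf.put(xyz);
        buf.flip(); // rewind so the whole payload is readable from position 0
        return buf;
    }

    public static void main(String[] args) {
        float[] triangle = {0f, 0f, 0f,  1f, 0f, 0f,  0f, 1f, 0f};
        FloatBuffer buf = pack(triangle);
        System.out.println("floats ready for a single upload: " + buf.remaining());
        System.out.println("glVertex3f calls for 1M triangles: "
                + immediateModeCalls(1_000_000L));
    }
}
```

Under these assumptions a 1-million-triangle model costs three million native calls per frame in immediate mode, while the buffer path needs only a handful of calls regardless of model size.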
Er, no. Vertex arrays != immediate mode. VBOs are an extension to vertex arrays that gives you tighter control over where they sit in memory, that’s about it.
So you used a different model? How can you tell it’s not the model that made the difference? Also, how did you render the models in your benchmarks? It should be roughly the same if you use VBOs with only a few state changes, since then virtually nothing is done by the Java code. Although I think LWJGL can have a smaller overhead than JOGL since it can run single-threaded.
We have done a few apples-to-apples performance comparisons between C++ and Java over the years and some of these are documented in the JavaOne slides on http://jogl.dev.java.net/ . The VertexArrayRange demo on the jogl-demos page was originally ported from C++ and on some machines and processors with SSE2 support the Java version is faster than the C++ version, because HotSpot dynamically generates SSE2 floating-point code while the C++ compiler is typically forced to generate x87 floating-point code for compatibility with more processors. We’ve also had experience porting skinning code from C++ to Java where the Java version was 85% of the speed of the optimized C++ code and faster than the debug version of the C++ code.
When using LWJGL’s AWTGLCanvas, LWJGL and JOGL have exactly the same amount of multithreading and context management overhead. In LWJGL’s full-screen mode there may be slightly less overhead, though the Java2D team is working on improving the JDK’s full-screen support to be more like “real fullscreen” in Mustang which should eliminate these differences. As far as I’ve seen, the multithreading overhead is insignificant for everything except microbenchmarks.
Thanks, Ken, for such important information.
As for Tang, I don’t agree with you. I think a VA is a tightly packed block of vertex and other data, and this data needs to be transferred to the video card every frame. A VBO is different: it will reside on the video card after the glBufferDataARB call (in fact it is up to the driver to decide where to put it).
So VBOs will be much faster than VAs!
Orangy is basically correct in what he says; VBO simply sits on top of VA, and specifies more explicitly where you want the RAM to reside based on your knowledge of how the data is to be used.
Cas
A VA needs to transfer the buffer to VRAM every frame, while a VBO should preferably stay in VRAM. From the API you can say VBO sits on top of VA, but the performance differs greatly. This is extremely important for people like me who want to draw models of several million triangles.
However, according to my tests, NVIDIA cards don’t seem to keep VBOs in VRAM once the total size of all VBOs exceeds 128 MB, even on a card with 256 MB of VRAM installed. I don’t know where to ask for more information about it. With 256 MB of VRAM, models of 10 million triangles or fewer should fit entirely in VRAM, but the frame rate shows that this is not the case. Both the 6800 GT and the 5950 Ultra have the problem.
I don’t have an ATI card with 256 MB of VRAM, so I don’t know how those behave.
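Whether a model of a given size fits in VRAM depends heavily on the vertex layout, which the post above does not state. The following is a rough sketch of the arithmetic only. The class name, the layouts (12 bytes for position only, 24 bytes for position plus normal, 4-byte indices) and the assumption that an indexed mesh has about half as many vertices as triangles are all hypothetical, and the estimate ignores textures, framebuffers and any driver overhead.

```java
// Hypothetical estimator; all vertex layouts here are assumptions, not measured data.
final class VboBudget {
    static final long MIB = 1024L * 1024L;

    /** Non-indexed triangle list: three full vertices stored per triangle. */
    static long nonIndexedBytes(long triangles, int bytesPerVertex) {
        return triangles * 3L * bytesPerVertex;
    }

    /** Indexed mesh: shared vertices plus three indices per triangle. */
    static long indexedBytes(long triangles, long vertices,
                             int bytesPerVertex, int bytesPerIndex) {
        return vertices * bytesPerVertex + triangles * 3L * bytesPerIndex;
    }

    /** True if the payload alone fits in the given budget (in MiB). */
    static boolean fits(long bytes, long budgetMiB) {
        return bytes <= budgetMiB * MIB;
    }

    public static void main(String[] args) {
        long triangles = 10_000_000L;
        long flat = nonIndexedBytes(triangles, 12);                   // position only
        long indexed = indexedBytes(triangles, triangles / 2, 24, 4); // position + normal
        System.out.println("non-indexed bytes: " + flat + ", fits in 256 MiB: " + fits(flat, 256));
        System.out.println("indexed bytes: " + indexed + ", fits in 256 MiB: " + fits(indexed, 256));
        System.out.println("indexed fits in 128 MiB: " + fits(indexed, 128));
    }
}
```

With these assumed layouts, a non-indexed 10-million-triangle list needs 360,000,000 bytes and exceeds 256 MiB, while the indexed variant needs 240,000,000 bytes and fits only if almost nothing else occupies VRAM. Both exceed a 128 MiB limit.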
Yeah, which is what I was saying. I was disagreeing with your earlier line of:
[quote]Vertex Arrays are still immediate mode
[/quote]
Which is, quite frankly, bollocks.