Sorry if the question has already been asked. Any numbers, charts, live tests/demos?
Before I jumped into LWJGL, I ran a few tests with the NeHe tutorials in both C++ and LWJGL, using Fraps as an external method of recording the FPS. I found that Java/LWJGL performed within +/- 10% of C++. However, the tutorials noticeably took longer to load in Java. I think this is because LWJGL requires data to be loaded into NIO buffers rather than primitive arrays. While notable, the difference in load times was not significant enough for me to think much of it, though that could be because the tutorials didn’t load much data. I expect the load-time gap to grow as the amount of loaded media does.
It’s important to note that I optimized the Java tutorials to minimize work for the garbage collector. Without this, the Java simulations lost to C++ by a significant amount (-40%!). I wouldn’t worry too much about this: the tutorials were written with C++’s memory management in mind, not Java’s.
Here are some optimization techniques:
- Don’t use thread-sensitive objects. I.e. use ArrayList rather than Vector unless you have to worry about synchronization of the add/remove methods. If your game logic is sound, this won’t be an issue, especially since most games aren’t multithreaded anyway.
- Lessen your number of “on instance” objects. For example:
Don’t do this:
class Swapper{
    public static void swap(Object a, Object b){
        Object temp; // Hi, I'm an "on instance" variable!
        temp = a;
        a = b;
        b = temp;
    }
}
Do this instead:
class Swapper{
    private static Object temp;
    public static void swap(Object a, Object b){
        temp = a;
        a = b;
        b = temp;
    }
}
This might not seem like much, but consider calling the swap() method hundreds of times a frame. That’s hundreds of new Objects and hundreds of dead Objects. If you remove the “on instance” variable, you remove a potentially huge workload for the garbage collector. BTW, I made up the name “on instance” variable. Is there a formal name for this technique?
- Avoid casting. Don’t use an ArrayList/Vector to hold your game’s entities unless you’re storing a small number. The computation time it takes to cast an object out of an ArrayList is a huge performance hit. Instead, create a custom collection class that emulates an ArrayList but stores your entity type directly.
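A typed collection along those lines might look like the sketch below. Entity and EntityList are illustrative names (not from the thread or any library); the class grows its backing array the way an ArrayList does, but get() returns the entity type directly, so no cast is needed at the call site:

```java
// Illustrative sketch: a hand-rolled, typed collection (pre-generics style).
// Entity and EntityList are made-up names, not part of any library.
class Entity {
    String name;
    Entity(String name) { this.name = name; }
}

class EntityList {
    private Entity[] data = new Entity[16];
    private int size = 0;

    public void add(Entity e) {
        if (size == data.length) { // grow the backing array, like ArrayList does
            Entity[] bigger = new Entity[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = e;
    }

    // Typed accessor: callers get an Entity back with no (Entity) cast.
    public Entity get(int i) { return data[i]; }

    public int size() { return size; }
}
```

Whether the saved cast is worth the extra code is debatable (see the replies below about cast cost), but this is the shape of the technique.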
[quote]However, the tutorials noticably took longer to load in Java.
[/quote]
This is the time it takes for the VM to start up. Once the VM is running, the difference in speed is negligible, as long as you follow what you said: try not to create objects on the fly, and keep references until levels are complete.
[quote]3) Avoid casting. Don’t use an ArrayList/Vector to hold your game’s entities unless you’re storing a small number. The computation time it takes to cast an object out of an ArrayList is a huge performance hit.
[/quote]
Nah, it’s not a huge performance hit. There is a performance hit, but it’s relatively small when using the server JVM.
One thing to keep in mind when using LWJGL (or JOGL) is to minimize the number of direct ByteBuffers created each frame. These buffers are not allocated on the JVM heap, and creating many of them can reduce performance significantly. Cache small buffers and reuse them instead of creating new ones. If your app is single-threaded, it’s often enough to allocate one FloatBuffer, one IntBuffer, etc., which you reuse all the time. You can monitor your app in the Windows task manager: if your memory usage increases rapidly and then falls again, you are probably allocating too many direct ByteBuffers.
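A minimal sketch of that caching pattern using plain java.nio (the class name, method name, and 64-float capacity are illustrative assumptions): allocate one direct FloatBuffer up front and refill it each frame instead of allocating a new one per call:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class BufferCache {
    // One direct FloatBuffer, allocated once and reused every frame.
    // 64 floats is an arbitrary illustrative capacity.
    private static final FloatBuffer SCRATCH =
            ByteBuffer.allocateDirect(64 * 4)          // 64 floats * 4 bytes each
                      .order(ByteOrder.nativeOrder())  // native byte order, as GL expects
                      .asFloatBuffer();

    // Refill the cached buffer instead of allocating a new direct buffer.
    public static FloatBuffer fill(float[] values) {
        SCRATCH.clear();
        SCRATCH.put(values);
        SCRATCH.flip(); // ready for reading by the native side
        return SCRATCH;
    }
}
```

Every call hands back the same direct buffer, so the per-frame allocation (and the native memory churn it causes) disappears.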
Casting in Java is a pretty trivial operation, as is instanceof (unlike C++, where the same operations are non-trivial). I’ve used ArrayLists/HashMaps/etc. for entities since forever and never seen casting show up in any profiling. (And I believe the newer VMs do funky optimisations to eliminate the casts when you’ve got all the same type in a container.)
Equally, Funkapotamus’ “on instance” advice is nonsense as well. That doesn’t create hundreds of new objects, just references. Since the references will be on the stack the allocation and deallocation will be practically free.
You’re not creating any objects here. Objects are only created when you use the “new” keyword; the temp variable is only an alias. In the first example it may even be optimized into a register by the JIT. I discourage the second method, as it is not thread-safe.
But I agree with your point: avoid “newing” objects. Although some will argue that it doesn’t matter with the new garbage collectors, I’m still sceptical and try to flatline the memory usage.
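One common way to flatline allocations is a simple free-list pool. This is a sketch only (Bullet, BulletPool, and their fields are made-up names): dead objects go back on a free list and are handed out again instead of newing one per use:

```java
import java.util.ArrayList;

// Made-up example types: a bullet and a free-list pool that recycles bullets.
class Bullet {
    float x, y;
    boolean alive;
}

class BulletPool {
    private final ArrayList free = new ArrayList(); // dead bullets awaiting reuse

    public Bullet obtain() {
        if (free.isEmpty()) {
            return new Bullet(); // allocate only when the pool is empty
        }
        return (Bullet) free.remove(free.size() - 1);
    }

    public void release(Bullet b) {
        b.alive = false;
        free.add(b); // keep the reference so the GC never sees garbage
    }
}
```

After a warm-up period the pool reaches a steady size and the game loop stops producing garbage entirely.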
As for the difference between LWJGL and C: since one is a game library and the other is a programming language, I’m not sure what you’re asking. You will have the JNI overhead for every native call to OpenGL, but even in C you want to minimise the number of GL calls due to the overhead of calling a DLL function. Done right, there is no big difference.
The Java vs C speed comparisons have been done many times before. Just mentioning it is flamebait, so I won’t comment on it.
tom,
what I’m simply asking is: do we have a demo that has two implementations, one in DirectX (as an example) and the other in LWJGL? That way we could benchmark both implementations and see what the real performance difference is.
As princec has mentioned multiple times, LWJGL is supposed to be very performant, but how does it compare, for a game, with a C implementation (with DirectX, for example)?
Well, it wouldn’t be entirely fair.
Take a trivial example with lots of gl calls - the C stuff should be at least 20% faster because it doesn’t incur the JNI overhead.
However if you change all of this to use display lists, then it’s probably within 5-10%.
My tests of glGears, using a straight port of the C code, showed a ~20% performance decrease. Disabling the background input check and all sorts of other stuff LWJGL does for you brought this closer: about 10%, I think. I’ll do a new benchmark tonight, including one with JOGL (it performed horrendously last time I tried, but that was obviously because of a bug in the implementation causing a lot of context switches).
[quote]tom,
what I’m simply asking is: do we have a demo that has two implementations, one in DirectX (as an example) and the other in LWJGL? That way we could benchmark both implementations and see what the real performance difference is.
As princec has mentioned multiple times, LWJGL is supposed to be very performant, but how does it compare, for a game, with a C implementation (with DirectX, for example)?
[/quote]
I think we’ve come to the conclusion that LWJGL is comparable to a similar program written in C++ using OpenGL. Given that, the demo you’re asking for is basically a comparison between DirectX and OpenGL. At this point in time, both APIs are pretty much the same in terms of speed. It all comes down to your personal preference and intended platform. (For example, DirectX is Windows-only for all practical purposes.)
Thanks guys for catching my mistakes in my suggestions. I should have clarified the examples a bit more to something along the lines of what tom wrote.
As near as dammit makes no difference at all.
Cas
[quote]As near as dammit makes no difference at all.
Cas
[/quote]
Well, it might depend, but I don’t know. Does 10% mean that the C++ implementation of a game would run at 85 fps and the LWJGL one at ~77?
No, 10% means a crappy little microbenchmark would run at 1000fps in C and 900fps in Java, but in a real game you will see almost no difference.
Cas
Any half-way complex game will be limited by your graphics card speed (be that either fill-rate (especially for a 2d game) or poly count). And that graphics card speed isn’t going to be affected by C vs. LWJGL.
If you’re not limited by fill rate (or whatever) you need to add more sprites! ;D
Maybe.
If the game is fillrate limited, then it’s likely that the LWJGL version will also run at 85 fps.
If the game is CPU limited, then you just aren’t making enough GL calls to feel the full impact of the JNI overhead.
Square heads runs at about 60fps on 5 year old hardware. That’s good enough in my book. I mean… I’m not going to make Quake4 or something like that. I won’t get old enough to finish a project like that in my whole life.
There just isn’t a point in spending 2-4 times as long coding to get (maybe) 10-15% more performance.
After all, after another 18 months in development, the entire target market will have doubled in processing power…
Cas
A new CPU generation every 18 months, a new GPU generation every 6 months… I can’t even imagine how powerful our computers will be in the near future!
Chman