Comparing Client VM and Server VM

Hi all,

I’m evaluating whether the server VM is a viable option for games. I observe long pauses over a long period on my computer, but I suppose that’s mainly because it’s single core. I’ve prepared a simple test using the JBullet physics library; it would be helpful if you could run it.

How to test:

  1. run the Client VM version and press the D key to disable deactivation
  2. wait until it stabilizes and tell me the average FPS
  3. run the Server VM version and press the D key to disable deactivation
  4. wait, count how long it takes to stabilize, and tell me the average FPS
  5. does it still make visible pauses when shooting boxes (by right clicking) or during other manipulation?
  6. reply with information about your CPU, VM and OS, plus your observations
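For anyone unsure what to put in the CPU/VM/OS lines, here is a minimal sketch that prints them from the standard Java system properties. The class name `VmReport` is made up for this example and is not part of the test. Note that `os.arch` describes the architecture the JVM was built for, not the CPU model, so the CPU line still needs filling in by hand. On HotSpot the `java.vm.name` value normally says whether the client or server VM is running.

```java
// Hypothetical helper: prints the CPU/VM/OS lines of the report template
// using only standard system properties.
final class VmReport {
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        StringBuilder sb = new StringBuilder();
        // os.arch is the JVM's architecture (e.g. x86, amd64), not the CPU model.
        sb.append("CPU: ").append(System.getProperty("os.arch"))
          .append(", ").append(rt.availableProcessors()).append(" logical core(s)\n");
        // On HotSpot the VM name usually contains "Client VM" or "Server VM".
        sb.append("VM: ").append(System.getProperty("java.vm.name"))
          .append(" ").append(System.getProperty("java.vm.version")).append("\n");
        sb.append("OS: ").append(System.getProperty("os.name"))
          .append(" ").append(System.getProperty("os.version")).append("\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```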

My results (also a template for your replies):
CPU: Pentium 4 “Northwood” 2.4GHz, 32bit single-core without HyperThreading
VM: 1.6.0_07-b06
OS: Linux 32bit
Client FPS: 25
Server time until stabilized: about 8 seconds
Server FPS: 42
Server notes: after shooting a box that collides with other boxes, there are very noticeable additional pauses

CPU: Core 2 Quad Q6600, 2.4GHz, 64bit quad-core
VM: 1.6.0_07-b06
OS: Windows XP 32bit
Client FPS: 540 (±20)
Server time until stabilized: no visible pauses, it takes 2 seconds for the FPS to get near its maximum
Server FPS: 820 (±20)
Server notes: the FPS counter says 30, 520, 770 and then rises over about 10 seconds to the maximum rate. On client the FPS counter says 410, 540 and stays around there. So it takes 1 second (the FPS counter’s refresh rate) for the server to reach the FPS of the client, another second to rise clearly above it, and a little over 10 seconds to reach the maximum FPS.

CPU: Intel Core 2 Duo 2.2 GHz 32bit (I think)
VM: 1.5 Mac default
OS: Mac OS X 10.4.11

With Help Text still on
Client FPS: 60
Server time until stabilized: <2 seconds
Server FPS: 165
Server notes: Noticed no weird pauses, performance increased after shot box collided with the stack, reducing the number of contacts. After this FPS approached 220

With Help Off
Client FPS: 109
Server time until stabilized: <2 seconds (barely noticeable)
Server FPS: 300 +/- 10
Server notes: Same note as above, except after shooting into the stack, the FPS was around 420. If deactivation was still on, the FPS was around 560 before shooting any boxes.

The server jnlp gives me “Error: no ‘server’ JVM at ‘C:\Program Files\Java\jre1.6.0\server\jvm.dll’.”

CPU: Core 2 Duo, 2.4GHz
VM: 1.6.0_06
OS: Windows XP 32bit
Client FPS: ~600
Client notes: no noticeable pauses

  • edit, after copying server to jre:
    Server time until stabilized: < 3 seconds (no difference to client vm?)
    Server FPS: ~1000
    Server notes: no noticeable pauses

Regarding pauses: I noticed that the memory consumption fluctuates. This probably means you are creating too much garbage. Profile your application (or library) to see whether garbage collection causes the pauses. If it does, find the source of the ‘costly’ garbage and reduce it. It’s common practice in Java game programming to reduce nasty garbage (but not all garbage is bad!) to calm down the garbage collector.
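A cheap first check before reaching for a profiler is to sample the standard `java.lang.management` GC counters around the moment a pause is seen. The sketch below is only an illustration (the class name `GcProbe` is invented here). It sums collection counts and times over all collectors the VM reports.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: totals the GC counters exposed by the running VM.
final class GcProbe {
    /** Number of collections so far, summed over all collectors. */
    static long totalCount() {
        long n = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 means "undefined" for this collector
            if (c > 0) n += c;
        }
        return n;
    }

    /** Approximate accumulated collection time in milliseconds. */
    static long totalTimeMillis() {
        long ms = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means "undefined" for this collector
            if (t > 0) ms += t;
        }
        return ms;
    }

    public static void main(String[] args) {
        System.out.println("collections=" + totalCount() + ", time=" + totalTimeMillis() + " ms");
    }
}
```

Reading both values before and after a slow frame shows whether a collection ran during it. If neither counter moved, the pause came from somewhere else.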

You can enable the server VM by copying it from the private JRE contained in the JDK to the public JRE. In other words, copy the $JDK_HOME/jre/bin/server directory to $JRE_HOME/bin/ (so you’ll end up having both client and server dirs there). You can remove it from the public JRE after you’ve done the test.

Heap garbage production is already well optimized in JBullet by object pooling. The rapid allocations observed when the shot box collides are no problem; they happen at a rate HotSpot can manage without any fuss. The pauses in the server VM are certainly from compilation.
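For readers who haven’t seen object pooling, here is a minimal generic sketch of the idea. It is not JBullet’s actual pooling code; `Vec3` and `Vec3Pool` are made-up names for illustration. Instances are handed out from a free list and returned after use, so steady-state simulation allocates nothing new.

```java
import java.util.ArrayDeque;

// Hypothetical value type standing in for a physics vector.
final class Vec3 {
    float x, y, z;
}

// Hypothetical single-threaded pool: reuses released instances instead of allocating.
final class Vec3Pool {
    private final ArrayDeque<Vec3> free = new ArrayDeque<Vec3>();
    int allocated; // how many instances were ever created

    Vec3 get() {
        Vec3 v = free.poll();      // null when the free list is empty
        if (v == null) {
            v = new Vec3();
            allocated++;
        }
        return v;
    }

    void release(Vec3 v) {
        v.x = v.y = v.z = 0f;      // reset so stale values never leak to the next user
        free.push(v);
    }
}
```

The trade-off is that callers must remember to release what they take, and the pool is not thread-safe as written.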

I’ve modified the post above after copying the server vm to the jre.

No pauses though. Maybe the testcase is too small? With deactivation and without help text I get >2000 fps…

That’s good, thanks to all of you for testing :slight_smile: It would still be nice to also see AMD CPUs, both single and dual/quad core.

The test is for pauses caused by the server VM compiler when under load (computing physics, not idling because of deactivation :slight_smile: ) and not so much about absolute FPS values, though the 20-24x speedup over my computer is awesome :slight_smile:

I also tested on a P4 “Prescott” 3.2GHz and it’s not much better than my computer (about 1.5x speedup if I remember correctly, and pauses due to compilation were only a bit less noticeable than on my computer).

Ran it on my machine @work:

CPU: Core 2 Duo E6850 @ 3 GHz, 32 bit
VM: 1.6.0_05-b13
OS: Windows XP 32bit
Client FPS: 430
Server time until stabilized: about 2 seconds
Server FPS: 530
Server notes: no noticeable pauses

The difference between everyone’s performance seems interesting to me, most notably my 1.5 JVM at around 300 fps versus irrisor’s 1.6 JVM at 1000 fps. My computer is a laptop, but the processors are similar. Perhaps there is a graphics bottleneck in the renderer, and a better test case would be to run the simulator in the background and print FPS readings every so often. Then you’d be comparing only the physics engine and not how good each of our graphics cards is.
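A console-only measurement along those lines could look roughly like the sketch below. The physics step is replaced by a dummy workload, since wiring in the real simulation depends on the demo’s code; `HeadlessFps` and `fakeStep` are names invented for this example.

```java
// Hypothetical harness: counts how many simulation steps fit in a time window,
// with no rendering involved.
final class HeadlessFps {
    /** Runs step repeatedly for at least windowNanos (> 0) and returns steps per second. */
    static double measure(Runnable step, long windowNanos) {
        long start = System.nanoTime();
        long frames = 0;
        long elapsed;
        do {
            step.run();
            frames++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < windowNanos);
        return frames * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        final double[] state = {0.0};
        // Stand-in for one physics step; replace with the real world update.
        Runnable fakeStep = () -> {
            for (int i = 0; i < 10000; i++) state[0] += Math.sin(i);
        };
        // One reading per second, so warm-up and later dips are visible in the log.
        for (int s = 0; s < 10; s++) {
            System.out.println("second " + s + ": "
                    + Math.round(measure(fakeStep, 1_000_000_000L)) + " steps/s");
        }
    }
}
```

Printing one reading per window rather than a single overall average keeps the warm-up curve visible, which is what the client/server comparison is about.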

Another report from the Mac side of things, with a newer VM…

CPU: Intel Core 2 Duo @ 2.4 GHz
VM: Apple 1.6.0_04 (64 bit)
OS: OS X 10.5.4
Client/server: 380 fps (Apple’s 1.6 VM doesn’t make a real distinction anymore, I’m told)
Pauses: none in either mode

If I disable the info display, FPS jumps up to around 1000, though, which indicates to me that the bottleneck here is most definitely not the physics. 125 bodies is a bit light to really stress test things.

Apart from lhkbob’s suggestion about running a console-only test, it’s also worth keeping in mind that what you likely want to be testing is a dynamic situation that never comes to rest. Even with sleeping turned off, there’s still less computation when things settle, because you’re almost definitely only doing a single position correction iteration per frame once warm-starting heats up. So if you get around to updating the test, an ideal one would constrain a bunch of objects in a location and add a motorized “mixer” to make sure they never settle down. For JBox2d I used to use the following demo for benchmarking (never got around to bringing it over to the new version, alas; I kind of liked it): http://www.jbox2d.org/demos/washingmachine.html

Thanks for testing. Yes, it is a simple test, just to see how long (and how noticeable) stabilizing is on the same workload, and how noticeable additional JIT compiles are when other codepaths are chosen in the middle of the simulation, e.g. when you shoot a box into the others (presumably compiling some methods that were not run before, or not as often). The stabilization is a major part of the test, since after it you can spot any pause much more easily than while things are still being compiled continuously.
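Spotting pauses by eye could be backed up with numbers by logging frames that take much longer than the recent baseline. The following is only a sketch of that idea; the class name `PauseDetector`, the threshold factor and the smoothing weights are all arbitrary choices made for illustration.

```java
// Hypothetical helper: flags frames that are much slower than the running average.
final class PauseDetector {
    private final double factor;   // a frame slower than factor * average counts as a pause
    private double avgNanos;       // exponential moving average of normal frame times
    private int frames;
    int pauses;                    // number of flagged frames so far

    PauseDetector(double factor) {
        this.factor = factor;
    }

    /** Records one frame time and returns true if it counts as a pause. */
    boolean record(long frameNanos) {
        frames++;
        if (frames == 1) {         // first frame only seeds the baseline
            avgNanos = frameNanos;
            return false;
        }
        boolean pause = frameNanos > factor * avgNanos;
        if (pause) {
            pauses++;
        } else {
            // Only normal frames move the baseline, so one spike does not hide the next.
            avgNanos = 0.9 * avgNanos + 0.1 * frameNanos;
        }
        return pause;
    }
}
```

Calling `record` once per frame with the measured frame time (for example from `System.nanoTime()` differences) gives a pause count that can be compared between the client and server runs.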

For example, the world is doing some simulation and then some new situation appears (e.g. a new monster after opening a door). Since new codepaths are chosen, the JIT compiler kicks in and causes pauses, which is very bad. The Client VM compiler is so fast that it’s not visible at all, while the server VM on older HW can pause for seconds. On the other hand, the server VM can yield much better performance for some tasks (I see a major improvement in the Static Concave Mesh JBullet demo when you enable terrain animation with the G key).

The test is only about whether the server VM compilation overhead is noticeable or not. It seems that multicore CPUs are OK, with no pause (or a well hidden one), which confirms my theory. It also showed that even the client VM performs very well on newer CPUs (in comparison with older HW).

The complete form with graphics and so on was chosen for good visualisation of the pauses, which are best observed with an animated graphical representation.

That makes sense, just wanted to make sure you were getting measurements of what you want to measure! It also brings up a good issue - does anyone have the slightest idea what the average hardware configuration is for people playing Java games? jezek2, if I had to guess, most players will probably have boxes a bit more powered up than the one you’re testing on, but I’m really not sure by how much. I’m not sure that they’re really even selling many single-core machines anymore, but I could be wrong. I’d love to see some stats if anyone has them, particularly ones relevant to people that play Java games…

If I get a chance later tonight, I’ll give this stuff a run on my girlfriend’s Vista laptop, which I’m pretty sure is slightly slower than mine (it better be, she paid less than half of what I did!), as well as my decaying old Windows box at home, which is a 2 GHz single core with a crap graphics card and < 1 GB RAM, and what I usually use for low end testing. I think my dad also has a mid-range AMD desktop, which might be worth checking out if I can get over there.

Also, have you tried things on Windows? I know there is a Linux JVM from Sun, but Linux is notorious for being stuck with sub-par graphics drivers and all of that, and even with a single core machine I’m a bit surprised that in a single threaded application you’re seeing upwards of a 10x speed cut compared to the rest of us…maybe the Core 2 Duos really are that much faster? I just don’t know.

Have you thought about a pre-warming strategy or something like that? You can likely get most of your methods compiled during the “Downloading resources…” phase of a game if you run something in the background rather than just waiting. I’d imagine if you really exercised the most common tasks, like shooting, running, jumping, etc., along with the physics engine, you’d have most of your stuff compiled before the game even started.
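A rough sketch of what such pre-warming could look like: a low-priority daemon thread calls the hot code many times while the loading screen is up. `Prewarm` and the `hotPath` argument are invented for this example, and the right iteration count is a guess. HotSpot’s `-XX:CompileThreshold` default is commonly cited as about 10,000 invocations for the server compiler, so the loop would need to exceed that, with realistic inputs, to have any effect.

```java
// Hypothetical helper: exercises a hot code path in the background during loading.
final class Prewarm {
    /** Starts a low-priority daemon thread that runs hotPath the given number of times. */
    static Thread start(Runnable hotPath, int iterations) {
        Thread t = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                hotPath.run();
            }
        }, "jit-prewarm");
        t.setDaemon(true);                   // never keeps the game alive on exit
        t.setPriority(Thread.MIN_PRIORITY);  // tries not to steal time from resource loading
        t.start();
        return t;
    }
}
```

Typical use would be to start it with something like a scratch physics world being stepped, then `join()` the thread (or just let it finish) before the first real frame. Whether the methods compiled this way match what gameplay later needs is not guaranteed.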

I would like to achieve a smooth playing experience even on older computers; I know that some people in my target audience have even slightly worse computers than mine.

About stats, you can look into Valve’s HW survey. I think it doesn’t make sense to ask about players who play Java games, as that’s too general. Better to ask what HW players of casual games, FPS, MMORPGs, etc. have.

Nice, I’m looking forward to the results, thanks :slight_smile:

Yes, I’ve tried on a P4 “Prescott” 3.2GHz on Windows 2000. FPS was only about 1.5x-2x better and the server VM pauses were only slightly less visible than on my computer, still unacceptable. Linux performance and driver quality are very good when you have an NVIDIA card :slight_smile: I observe the same speed on both W2K and Linux on the same machine.

That’s a pretty messy hack and I don’t want to take this route; better for me to stick with the Client VM then. Also, if I’m not wrong, compilation is driven by method invocation count, so some methods that are not called so often will still get compiled over time and cause pauses, probably not that big but still noticeable.
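Whether late compiles really line up with the pauses can be checked from inside the application with the standard `CompilationMXBean`, which reports the accumulated JIT compilation time. A minimal sketch, with `JitProbe` as a made-up name:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads the VM's accumulated JIT compilation time.
final class JitProbe {
    /** Accumulated compilation time in ms, or -1 if the VM does not report it. */
    static long totalCompileMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        // The bean is null when the VM has no compilation system (e.g. interpreter only).
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        System.out.println("JIT time so far: " + totalCompileMillis() + " ms");
    }
}
```

If the value jumps between the frame before a pause and the frame after it, compilation was at least active at that moment. HotSpot’s `-XX:+PrintCompilation` flag gives a per-method log of the same activity.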

BTW this resulted from observing that the JBullet port runs at about 50% of the speed of the native C++ Bullet library, and the client VM is even worse. That is pretty sad and made me wonder how feasible the server VM is for interactive (semi-realtime?) stuff like games.

I had the chance to test it with a special version of JRockit! They have a vm parameter to reduce the pauses the vm causes. I ran it with
java -jrockit -Xgcprio:deterministic -XpauseTarget:30ms
Comparing it to the Sun VM:

JRockit time until stabilized: ~ 14 seconds
JRockit FPS: ~800
JRockit notes: no noticeable pauses

At least for jezek2 it could be interesting to try this VM to see if it reduces the pauses…

Does the pauseTarget affect GC only or also compilation? I have no problem with GC, as nearly everything is object pooled.

Interesting, thanks for the test, though it would be better to test it on a machine with noticeable additional pauses :slight_smile: Is JRockit freely available? It doesn’t look like it.

BTW, I’ve been thinking more about the compilation, and it’s also likely that part of the pause comes from running interpreted code for a while before compilation, which means a big slowdown for that method. I’m leaning towards the idea that a tiered compiler could fix that.

Are you sure that a GC happened during the time you tested? I am asking because I also tried JRockit some time ago and initially thought this would be the solution for realtime apps. Everything ran very smoothly, even if it was slightly slower than the Sun server VM etc… until the GC kicked in (6-23 s of cleanup time for a 1 GB heap in my testcase).
But I haven’t tested the deterministic GC because I had no license for it.

I had similar observations to jezek2: the compiler causes most slowdowns in the first 30 s in a typical 3D app (I always use -server so I can’t tell if it’s better with the client compiler). GC is in many cases no problem because it is almost always possible to tweak GC stops to be <100 ms every >5 s, which makes them almost unnoticeable.