EDIT: the title, as allowed by this piece-of-**** forum software, but then arbitrarily chopped off (obviously they can’t count to the same number twice…) was: “Server JVM kicks donkey; passers-by ‘not surprised’”
Disclaimer: this is not intended as a formal benchmark or recommendation; it’s a brief summary of a real-world deployment which I thought would interest people here (who already know why and how they’re using Java).
We just did a full stress-test of our production builds of Expedition. This takes quite a while, so we run the key parts frequently and the full suite much less often. The amount of statistical data gathered is huge, and extremely dependent on the equally large number of facts about what we’re deploying, how we’re deploying it, etc. So I’m only going to post broad conclusions, because there’s nothing really useful in a middle ground between that and giving away everything.
Very basic configuration:
- A variety of Pentium 2 and 3 servers from 300 MHz up to 2 GHz
- RAM installed varying from 256MB up to 1GB
- Everything running Linux (I’ve not seen the results from the Windows tests…), various flavours (discrepancies between distros do exist, and are in fact sometimes quite major - we’ve even seen two distros produce completely different bytes on the wire! - but we treat these as bugs in our design, so they get excluded from the final results)
- Everything running Java 1.4.2_04
- Servers and clients connected via a switched 100Mbit LAN with practically no other traffic (on their own LAN segment; possibly a little background noise). However, clients are often artificially limited (downgraded) to under 10Mbit, since many Expedition licensees are running on hosted/virtual servers with crappy 10Mbit cards. We have a couple of 12- to 24-port switches and they don’t seem to have any problems keeping up
Applications:
- Grex’s Expedition server (a mid-range game-server for running everything from 100-player games up to 10k-player-per-server MMOGs)
- A straightforward HTTP/1.1 server implemented inside Expedition
- A mixture of example games accessed by simple automated clients (no rendered game; game-FSM is hard-coded into the client - these are not “playable” games :)).
- A mixture of tiny/trivial requests and responses (e.g. one-smiley chat messages) which tend to have headers 20-40 times the size of the payload (!), and medium requests and responses of around 30KB-100KB (e.g. transfer of image-files, new textures, new levels, new sound-clips, etc.)
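To give a feel for that header-to-payload ratio, here’s a throwaway snippet - nothing to do with Expedition’s actual wire format, and the URL, host and header set are made up - that counts the bytes in a plausible HTTP/1.1 request carrying a one-smiley chat message. Depending on exactly which headers you include, you land somewhere in the tens-to-one range:

```java
// Throwaway illustration only - the URL, host and header set are made up,
// not Expedition's actual wire format.
public class HeaderOverhead {
    public static void main(String[] args) {
        String payload = ":-)"; // a one-smiley chat message

        String request =
            "POST /chat/say HTTP/1.1\r\n" +
            "Host: game.example.com\r\n" +
            "Content-Type: text/plain\r\n" +
            "Content-Length: " + payload.length() + "\r\n" +
            "Connection: keep-alive\r\n" +
            "\r\n";

        int headerBytes = request.length();
        int payloadBytes = payload.length();
        System.out.println("header bytes : " + headerBytes);
        System.out.println("payload bytes: " + payloadBytes);
        System.out.println("ratio        : " + (headerBytes / payloadBytes) + ":1");
    }
}
```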
Optimizations:
- The core Expedition classes are optimized quite heavily, BUT running in instrumentation-mode (i.e. collecting and storing performance-monitoring data, which adds a large constant overhead)
- Game-server code is “lazily” optimized: e.g. workarounds for all known major performance bugs in Sun’s NIO are merged in, but many known optimizations that would reduce code clarity (or just be annoyingly time-consuming to implement) are not present. The most obvious example is that we’re hardly using direct disk-to-network IO (it’s unusual to want to do that anyway; in a gameserver you’re much more likely to RAM-cache all disk data - see the sketch below).
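For anyone who hasn’t looked at the NIO side of this, here’s a minimal sketch of the two approaches - direct disk-to-network transfer via FileChannel.transferTo versus pulling each asset into a direct ByteBuffer once and serving every request from RAM. This is not Expedition code; it assumes blocking socket channels and a single-threaded caller, purely for brevity:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch only - not Expedition code. Assumes blocking SocketChannels
// and a single-threaded caller (no synchronization on the cache).
public class AssetSender {

    // (a) Direct disk-to-network: let the OS stream the file into the socket.
    static void sendDirect(String path, SocketChannel client) throws IOException {
        FileChannel file = new FileInputStream(path).getChannel();
        try {
            long pos = 0;
            long size = file.size();
            while (pos < size) {
                pos += file.transferTo(pos, size - pos, client);
            }
        } finally {
            file.close();
        }
    }

    // (b) RAM-cached: read the asset into a direct buffer once,
    // then serve every subsequent request straight from memory.
    private static final Map cache = new HashMap(); // path -> ByteBuffer (1.4-era, no generics)

    static void sendCached(String path, SocketChannel client) throws IOException {
        ByteBuffer asset = (ByteBuffer) cache.get(path);
        if (asset == null) {
            FileChannel file = new FileInputStream(path).getChannel();
            try {
                asset = ByteBuffer.allocateDirect((int) file.size());
                while (asset.hasRemaining()) {
                    if (file.read(asset) < 0) break;
                }
                asset.flip();
                cache.put(path, asset);
            } finally {
                file.close();
            }
        }
        ByteBuffer view = asset.duplicate(); // independent position per request
        while (view.hasRemaining()) {
            client.write(view);
        }
    }
}
```

The RAM-cached path costs you memory up front but never touches the disk per-request, which is usually what you want for game assets that every client asks for.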
Conclusions:
- Client JVM is actually very good at being a server…assuming you only have a maximum of 15-20 simultaneously connected clients. It settles down to approximately 95% of max performance (throughput / average response-time) within 30-60 seconds.
- …with many more than 10 simultaneous clients (e.g. 100, 200, etc.) the client JVM takes literally 5-10 minutes to settle down to just 75% of max performance, and it really struggles even to get that far. Effectively, a client JVM with anything more than 25 threads is only going to run at 80% of the throughput of a client JVM with 5 threads
- Server JVM typically doubles the response-time for all requests (!) for the first few minutes.
- There are a couple of interesting and very obvious inflection points in server JVM performance (e.g. response times rise sharply, then almost instantly reverse and start dropping…both at consistent rates), presumably corresponding to key moments in optimization (the start/end of new optimization phases?)
- With several hundred clients, within a few minutes of startup the server JVM is increasing throughput at an impressive rate, and it only takes a couple of minutes to catch up with the client JVM for the same number of threads…only it doesn’t stop there. Where the client JVM peaked, the server JVM just charges headlong through 80% of the 5-thread throughput, then through 90%, and only really starts to slow down at around 95%.
- …so within 10 minutes of starting a stress test, the server JVM is usually providing hundreds of simultaneous clients exactly the same level of performance as a 5-client client JVM.
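If you want to reproduce that “settling down” curve yourself, the simplest approach is windowed throughput: count completed requests per fixed time window and compare each window against the best window seen so far. A bare-bones sketch of the idea (this is not our harness, and doOneRequest is just a placeholder for a real request/response round-trip):

```java
// Bare-bones warm-up watcher - not our actual harness. doOneRequest() is a
// placeholder for a real request/response round-trip against the server.
public class WarmupWatcher {
    private static final long WINDOW_MS = 10 * 1000L; // 10-second windows

    public static void main(String[] args) {
        long best = 0; // best (highest) requests-per-window seen so far
        while (true) {
            long windowEnd = System.currentTimeMillis() + WINDOW_MS;
            long completed = 0;
            while (System.currentTimeMillis() < windowEnd) {
                doOneRequest();
                completed++;
            }
            if (completed > best) best = completed;
            long pctOfBest = (completed * 100) / best;
            System.out.println(completed + " requests/window (" + pctOfBest + "% of best so far)");
        }
    }

    private static void doOneRequest() {
        // placeholder: send one request and block until the response arrives
    }
}
```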
Notes:
- OF COURSE this is all hearsay and random data.
- …the reason it’s interesting is that there’s a complete stack with every part of the server present - there’s no “placeholder” code or artificial “test-harness”
- …also, although the low-level code (e.g. the Expedition core classes) is pretty well optimized, the higher-level code is increasingly less optimized, and may not be using the low-level code as well as it could. The higher-level parts deliberately contain lots of under-optimized code - i.e. typical of a real game project where you can’t usually afford to hand-optimize everything (and it may take you 12 months of using the system before you begin to fully appreciate exactly how to push it to the max anyway)
- there are approximately 30 threads running on the server. It uses NIO exclusively for these tests, so the number of threads only varies slightly with the number of simultaneous connected clients (there’s a sketch of the basic selector-loop shape at the end of this post)
- Don’t underestimate the client JVM! For small games, it’s excellent…
- The Expedition server is quite a big beast with lots of stuff going on internally. This is no micro-benchmark - this is a “system stress-test”, not very useful for optimization, but useful as a sanity check to be sure that when you connect all your parts together they don’t somehow conflict and burn away all your performance. That’s why you don’t (need to) run it very often: for the most part, multiple-unit unit-level stress-tests are what tell you where and what you need to optimize. (that’s a gross generalization, but…)
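Finally, to illustrate the threading note above: with NIO, one Selector thread can service accepts and reads for every connected client, which is why the server’s thread count barely moves as the client count grows. This is just the textbook single-selector shape - not Expedition’s actual networking code, and the port number is made up:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Textbook single-selector NIO loop - one thread, any number of clients.
// Not Expedition's networking code; port 9000 is made up.
public class SelectorLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();

        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.socket().bind(new InetSocketAddress(9000));
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocateDirect(8 * 1024);

        while (true) {
            selector.select(); // blocks until at least one channel is ready
            Iterator it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = (SelectionKey) it.next();
                it.remove();

                if (key.isAcceptable()) {
                    // new client: register it for reads with the same selector/thread
                    SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    int n = client.read(buf);
                    if (n < 0) {           // client disconnected
                        key.cancel();
                        client.close();
                    } else {
                        buf.flip();
                        // hand the bytes off to game logic / a worker pool here
                    }
                }
            }
        }
    }
}
```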