Extremely variable performance with the JIT

Hi,

I’ve been tearing my hair out with some very weird performance problems which I’m currently blaming on the JIT, and I was hoping that someone might be able to shed some light on things.

Our Moviestorm application uses JOGL to render fairly complex scenes. We’re using shaders, our character models are pretty rich and there’s a lot of math to handle the animations and kinematics. The net result is that both the GPU and the CPU are pretty busy.

We’ve done various performance improvements recently (with much help from the wonderful NetBeans profiler) but kept coming across runs where the framerate varied wildly: one benchmark, a crowd scene with 20 characters, varies from 4fps to 26fps!

I currently suspect the JIT for several reasons:
- Setting the CompileThreshold really high mostly forces the framerate to be very low
- Choosing -server over -client with the same CompileThreshold gives 26fps vs 22fps
- Choosing JDK6 vs JDK5 gives 26fps vs 24fps
- Long load times frequently seem to result in a low framerate

So I’m trying to get a handle on this - for our app we really need to guarantee that the user will see the high framerate - restarting it when it’s stuck in low is not an option.

(1) Is there a limit to how much gets compiled - possibly being consumed by the early load stuff before the real work starts?
(2) Does the JIT compile methods in a queue and wait to avoid stealing too many cycles from the app - resulting in latecomers being denied compilation?

Any other thoughts or suggestions for what we can do?

Thanks,
Dave

  1. no
  2. queued, yes, but I don’t think this is your problem

It might possibly be a memory alignment issue, which can have a surprisingly huge effect on framerates. It’s probably a 50/50 chance whether something is properly aligned for you.

When you say the framerate varies do you mean it either sits at 26fps for the duration (or never gets above 4fps), or does it vary constantly?

Cas :)

Sure, a simple explanation could be found in dynamic class instantiation: Class.forName("yourclass").newInstance();

newInstance will launch the JIT. I think the JIT is only invoked when a class is used for the first time, so if you use a class that has not been used since your app was launched, the JIT will be invoked.
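A minimal sketch of that first-use effect (the class and method names here are my own, purely for illustration): a class's static initializer, and the loading/verification work that comes with it, only happens the first time the class is actively used, e.g. via Class.forName.

```java
// Demonstrates lazy class initialization: Lazy's static block does not run
// at application startup, only on first active use of the class.
public class LazyInitDemo {

    static class Lazy {
        static final long INIT_AT;
        static {
            INIT_AT = System.nanoTime(); // runs on first active use, not at startup
        }
    }

    // Returns true if Lazy's static initializer ran during the forName call.
    static boolean demo() {
        try {
            long before = System.nanoTime();
            // Note: the Lazy.class literal above loads but does NOT initialize
            // the class; Class.forName (with default initialize=true) does.
            Class.forName(Lazy.class.getName());
            long after = System.nanoTime();
            return Lazy.INIT_AT >= before && Lazy.INIT_AT <= after;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```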

EDIT:

[quote]Choosing JDK6 vs JDK5 gives 26fps vs 24fps
[/quote]
Do you mean the JRE or javac? Because the JDK6 JRE seems to be faster.

[quote]Choosing -server over -client with the same CompileThreshold gives 26fps vs 22fps
[/quote]
The server VM is usually faster than the client VM, but it takes a bit more time to initialise.

Welcome to the wonderful world of JIT compilers. I’ve been smashing my head against/through the wall about this numerous times. Sometimes it’s something like changing a p++ to a p+=1 that reduces performance by a factor of 2 or more. I usually spend some hours finding the culprit and then trial-and-error my way out.

In the last few days it hit me again: I made a tiny change in the code, compiled it, ran it, reverted, compiled it, ran it, and the performance went from 33fps to 83fps. The server VM is not predictable at all, and the smallest changes (not even only those in bytecode, but other, unknown factors) can have the largest consequences.

So much for my rant, I can’t help you, I’m dealing with the exact same thing.

I moved some wildly random performing code into a native library. Yes. It’s a shame, but now I always get reliable speed.

Oh, before I forget… Let’s assume you’re working with direct (native) FloatBuffers a lot. When there is ONE CALL to a non-native FloatBuffer, the VM reverts all its optimisations and can EASILY perform TEN TIMES slower from that point on. So… either only work with float[], or make sure ALL (100.00000%) of your NIO buffers are native.
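For concreteness, here is a minimal sketch of the two buffer kinds being contrasted (the helper method names are mine, not from any library): a direct buffer allocated off-heap, and a heap buffer backed by a float[] — the kind that can accidentally slip into a hot path.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class BufferKinds {

    // Direct (native) buffer: off-heap memory, what JNI/OpenGL fast paths expect.
    static FloatBuffer direct(int floats) {
        return ByteBuffer.allocateDirect(floats * 4)   // 4 bytes per float
                .order(ByteOrder.nativeOrder())        // match the platform's byte order
                .asFloatBuffer();
    }

    // Heap buffer: backed by a plain float[] on the Java heap.
    static FloatBuffer heap(int floats) {
        return FloatBuffer.allocate(floats);
    }

    public static void main(String[] args) {
        System.out.println(direct(16).isDirect()); // true
        System.out.println(heap(16).isDirect());   // false
    }
}
```

Checking isDirect() on every buffer handed to the render path is a cheap way to catch a stray heap buffer.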

I once noticed this in a simple benchmark where I was timing the speed of various loops, and found that running one benchmark before another altered the results dramatically. I checked I wasn’t doing anything silly, but apparently it has something to do with how (and where) the JIT optimises - perhaps a threshold on certain types of optimisation, probably operating on the assumption that only short methods, or only the innermost loop of a method with high cyclomatic complexity, need optimising - and if you have two, it doesn’t optimise.

Cas - it either ran at 4fps or 26fps for the duration. Occasionally (under unknown circumstances, possibly involving minimising to the desktop, possibly involving going and getting a cup of tea) it would transition from 4fps to 26fps, which I put down to the JIT finally catching up.

Riven - nice to know I’m not the only one with JIT-shaped bruises on my forehead. If I get any further with this I’ll post a follow-up here.

DzzD - I’ve been using the JDKs almost exclusively. If the JREs are different, this just makes the nightmare worse :(
Incidentally I don’t really notice the difference in init time between the server and client vms but that could be because the init step is quite slow (lots of models to load).

Thanks all,
Dave

Is this behaviour on -server or -client, or both?
Since -client and -server are quite different JITs, I’d expect different behaviour between the two if the issue is indeed caused by the JIT.
Do you see a difference in the profiler results when it’s slow or fast?
How much heap do you make available to your app, and does the behaviour change when you change default/max heap size?
Is there a difference between starting the app from within the IDE (Eclipse?), or outside the IDE from the commandline or whatever?
I suppose you’re sure there’s nothing running in the background like a virus scanner that’s hogging things?

If there’s one thing that JIT compilers are, it’s consistent. They either work well or they don’t, given a particular dataset. So what’s happening is not a JIT vagary; it’s something to do with the machine architecture, and I still suspect an alignment issue.

Cas :)

I can run the exact same app a few times and have wildly varying performance.

FloatBuffers are always page-aligned (check the source code), so where exactly would this poor alignment be caused?

I’m genuinely interested.

Check out variable allocation in Java; like I was saying, even small orderings can have an effect.

I’m sorry to say, but that reference is so out of date (1997! Java 1.1…), it’s not funny. The example with the two sqrt’s is plain silly, and the other issues are currently handled by the JIT perfectly well.

[quote="Riven,post:11,topic:30429"]it’s not funny
[/quote]
I found it quite funny :D

Is it really out of date?.. Or is the Java bytecode still generated this way? It would be interesting to see if such differences still happen in the bytecode. I would expect the compiler to handle such things, not the JIT; as you know, the JIT only converts bytecode to machine opcodes, with some optimisation depending on the target platform.

There is quite some bloat in the bytecode. Javac is not really optimizing much (if anything at all). Often optimizations in the bytecode result in worse JIT-generated native code. So you shouldn’t expect too much from hand-optimizing bytecode, although some more benchmarks would be nice on this one.

Silly that someone would write code that did 2 identical sqrt on consecutive lines?

There is one thing I would disagree with in the mentioned article: the final example is overly naive and misleading, and is certainly not a valid reason to move a loop counter outside of its correct scope.

erikd - the behaviour between the client and server JITs is definitely different, but nevertheless our app gets stuck at a low framerate with either.
I get the same behaviour within the ide (NetBeans in this case) and wrapped as a standalone binary.
We usually set the heap to 256M. It’s possible this changes the behaviour but hard to tell.
Your question about the profiler is interesting as that always has the low framerate even when I’ve excluded all of the time critical classes from being profiled - I’m guessing that the profiler disables the JIT?

Cas - are you sure JITs are completely repeatable? Surely in a multithreaded app, some methods may clock up their 1500 invocations sooner than others, and then (and I’m guessing here) they may be filling the queue to the JIT. Where should I be looking for alignment issues? Aren’t NIO buffers suitably aligned at allocation?

Thanks,
Dave

[quote]So you shouldn’t expect too much from hand-optimizing bytecode, although some more benchmarks would be nice on this one.
[/quote]
Optimizing byte code has no impact, in my experience (well, not with HotSpot).
I ran JEmu2 through ProGuard once, which reported it had done more than 3000 optimizations, but there was no noticeable change in performance in the end result.
I guess it might be useful for J2ME, or simple interpreting JVMs.

[quote]Your question about the profiler is interesting as that always has the low framerate even when I’ve excluded all of the time critical classes from being profiled - I’m guessing that the profiler disables the JIT?
[/quote]
I’m mainly using -Xprof to profile, and that definitely doesn’t disable the JIT.

Even in that area I have never noticed a perceivable performance improvement caused by bytecode optimisers alone.
Typically 80-90% of the game loop is spent rendering images anyway, so a 5% improvement in the performance of the game logic is largely unnoticed.

You could run your game with -XX:+PrintCompilation and see whether there is any difference between a fast and a slow run.
I really doubt that this has anything to do with the JIT compiler, because the client and server compilers are so different that it’s really unlikely the problem results from the generated code or the code generation itself. The client compiler compiles so fast it should not be noticeable at all.

Try running with -Xprof, and if that does not help, profile with NetBeans or some other profiler - I bet you’ll still get a slow case. However, I guess profilers would slow down your game too much, so -Xprof is probably the best choice.

Performance of JRE and JDK is equal.

By the way, you could try Excelsior JET, a quite good Java-to-native compiler. They have trial versions; it would be quite interesting to see whether the problem happens there too…

Good luck, lg Clemens

I came up with a way to create a misaligned FloatBuffer.

```java
ByteBuffer bb = …; // nicely aligned direct buffer
bb.position(1);    // shift by one byte
FloatBuffer fb = bb.asFloatBuffer(); // view is now misaligned
```

My sympathy for those who did, and thought nobody would ever know.