Performance drop java5 client VM?

In my emulator project, I notice a huge performance drop when using the client VM from java 1.5.0_03 compared to 1.4.2_03.
For example 1 game runs at 40 FPS on 1.5.0_03 , while the same game runs at 145 fps on 1.4.2_03 (both running without throttling or vsync or anything; both are just running flat out).

Has anyone experienced something like this too? Are there known problems with the 1.5 client VM?
This difference is just too extreme… :-\

Now before someone starts, since JEmu2 is a large and very cpu intensive application with a 2MB+ source, I wouldn’t classify it as a microbenchmark issue. :wink:

Profile output 1.4.2_03:


Flat profile of 27.67 secs (1728 total ticks): main

  Interpreted + native   Method                        
  7.5%     0  +   130    org.lwjgl.opengl.Win32ContextImplementation.nSwapBuffers
  4.3%     0  +    75    org.lwjgl.opengl.GL11.nglTexImage2D
  4.2%     0  +    72    java.nio.Bits.copyFromByteArray
  2.4%     0  +    42    org.lwjgl.opengl.Win32Display.destroyWindow
  1.4%     0  +    24    org.lwjgl.opengl.Win32Display.switchDisplayMode
  1.3%     1  +    22    jef.cpu.Z80.<clinit>
  0.8%     0  +    14    org.lwjgl.opengl.GL11.nglClear
  0.3%     0  +     5    org.lwjgl.opengl.Win32ContextImplementation.nMakeCurrent
  0.2%     0  +     4    org.lwjgl.opengl.Win32Display.getAvailableDisplayModes
  0.2%     0  +     3    org.lwjgl.opengl.GL11.nglBegin
  0.2%     0  +     3    org.lwjgl.opengl.Win32DisplayPeerInfo.nDestroy
  0.2%     0  +     3    org.lwjgl.opengl.Win32PeerInfo.nChoosePixelFormat
  0.2%     0  +     3    org.lwjgl.opengl.Win32ContextImplementation.nDestroy
  0.2%     3  +     0    org.lwjgl.opengl.Win32ContextImplementation.swapBuffers
  0.2%     0  +     3    cottage.drivers.Blktiger.<init>
  0.1%     0  +     2    org.lwjgl.opengl.Win32Display.nCreateWindow
  0.1%     2  +     0    java.nio.DirectIntBufferU.put
  0.1%     2  +     0    net.movegaga.jemu2.FrameWork.draw
  0.1%     2  +     0    net.movegaga.jemu2.FrameWork.run
  0.1%     2  +     0    org.lwjgl.opengl.GL11.glTexImage2D
  0.1%     1  +     1    org.lwjgl.opengl.GL11.glTexParameteri
  0.1%     0  +     2    jef.cpu.Z80.exec
  0.1%     0  +     2    java.lang.Class.forName0
  0.1%     0  +     2    jef.cpu.Z80.prefix_BC
  0.1%     0  +     1    org.lwjgl.opengl.Win32Display.destroyMouse
 28.4%    35  +   455    Total interpreted (including elided)

     Compiled + native   Method                        
 15.6%   270  +     0    jef.video.BitMapImpl.toBitMap
  8.6%   148  +     0    jef.video.BitMapImpl.setPixel
  8.0%   138  +     0    jef.cpu.Z80.exec
  7.8%   135  +     0    net.movegaga.jemu2.FrameWork.updateTexture
  4.5%    77  +     0    jef.video.BitMapImpl.toBitMap
  1.6%    28  +     0    jef.map.MemoryReadAddress$MEMread.read
  1.4%    24  +     0    jef.video.GfxManager.decode
  1.2%    21  +     0    jef.cpuboard.BasicCpuBoard.read8opc
  0.9%    16  +     0    vtable chunks
  0.3%     6  +     0    jef.cpu.Z80.execXY
  0.3%     5  +     0    jef.cpuboard.BasicCpuBoard.read8
  0.3%     5  +     0    cottage.vidhrdw.Blktiger.video_update
  0.2%     4  +     0    jef.cpuboard.BasicCpuBoard.read8arg
  0.2%     4  +     0    java.math.BigInteger.mulAdd
  0.2%     3  +     0    jef.cpu.Z80.checkInterrupt
  0.2%     3  +     0    jef.cpu.Z80.djnz_n
  0.1%     2  +     0    jef.video.GfxManager.getTile
  0.1%     2  +     0    jef.machine.BasicMachine.doFrame
  0.1%     2  +     0    jef.cpu.Z80.ld_A_ni
  0.1%     2  +     0    sun.security.provider.SHA.computeBlock
  0.1%     1  +     1    jef.util.RomLoader.loadFromWeb
  0.1%     1  +     0    jef.cpuboard.BasicCpuBoard.write8
  0.1%     1  +     0    jef.video.BitMapImpl.setPixelFast
  0.1%     1  +     0    cottage.drivers.Blktiger$Bankread.read
  0.1%     1  +     0    java.nio.DirectIntBufferU.get
 53.4%   921  +     2    Total compiled (including elided)

         Stub + native   Method                        
  7.3%     0  +   127    org.lwjgl.opengl.Win32ContextImplementation.nSwapBuffers
  4.7%     0  +    82    org.lwjgl.opengl.GL11.nglTexImage2D
  2.5%     0  +    44    java.nio.Bits.copyFromByteArray
  1.1%     0  +    19    org.lwjgl.opengl.GL11.nglClear
  0.2%     0  +     3    java.lang.StrictMath.pow
  0.2%     0  +     3    org.lwjgl.opengl.GL11.nglTexParameteri
  0.2%     0  +     3    java.lang.Throwable.fillInStackTrace
  0.2%     1  +     2    org.lwjgl.opengl.GL11.nglVertex3f
  0.1%     0  +     2    java.util.zip.Inflater.inflateBytes
  0.1%     0  +     2    org.lwjgl.opengl.GL11.nglBegin
  0.1%     0  +     1    org.lwjgl.opengl.Win32Display.readMouse
  0.1%     0  +     1    org.lwjgl.opengl.Win32Display.pollMouse
 16.8%     1  +   289    Total stub

  Thread-local ticks:
  0.5%     9             Class loader
  0.2%     4             Interpreter
  0.7%    12             Compilation


Flat profile of 0.90 secs (58 total ticks): DestroyJavaVM

  Thread-local ticks:
100.0%    58             Blocked (of total)


Global summary of 28.58 seconds:
100.0%  1830             Received ticks
  2.3%    43             Received GC ticks
  1.3%    23             Compilation
  0.5%     9             Class loader
  0.2%     4             Interpreter

profile output 1.5


 Flat profile of 238.72 secs (6500 total ticks): main

  Interpreted + native   Method                        
  1.8%     0  +   115    org.lwjgl.opengl.GL11.nglTexImage2D
  1.1%     0  +    71    java.nio.Bits.copyFromByteArray
  0.8%     0  +    55    org.lwjgl.opengl.Win32ContextImplementation.nSwapBuffers
  0.6%     0  +    38    org.lwjgl.opengl.Win32Display.destroyWindow
  0.4%     0  +    23    org.lwjgl.opengl.Win32Display.switchDisplayMode
  0.3%    22  +     0    jef.cpu.Z80.<clinit>
  0.2%     0  +    14    org.lwjgl.opengl.GL11.nglClear
  0.1%     0  +     7    org.lwjgl.opengl.Win32ContextImplementation.nMakeCurrent
  0.1%     0  +     4    org.lwjgl.opengl.Win32ContextImplementation.nDestroy
  0.1%     0  +     4    org.lwjgl.opengl.Win32Display.getAvailableDisplayModes
  0.0%     2  +     1    java.lang.ClassLoader.defineClass1
  0.0%     0  +     3    org.lwjgl.opengl.Win32DisplayPeerInfo.nInitDC
  0.0%     3  +     0    java.nio.DirectIntBufferU.put
  0.0%     0  +     3    org.lwjgl.opengl.Win32DisplayPeerInfo.nDestroy
  0.0%     0  +     3    org.lwjgl.opengl.Win32PeerInfo.nChoosePixelFormat
  0.0%     2  +     0    net.movegaga.jemu2.FrameWork.input
  0.0%     2  +     0    org.lwjgl.opengl.GL11.glEnable
  0.0%     0  +     2    cottage.drivers.Blktiger.<init>
  0.0%     2  +     0    net.movegaga.jemu2.FrameWork.draw
  0.0%     2  +     0    org.lwjgl.input.Mouse.poll
  0.0%     2  +     0    net.movegaga.jemu2.FrameWork.run
  0.0%     2  +     0    cottage.vidhrdw.Blktiger.video_update
  0.0%     2  +     0    jef.util.Statistics.frame
  0.0%     2  +     0    com.sun.imageio.plugins.png.PNGImageReader.decodeSubFilter
  0.0%     0  +     2    org.lwjgl.opengl.Win32Display.nCreateWindow
  7.2%   104  +   365    Total interpreted (including elided)

     Compiled + native   Method                        
 31.1%  2016  +     0    jef.video.BitMapImpl.toBitMap
  9.0%   583  +     0    net.movegaga.jemu2.FrameWork.updateTexture
  8.9%   576  +     0    jef.cpu.Z80.exec
  6.6%   428  +     0    jef.video.BitMapImpl.toBitMap
  1.6%   104  +     0    jef.cpu.Z80.execXY
  1.4%    92  +     0    jef.cpu.Z80.cp_n
  1.3%    84  +     0    jef.cpu.Z80.ei
  0.9%    56  +     0    jef.cpu.Z80.ld_A_ni
  0.4%    28  +     0    jef.cpu.Z80.djnz_n
  0.4%    28  +     0    cottage.vidhrdw.Blktiger.video_update
  0.4%    26  +     0    jef.cpuboard.BasicCpuBoard.write16fast
  0.4%    25  +     0    java.lang.ThreadLocal.get
  0.4%    24  +     0    jef.video.GfxManager.decode
  0.2%    15  +     0    jef.machine.BasicMachine.doFrame
  0.2%    13  +     0    jef.video.GfxManager.drawTile
  0.2%    12  +     0    jef.cpuboard.BasicCpuBoard.read16arg
  0.2%    12  +     0    jef.cpu.Z80.pop_BC
  0.2%    10  +     0    java.lang.AbstractStringBuilder.append
  0.1%     9  +     0    jef.cpu.Z80.ld_DE_nn
  0.1%     8  +     0    jef.cpu.Z80.prefix_BC
  0.1%     7  +     0    jef.cpu.Z80.ld_ea_ind8
  0.1%     6  +     0    jef.cpu.Z80.ld_HL_nn
  0.1%     6  +     0    jef.cpu.Z80.ld_HL_ni
  0.1%     6  +     0    jef.cpu.Z80.out_n_A
  0.1%     5  +     0    jef.cpu.Z80.prefix_ED
 65.6%  4247  +     6    Total compiled (including elided)

         Stub + native   Method                        
 11.5%     0  +   748    org.lwjgl.opengl.GL11.nglTexImage2D
  7.4%     0  +   479    java.nio.Bits.copyFromByteArray
  4.4%     0  +   286    org.lwjgl.opengl.Win32ContextImplementation.nSwapBuffers
  1.6%     0  +   101    org.lwjgl.opengl.GL11.nglClear
  0.4%     0  +    26    org.lwjgl.opengl.GL11.nglBegin
  0.4%     0  +    25    org.lwjgl.opengl.GL11.nglVertex3f
  0.2%     0  +    16    org.lwjgl.opengl.Win32Display.pollMouse
  0.2%     0  +    13    org.lwjgl.opengl.GL11.nglEnd
  0.2%     0  +    12    org.lwjgl.opengl.Win32Display.pollKeyboard
  0.1%     0  +     8    sun.misc.Unsafe.copyMemory
  0.1%     0  +     7    org.lwjgl.opengl.GL11.nglColor4ub
  0.1%     0  +     7    org.lwjgl.opengl.Win32Display.nUpdate
  0.1%     0  +     5    org.lwjgl.opengl.GL11.nglTexCoord2f
  0.1%     0  +     4    java.lang.StrictMath.pow
  0.1%     0  +     4    org.lwjgl.opengl.GL11.nglBindTexture
  0.0%     0  +     3    org.lwjgl.opengl.GL11.nglRotatef
  0.0%     0  +     3    org.lwjgl.opengl.GL11.nglPopMatrix
  0.0%     0  +     3    org.lwjgl.opengl.GL11.nglDisable
  0.0%     0  +     2    org.lwjgl.opengl.Win32Display.didMaximize
  0.0%     0  +     2    org.lwjgl.opengl.GL11.nglTexParameteri
  0.0%     0  +     2    org.lwjgl.opengl.Win32Display.readMouse
  0.0%     0  +     1    org.lwjgl.opengl.GL11.nglEnable
  0.0%     0  +     1    org.lwjgl.opengl.GL11.nglGetError
  0.0%     0  +     1    java.lang.Throwable.fillInStackTrace
  0.0%     0  +     1    org.lwjgl.opengl.GL11.nglNormal3f
 27.1%     0  +  1760    Total stub

  Thread-local ticks:
  0.2%    13             Blocked (of total)
  0.1%     5             Class loader


Flat profile of 1.37 secs (82 total ticks): AWT-Shutdown

  Thread-local ticks:
 98.8%    81             Blocked (of total)
100.0%     1             Unknown: no last frame


Flat profile of 1.06 secs (64 total ticks): AWT-EventQueue-0

  Thread-local ticks:
 98.4%    63             Blocked (of total)
100.0%     1             Unknown: no last frame


Flat profile of 238.81 secs (6532 total ticks): AWT-Windows

  Interpreted + native   Method                        
100.0%     0  +  6531    sun.awt.windows.WToolkit.eventLoop
100.0%     0  +  6531    Total interpreted

  Thread-local ticks:
  0.0%     1             Blocked (of total)


Flat profile of 0.01 secs (1 total ticks): AWT-Shutdown

  Thread-local ticks:
100.0%     1             Blocked (of total)


Flat profile of 0.88 secs (56 total ticks): DestroyJavaVM

  Thread-local ticks:
100.0%    56             Blocked (of total)


Global summary of 239.61 seconds:
100.0%  6580             Received ticks
  0.4%    24             Received GC ticks
  0.0%     2             Compilation
  0.1%     5             Class loader
  0.0%     2             Unknown code


The changes in the client compiler between 1.4.2 and 1.5 were relatively minor and I wouldn’t expect to see a performance drop of this magnitude unless the compiler is bailing out of a compilation in 1.5 that it wasn’t in 1.4.2. Unfortunately the implementation of -Xprof changed considerably in 1.5 and its output can’t be trusted as much as in 1.4.2; among other things, it slows down the application considerably rather than by 10% or so as in the previous release. Can you manually instrument your application to narrow down the slowdown more? Is the difference all contained in jef.video.BitMapImpl.toBitMap()?

BTW, are you comparing the speed of normal runs of 1.4.2 and 1.5, or -Xprof runs of the two JDKs? If the latter, that would probably explain the slowdown…

Thanks, I wasn’t aware that -Xprof changed to cause this.
I did indeed compare the 2 runs with -Xprof enabled. During development of the emu, I usually run with -Xprof enabled.
You were right that the extreme performance drop disappeared (although 1.4 still runs slightly faster, in the magnitude of 135 versus 145 fps).

So as of 1.5, running with -Xprof will cause a huge performance drop, while you can’t trust it’s output that much anymore? :-
Sounds like a bug to me… (How am I supposed to find performance problems now when I can’t trust the profiler?)

Got a bug Id# handy? Is this -Xprof issue fixed in the latest Mustang betas?

I didn’t find a bug in the bug database, so I submitted a bug report. I’ll post an update as soon as I have a bug id.

Thanks for filing the bug. Email me in a week or two if you don’t get any response to it. This has been a longstanding issue in how -Xprof was reimplemented in 1.5 but there has been some pushback from the original authors of the code to changing it for higher performance. The NetBeans profiler is a good tool which can provide you much finer-grained and higher-performance profiling than -Xprof; the only downside is that it requires more setup. The collector and analyzer in Sun Studio 10 are also top-notch Java profilers with relatively low overhead.

I’ll try NetBeans soon and check out the profiler.
One downside of -Xprof is also that you can’t see the exact method signatures. For example I have many BitMap.toBitMap methods with different parameters and it’s now impossible to see which one is which in -Xprof’s output.

But it also often doesn’t work because of bugs in the bytecode transformation engine :(.

Please file any bugs you see in this area with the issue tracker on the NetBeans web site.