Performance woes, 10x slowdown during mouse events

I have an application that can render static geometry at about 60 fps using only about 40% of the CPU. As soon as I start dragging a shape (process mouse events) the fps drops to about 4 and the cpu load jumps to almost 100%. I a lot of system time on the activity monitor. My measurements of draw time (using System.currentTimeInMillis) reflects this, but the Apple OpenGL profiler shows a steady 20% in OGL. So either my code is consuming a lot more time or JOGL is doing something to consume time. Since I am measuring wall clock time it is quite possible another thread is preempting my measurements of draw time. I have not been able to locate a good way to isoate where the CPU is going. Even the 100% overhead on steady state seems high, but the sudden change in overhead is what is really bothering me. I am using an Animator derivitive that limits fps to about 60. All calls to my GLEventListener are occuring on the AWT event thread, rather than the Animator thread which is set as the render thread.

Any pointers would help. A good per-thread CPU meter would also help locate the time.

Michael

Shark shows the following consuming a huge amout of the time. There “should” not be any textures created/destroyed or a change in which are displayed. The part of the scene that is being dragged is all computed geometry without texturing, and the scene under it is not moving or changing. But, this does give me another place to look that would not have occured to me. Any known issues in this area?

  53.6%      53.6%      com.apple.ATIRadeon8500      IOATIR200Shared::new_agp_texture(unsigned, unsigned long, unsigned*)      
  0.0%      53.6%      com.apple.ATIRadeon8500      IOATIR200Shared::new_texture(unsigned long, unsigned long, unsigned, unsigned long, unsigned*, unsigned*)      
  0.0%      53.6%      com.apple.ATIRadeon8500      IOATIR200GLContext::new_texture(sIOGLNewTextureData*, sIOGLNewTextureReturnData*, unsigned long, unsigned long*)      
  0.0%      53.6%      mach_kernel      is_io_connect_method_structureI_structureO      
  0.0%      53.6%      mach_kernel      iokit_server_routine      
  0.0%      53.6%      mach_kernel      ipc_kobject_server      
  0.0%      53.6%      mach_kernel      mach_msg_overwrite_trap      
  0.0%      53.6%      mach_kernel      mach_msg_trap      
  0.0%      53.6%      mach_kernel      shandler      
  21.3%      21.3%      mach_kernel      vm_map_lookup_entry      
  0.0%      20.3%      mach_kernel      vm_map_lookup_locked      
  0.0%      18.7%      mach_kernel      mach_make_memory_entry_64      
  0.0%      18.7%      mach_kernel      mach_make_memory_entry      
  0.0%      18.7%      mach_kernel      IOGeneralMemoryDescriptor::initWithOptions(void*, unsigned long, unsigned long, task*, unsigned long, IOMapper*)      
  0.0%      18.7%      mach_kernel      IOMemoryDescriptor::withAddress(unsigned, unsigned long, IODirection, task*)      
  0.0%      18.7%      com.apple.ATIRadeon8500      IOATIR200Shared::new_agp_texture(unsigned, unsigned long, unsigned*)      
  0.0%      18.7%      com.apple.ATIRadeon8500      IOATIR200Shared::new_texture(unsigned long, unsigned long, unsigned, unsigned long, unsigned*, unsigned*)

Do you see the same behavior running any of the JOGL demos that perform mouse interaction, like the Gears demo?

Is your GLEventListener.init() method getting called over and over again?

Are you sure you are performing all of your OpenGL work from your GLEventListener’s methods, or methods those methods call?

I have replaced the mouse input with programmed movement of the shape, and the problem is still there. The problem only happens if thre is a textured shape visible (since I do not draw shapes that are not visible). All drawing is using glDrawArrays or glMultiDrawArrays. All drawing parameters are set using VBOs (vertices, colors, and tex coordinates). All drawing uses 2D coordinates.

Using driver monitor all I see is the command bytes and context switches going up/down with the change from 60 fps down to 5/1 fps and back up again as the programmed behavior toggles between stable and moving. The available VRAM is around 45MB. This is on a Radeon 9000 Mobility (1Ghz Ti PowerBook).

OpenGL profiler shows no texture operations other than glBindTexture. I have yet to verify the texture IDs are cycling, but there should only be 3-4 in use for the test case.

I am going to start stripping out code to try and either locate what I am doing that is causing this, or to isolate the issue to a smaller set of code. If it is not in my code I will need to try to localize it down to JOGL or Apple OpenGL.

As far as I can tell the openGL calls are all on the one thread. I want to do a bit more to verify this.

I have verified that the texture IDs are cycling as expected, not thrashing. The textures are all rectangle textures scaled 1:1 to the pixels in the texture to the quads used for drawing. The scene is scaled to varying amounts.

The performance degredation appears to get worse with the number of textures being displayed. Only 1 texture and the degredation is pretty slow, but with 4 it gets bad fast. A smaller scene reduces the degredation in fps, but part of that is my geometry culling has less to wade through.

The only stat I am capturing that goes up is draw time and the number of VBOs I create for each frame. With dynamic behavior there may be for this example as many as 10-30 VBOs created for each frame. These are relatively small.

My next step is to verify I do not have any leaks in VBOs. If I am not deleting all VBOs I create, I presume this would cause degredation as it fills up AGP ram with VBOs.

In trying to isolate the issue I removed all textures from my scene. I am rendering no textured shapes, binding no textures, etc. And still getting samples like the following:

  66.1%      66.1%      com.apple.ATIRadeon8500      IOATIR200Shared::new_agp_texture(unsigned, unsigned long, unsigned*)      
  27.2%      27.2%      mach_kernel      vm_map_lookup_entry      
  0.0%      27.2%      mach_kernel      vm_map_lookup_locked      
  0.0%      21.1%      mach_kernel      mach_make_memory_entry_64      
  0.0%      21.1%      mach_kernel      mach_make_memory_entry      
  0.0%      21.1%      mach_kernel      IOGeneralMemoryDescriptor::initWithOptions(void*, unsigned long, unsigned long, task*, unsigned long, IOMapper*)      
  0.0%      21.1%      mach_kernel      IOMemoryDescriptor::withAddress(unsigned, unsigned long, IODirection, task*)      
  0.0%      21.1%      com.apple.ATIRadeon8500      IOATIR200Shared::new_agp_texture(unsigned, unsigned long, unsigned*)      
  0.0%      6.1%      mach_kernel      vm_fault

Can you get a stack trace for any of the samples that look like they’re creating a new texture?

Are you using Swing widgets in your component hierarchy?

Do any of the JOGL demos that use texturing exhibit this behavior?

How do I get Shark to show a stack trace? As I said in the newest post, I have eliminated the use of textures in the drawing that is being done. I am not using anything but a frame with one GLCanvas in it. I do have 6 sample FSAA enabled for the context.

Michael

I haven’t used Shark so can’t help you there. Does the HWShadowmapsSimple demo work on your machine? What about the VertexProgRefract demo? Does the same sort of slowdown appear with those demos? Have you tried turning off multisampling?

This is not my day job, so I will need to do testing this night. I will try turning off multi-samples. It may be doing that in software, which I guess would be using AGP textures, or that call could be named from old code that is now just general memory allocation. Most of the interesting demos require nVidia hardware, which this machine is not equiped with. I will see which ones I can get to run, and let them run for a while to see what happens. I suspect that it is either my code, or something I am doing that is stressing the driver in some way. Otherwise it would be obvious to others using the Mac (which seems to be pretty well represented for a change).

Michael

Removing the FSAA did not address the issue. In looking at the detailed samples from one run I found this sequence just before one of the supervisor level traces that had the problems. Do these look familar as part of JOGL or Apple’s OGL implementation? I believe the higher numbered entries are deeper in the call stack.

42 __MIG_check__Reply__io_connect_method_structureI_structureO_t
41 io_connect_method_structureI_structureO
40 gldDestroyBuffer
39 gldCompleteVertexBuffer
38 gldCompleteVertexBuffer
37 gldDestroyVertexArray
36 gldUpdateDispatch
35 dyld_stub_mach_host_self
34 dyld_stub_mach_host_self
33 Java_net_java_games_jogl_impl_macosx_MacOSXGLImpl_dispatch_1glMultiDrawArraysEXT__I_3I_3IIJ

I am still not sure what is really going on other than things slowly degrading during this sequence. The supervisor level samples correlate with the activity that causes the slow down, and consume the CPU.

It looks like it was generating a ton of GL garbage that it was not cleaning up. When the scene was stable the set of VBO buffers was stable, but with motion it was generating new buffers each frame and not releasing the old ones. Now I get 60 fps for the test case with only 75% of the CPU. Not bad for Java on a 1Ghz Powerbook. There is still a lot of optimizing to do, but I am on my way.