Ah… but I think the OpenGL pipeline thing is a red herring. I emailed Chris Campbell about that and he explained that the openGL pipeline simply generated less garbage due to the way it handed off more operations to OpenGL. My problem occurs with even with -Dsun.java2d.noddraw=true which should mean that only GDI calls are used.
Chris wrote: [quote]I think the reason you see more frequent/smaller GCs with the OGL pipeline is that it is able to perform many operations directly via OpenGL with few intermediate operations. For example, if your application does a lot of bilinear image scaling, the OGL pipeline can handle this directly, whereas the DDraw or GDI pipelines can only handle this using our software loops. Those software loops might require conversion from one image format to another, and therefore temporary intermediate images may be created frequently. More intermediate images means increased burden on GC.
Note that this is just one example of a situation where your app might create more garbage with one pipeline versus another. Have you used the -Dsun.java2d.trace flag?
http://java.sun.com/j2se/1.5.0/docs/guide/2d/flags.html#trace
I would suggest that you try running your app with this flag, first with OGL, then with the other pipelines, and then compare the logs. If you see lots of primitives with “General” in their name, then that’s a good indication that an operation is using intermediate surfaces, and possibly creating more garbage.
[/quote]
I think the sweep phase of the collection might still be blocking high-priority threads in an odd way, but simply for shorter times due to the characteristics of the garbage generated. And of course even GDI calls go through the graphics driver in some form eventually - but I’m afraid I don’t know enough about things at that level.
I also recently found this: http://java.sun.com/performance/reference/whitepapers/5.0_performance.html#2.10
which mentions:
[quote]The Concurrent Low Pause Garbage Collector, also known as CMS, has had several improvements for J2SE 5.0. Garbage collection pauses are shorter because of better use of parallelism during collection periods during which application threads are stopped.
[/quote]
This paper :http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#0.0.0.%20The%20Concurrent%20Low%20Pause%20Collector|outline
was also quite helpful, but still leaves some unanswered questions. Such as when do the finalizers run? Is there any way to get profiling info about how much time is taken by the finalizer thread?
Here is an example of a collection that starves a priority 31 thread in the System process for 100ms or more:
1526.756: [GC 1526.756: [ParNew: 3968K->0K(4032K), 0.0159318 secs] 16703K->12797K(21260K), 0.0160274 secs]
1538.213: [GC 1538.214: [ParNew: 3968K->0K(4032K), 0.0105305 secs] 16765K->12860K(21260K), 0.0106204 secs]
1549.802: [GC 1549.802: [ParNew: 3968K->0K(4032K), 0.0151089 secs] 16828K->12923K(21260K), 0.0152276 secs]
1561.191: [GC 1561.191: [ParNew: 3968K->0K(4032K), 0.0156131 secs] 16891K->12984K(21260K), 0.0157176 secs]
1572.565: [GC 1572.565: [ParNew: 3968K->0K(4032K), 0.0154554 secs] 16952K->13046K(21260K), 0.0155757 secs]
1583.973: [GC 1583.973: [ParNew: 3968K->0K(4032K), 0.0122170 secs] 17014K->13107K(21260K), 0.0123115 secs]
1595.531: [GC 1595.531: [ParNew: 3968K->0K(4032K), 0.0162585 secs] 17075K->13168K(21260K), 0.0163687 secs]
1606.968: [GC 1606.968: [ParNew: 3968K->0K(4032K), 0.0158047 secs] 17136K->13229K(21260K), 0.0159176 secs]
1606.984: [GC [1 CMS-initial-mark: 13229K(17228K)] 13229K(21260K), 0.0008632 secs]
1606.985: [CMS-concurrent-mark-start]
1607.077: [CMS-concurrent-mark: 0.092/0.092 secs]
1607.077: [CMS-concurrent-preclean-start]
1607.077: [CMS-concurrent-preclean: 0.000/0.000 secs]
1607.092: [GC[YG occupancy: 86 K (4032 K)]1607.092: [Rescan (parallel) , 0.0009427 secs]1607.093: [weak refs processing, 0.0109421 secs] [1 CMS-remark: 13229K(17228K)] 13315K(21260K), 0.0161694 secs]
1607.123: [CMS-concurrent-sweep-start]
1607.308: [CMS-concurrent-sweep: 0.184/0.184 secs]
1607.308: [CMS-concurrent-reset-start]
1607.321: [CMS-concurrent-reset: 0.013/0.013 secs]
Encoding stopped on instance 0
1862.099: [GC 1862.099: [ParNew: 3968K->0K(4032K), 0.0171288 secs] 14714K->10755K(21944K), 0.0172192 secs]
2160.773: [GC 2160.773: [ParNew: 3968K->0K(4032K), 0.0169685 secs] 14723K->10756K(21944K), 0.0170589 secs]
The “encoding stopped” message if from my app realizing that we aborted a video capture because the capture hardware was forced to drop frames.
I’m not sure what the typical timeslice is on WinXP, but I would expect that in 184ms that a priority 31 thread in the system process would get a couple shots at the CPU. It doesn’t need to do much to setup a DMA operation to empty the FIFO of our hardware so it is quite surprising that the hardware FIFO overflowed. The FIFO can buffer 100ms of video frames