Windows JVM scheduling bug?

I’ve managed to narrow down a problem to the following:

Whilst the AWT event thread is running, it often gets scheduled exclusively, i.e. other threads get ZERO running time, for periods of up to 2 seconds. This is on a 1.5 GHz machine; on machines slower than 1 GHz, it can stay exclusive for tens of seconds.

Unless I’m mistaken, this is not a feature, it’s a bug. I’m testing mainly on Windows XP, and since when did XP do 10 second long thread starvation? My other thread(s) (I can reproduce with either multiple specialized threads, or with all functions located in one thread) do everything from cpu-intensive calculations through to painting on Graphics. There is no reason why they should get starved for tens of seconds.

Note: I am doing almost NOTHING in the EventThread, during actionPerformed calls. On linux, I spend less than 30 millis in my code over the course of 30 seconds. On Windows XP, I spend over 1000 millis in my code during actionPerformed. The only method calls I’m making are constructors for Point objects.

Even if removing those constructors reduces the time spent in the AWT event thread, this surely does NOT justify thread starvation?

I have asked Sun why they have a tutorial webpage that is completely wrong (the threading tutorial, specifically the page on scheduling, which directly conflicts with some of the things stated in the 1.4.x release notes, and perhaps others too), AND how they actually implement thread scheduling under Windows (is it native, or simulated?), but have had no response in 2 months. !?!?

I’m asking here in case I’ve done something stupid, or possibly made a silly mistake. Otherwise, this does appear to be a major bug in the windows JVM. Or do you disagree? What am I misunderstanding here?

Thread scheduling’s pure native as the driven snow under all Windows platforms. So anything freaky going on is an AWT implementation quirk.

Got a profile of it? Just -Xprof will do.

Cas :)

I haven’t seen anything like that with my own multithreaded Java apps on XP.

Sounds like a synchronization problem. Perhaps the AWT thread is getting stuck on something, or entering a synchronized block that all other threads need access to (new’ing those points maybe?).

I guess try and reduce it to a test case and maybe something else will appear. Or try and run thru jdb and see if you can work out what happens before/after actionPerformed is called.

Also maybe try putting the non-AWT threads on low priority. Perversely I found this often helped with jitters and so on, as the AWT thread was hardly doing anything so normally doesn’t mess with the other threads.
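For anyone wanting to try the low-priority suggestion, here is a minimal sketch. The class and method names (`LowPriorityWorker`, `start`, `join`) are made up for illustration, and note that `setPriority` is only a hint to the scheduler, so this may or may not change anything on a given JVM.

```java
public class LowPriorityWorker {
    /** Starts the given work on a daemon thread at the lowest priority. */
    public static Thread start(Runnable work, String name) {
        Thread t = new Thread(work, name);
        t.setPriority(Thread.MIN_PRIORITY); // a hint only; the OS scheduler decides
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Waits up to the given time; returns true if the thread has finished. */
    public static boolean join(Thread t, long millis) {
        try {
            t.join(millis);
            return !t.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real game loop: a single tick, then exit.
        Thread loop = start(() -> System.out.println("game loop tick"), "game-loop");
        join(loop, 1000);
        System.out.println("priority = " + loop.getPriority());
    }
}
```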

Thanks for the suggestions; I’m willing to try all of them :). Currently I’m majoring on reducing as much code as possible to make a lean test case, although since it appears CPU-speed / spare processing power / etc dependent, I’m not sure how lean I’ll be able to go without ruining everything.

Cas: yes, that’s what I thought from reading 1.4.x notes (and indeed is what I remember from 1.2.x days) - but, as I noted, the java tutorial boldly states this isn’t true. Shrug.

swpalmer: I’d never seen it before either, and I’ve been doing multi-threaded java since 1.1.x on winNT. Interestingly, I’ve always used pure AWT (usually had to for speed reasons, and to avoid “known bugs” in swing) until fairly recently, which might be significant.

Hmm. Actually, I DO have a similar problem with the AWT thread in an old tiny program, but it only occurs one time in 20. I got a test case to Sun, and a bug report on 1.4.1 which they accepted, because it seemed the AWT thread was initialising and displaying the window, but 1 time in 20 it wasn’t creating the windowing-system resources, so getWidth etc returned -1, permanently. I had got as far as showing it was a race condition somewhere in the AWT startup code. I think I demonstrated it on both Windows and Linux, hence making it appear a fundamental design problem in some of the higher-level code, but I can’t quite remember.

I’m also going to replace my (nice, easy, well-suited to a game) timing system with a Swing timer, so that there is only the AWT thread, and see what happens. I doubt it will tell us much, but worth a try.
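A rough sketch of what a Swing-timer-driven loop could look like, so that every tick runs on the AWT event thread. `SwingTimerLoop` and its counters are hypothetical names, not code from the game; the real update and `repaint()` calls would go where the comment is.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import javax.swing.SwingUtilities;
import javax.swing.Timer;

public class SwingTimerLoop {
    private final AtomicInteger frames = new AtomicInteger();
    private final AtomicInteger offEdtCalls = new AtomicInteger();
    private final CountDownLatch done;
    private final Timer timer;

    public SwingTimerLoop(int delayMillis, int framesToRun) {
        done = new CountDownLatch(framesToRun);
        timer = new Timer(delayMillis, e -> {
            // javax.swing.Timer delivers its action events on the event thread.
            if (!SwingUtilities.isEventDispatchThread()) {
                offEdtCalls.incrementAndGet();
            }
            frames.incrementAndGet();
            // Game state update and component.repaint() would go here.
            done.countDown();
            if (done.getCount() == 0) {
                ((Timer) e.getSource()).stop();
            }
        });
    }

    public void start() { timer.start(); }

    /** Waits for the requested number of frames; false on timeout or interrupt. */
    public boolean awaitFinished(long millis) {
        try {
            return done.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public int frames() { return frames.get(); }
    public int offEdtCalls() { return offEdtCalls.get(); }

    public static void main(String[] args) {
        SwingTimerLoop loop = new SwingTimerLoop(16, 10);
        loop.start();
        loop.awaitFinished(5000);
        System.out.println("frames run: " + loop.frames());
    }
}
```

With only one thread doing the work there is nothing left to starve, which is why this probably says little about the scheduling question itself.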

[quote]Sounds like a synchronization problem. Perhaps the AWT thread is getting stuck on something, or entering a synchronized block that all other threads need access to (new’ing those points maybe?).
[/quote]
It’s crazy, but it might just work. That’s a very interesting thought. Whilst it would seem to me insane that Point manipulations could block the AWT thread - and ONLY on windows systems - my observed behaviour does tally with an “unexpected” synchronize bottleneck. But perhaps there’s synchronization elsewhere, in some otherwise innocent-looking code…

I’m going to do a careful check to see if there might be some synchronization taking place I hadn’t noticed.
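One way to make that check less manual, as a sketch only: poll the JVM for threads in the `BLOCKED` state (waiting to enter a synchronized block) while the freeze is happening. `BlockedThreads` is a made-up helper name; `Thread.getAllStackTraces()` and `Thread.State.BLOCKED` are standard since Java 5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BlockedThreads {
    /** Returns the names of live threads currently waiting on a monitor lock. */
    public static List<String> findBlocked() {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            if (e.getKey().getState() == Thread.State.BLOCKED) {
                names.add(e.getKey().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // In a real run this would be called periodically from a watchdog thread.
        for (String name : findBlocked()) {
            System.out.println("BLOCKED: " + name);
        }
    }
}
```

If the game-loop thread shows up here during a freeze, the stack trace for that thread should point at the contended lock; if it never does, the synchronization theory looks less likely.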

[quote]Thread scheduling’s pure native as the driven snow under all Windows platforms. So anything freaky going on is an AWT implementation quirk.

Got a profile of it? Just -Xprof will do.
[/quote]
Yeah. On Linux, 1 GHz machine, about 40% of time is spent in java.awt.EventDispatchThread.run(), about 40% in JComponent repaints, and the rest spread about in lots of places.

On Windows XP, 1.5 GHz machine, about 80% of time is spent in sun.awt.windows.WToolkit.eventLoop, but the JComponent paints don’t show up at all. Don’t ask me why - same VM version, same profiling command line (-Xrunhprof:cpu=samples,thread=y,depth=20) ?!?

However, I just noticed that I’ve got my image-preloading turned off, and so I’ve got ImageFetcher threads all over the place, with load taking place on first use. So I’ll just go fix that… :)

Well, I’ve found the offending line:


    repaint();

My game-loop triggers repaints as necessary, in preparation for moving to a full-screen implementation later. However, there was this extra repaint which had slipped into the mouseMoved() method. Removing it has had two effects: firstly, there is no apparent mis-scheduling any more. Secondly, Windows JVM behaviour (i.e. user-perceived performance, delays, etc) is now identical to Linux.
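The fix here was simply deleting the line. If a mouse handler does need to ask for a redraw, one possible pattern (an assumption, not what the game does) is to have the handler set a flag and let the game loop issue the single repaint(). `RepaintRequest`, `markDirty` and `consumeDirty` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RepaintRequest {
    private final AtomicBoolean dirty = new AtomicBoolean(false);

    /** Called from event handlers such as mouseMoved(); cheap and never blocks. */
    public void markDirty() {
        dirty.set(true);
    }

    /** Called once per game-loop iteration; clears the flag and returns its old value. */
    public boolean consumeDirty() {
        return dirty.getAndSet(false);
    }

    public static void main(String[] args) {
        RepaintRequest request = new RepaintRequest();
        for (int i = 0; i < 100; i++) {
            request.markDirty(); // e.g. a burst of mouseMoved events
        }
        int repaints = 0;
        for (int frame = 0; frame < 3; frame++) {
            if (request.consumeDirty()) {
                repaints++; // the game loop would call repaint() here
            }
        }
        System.out.println("repaints issued: " + repaints);
    }
}
```

However many mouse events arrive between two frames, the loop sees at most one pending request, so the event thread never touches the paint machinery itself.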

However, I still don’t understand WHY this happened. I can guess that the reason for the system-dependent behaviour was that linux was choosing always to collapse that particular repaint(), and hence (effectively) ignore it.

My only guess for what was happening is that having a repaint() inside the EventThread on Windows was either causing repaint() calls from other threads to block, or triggering some special behaviour that causes the EventThread to rise in priority (?).

Note: the behaviour witnessed was that the game-loop thread actually froze; no processing took place in that loop at all whilst it was frozen; when it resumed, it carried on animating etc from the precise point where it left off. Hence, it is not simply that repaints from this other (non EventThread) thread were being ignored.
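To put numbers on a freeze like this, the loop can record the gap between consecutive iterations. The following is a sketch with invented names (`StallDetector`, `tick`); timestamps are passed in so it works with `System.nanoTime()` or with fixed values.

```java
public class StallDetector {
    private final long thresholdNanos;
    private boolean started = false;
    private long lastTick;
    private long worstGap = 0;
    private int stalls = 0;

    public StallDetector(long thresholdMillis) {
        this.thresholdNanos = thresholdMillis * 1_000_000L;
    }

    /** Records one loop iteration at the given timestamp in nanoseconds. */
    public void tick(long nowNanos) {
        if (started) {
            long gap = nowNanos - lastTick;
            if (gap > worstGap) {
                worstGap = gap;
            }
            if (gap > thresholdNanos) {
                stalls++; // the loop got no CPU time for longer than the threshold
            }
        }
        lastTick = nowNanos;
        started = true;
    }

    public long worstGapMillis() { return worstGap / 1_000_000L; }
    public int stallCount() { return stalls; }

    public static void main(String[] args) throws InterruptedException {
        StallDetector detector = new StallDetector(100);
        for (int i = 0; i < 50; i++) {
            detector.tick(System.nanoTime());
            Thread.sleep(10); // stand-in for one frame of game logic
        }
        System.out.println("stalls: " + detector.stallCount()
                + ", worst gap: " + detector.worstGapMillis() + " ms");
    }
}
```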

If my guesses are even slightly accurate, could this be a bug specific to “EventThread + repaint()-inside-a-MouseEvent-dispatch”? My main reason for thinking “bug, not feature” is the 100% repeatable platform-dependency here. And the fact that omitting that one line causes the two platforms to behave identically.

Of course, if there’s some documented reason for this behaviour in the APIs, which I’ve failed to notice, feel free to slap me :).

Perhaps it’s something to do with accessing the event queue whilst running on the event thread. Although that is pretty bad really.

Perhaps when dispatching on the AWT thread it actually does the event there and then, before waiting for more events. e.g.

click on button
handle click
despatch paint event
finish handling click
handle paint event
wait for more events.

? Only guessing, but the java.awt.EventQueue javadoc does say it’s implementation dependent.
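The ordering guessed at above can at least be observed for ordinary queued events. This sketch uses `EventQueue.invokeLater` as a stand-in for whatever repaint() posts, which is an assumption: it shows how the queue orders a runnable posted from inside a handler, and says nothing about what the Windows toolkit does with real paint events. `EventOrderDemo` is a made-up name.

```java
import java.awt.EventQueue;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EventOrderDemo {
    /** Posts an event from inside a handler and returns the order things ran in. */
    public static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        try {
            EventQueue.invokeAndWait(() -> {
                log.add("handler start");
                // Posted from the event thread: queued, not run inline.
                EventQueue.invokeLater(() -> log.add("posted event"));
                log.add("handler end");
            });
            // An empty event queued behind the posted one; waiting for it
            // guarantees the posted event has already been dispatched.
            EventQueue.invokeAndWait(() -> { });
        } catch (InterruptedException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```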

[quote]Perhaps it’s something to do with accessing the event queue whilst running on the event thread. Although that is pretty bad really.
[/quote]
Hmm. Yes. Any ideas on generating a SMALL test case? They won’t let me log a bug otherwise… :(. Thoughts on loading the system enough to make the bug show up?

Famous last words for any java standard library…sigh.