Debugging bad flashing with manual context usage

mithrandir · May 9, 2006, 5:09pm

In heavy content worlds (i.e. 5000+ sets of triangle arrays) we’re getting some pretty mad flashing when using manual context management on several different types of video cards and system setups. This suggests we’re doing something wrong with the way JOGL wants to interact with AWT. All the demos around use the listener method, which is not what we’re doing. Wondering if anyone has some ideas about what to tweak in the setup?

Our basic rendering loop has its own thread that looks something along the lines of the following pseudocode:


public void run() {
    while (true) {
        glContext.makeCurrent();
        GL gl = glContext.getGL();
        // gl.draw things ...
        glContext.release();
        // ... some time later
        glDrawable.swapBuffers();
    }
}

The GLCanvas is setup as follows:


canvas = new GLCanvas(caps, chooser, shared_context, null);
((GLCanvas)canvas).setAutoSwapBufferMode(false);
((Component)canvas).setIgnoreRepaint(true);

canvasContext = ((GLAutoDrawable)canvas).getContext();
canvasContext.setSynchronized(true);

What this results in is a flash of what appears to be the screen blanking to black followed by the drawing of the scene. It seems like something else is also messing with the context and doing its own clear somewhere else. The strange thing is that if we manually limit the frame rate to about 30 FPS, we don’t see any flash at all. It’s only once we let it run up near 50 FPS that the flashing starts occurring. We’re not making use of the Threading class for scheduling on the “OpenGL thread” because we are the OpenGL thread, so we control precisely when all the drawing is happening.

Anyone have some ideas on what to do about this? (This code is using JOGL JSR beta 4)

Edit:

I should add that the flashing only occurs if there is some amount of time between the release() and swapBuffers() calls. If I put them one after the other, there is no flashing. For example:


     glContext.release();
     glDrawable.swapBuffers();

and


     glDrawable.swapBuffers();
     glContext.release();

Both result in no flashing. It’s only when there’s a few milliseconds of time between the two (e.g. processing application logic before the swap) that the flashing occurs. Everything is guaranteed to be on the same rendering thread.

kbr · May 9, 2006, 9:45pm

The flashing I’ve seen in the past was usually due to doing the OpenGL work on one thread and the swapBuffers call on another thread. If this definitely isn’t going on in your application I’m not sure what the root cause could be. You may want to see if specifying -Dsun.awt.noerasebackground=true changes the behavior.
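For anyone wanting to try that flag without editing launch scripts, here is a minimal sketch of setting it from code. It assumes the property is set early, before any AWT component (including the GLCanvas) is created. The class name `NoEraseBackground` and the helper `enable()` are made up for illustration; passing `-Dsun.awt.noerasebackground=true` on the command line has the same effect.

```java
class NoEraseBackground {
    /** Name of the flag mentioned above; it is specific to Sun's AWT implementation. */
    static final String FLAG = "sun.awt.noerasebackground";

    /** Sets the flag programmatically and returns the value now in effect. */
    static String enable() {
        System.setProperty(FLAG, "true");
        return System.getProperty(FLAG);
    }

    public static void main(String[] args) {
        // Call this first, before any Frame, Canvas or GLCanvas is constructed.
        System.out.println(FLAG + "=" + enable());
    }
}
```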

Markus_Persson · May 10, 2006, 7:47am

Why do you release the context just to make it current after the swap?

In my code, I just have a single makeCurrent() call (before the while(true) in your pseudo code), and I get no weird flashing, and it works pretty much everywhere.
If you’re really doing everything in the same thread, there should be no need to ever release the context until the application stops.

kbr · May 10, 2006, 3:38pm

On some X11 setups it’s required to release the context periodically because otherwise the AWT lock will be held continually and mouse and keyboard events won’t be delivered.

That having been said, Markus_Persson has a good point; for best performance, at least, you should continue to hold the context during the swapBuffers operation.
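A rough sketch of a loop that combines both points: swap while the context is still current, then release once per frame so the AWT lock is not held continually. The `Context` and `Drawable` interfaces are hypothetical stand-ins so the sketch runs without JOGL on the classpath; the real `GLContext.makeCurrent()` also returns a status code, which is ignored here.

```java
import java.util.ArrayList;
import java.util.List;

class FrameLoopSketch {
    /** Hypothetical stand-ins for the JOGL context and drawable. */
    interface Context { void makeCurrent(); void release(); }
    interface Drawable { void swapBuffers(); }

    /** Renders the given number of frames, swapping while the context is still current. */
    static void renderFrames(Context ctx, Drawable drawable, Runnable draw,
                             Runnable appLogic, int frames) {
        for (int i = 0; i < frames; i++) {
            ctx.makeCurrent();
            try {
                draw.run();              // all GL calls for this frame
                drawable.swapBuffers();  // swap before giving the context up
            } finally {
                ctx.release();           // once per frame, so AWT can deliver events
            }
            appLogic.run();              // non-GL work runs with the context released
        }
    }

    /** Records the call order for one frame using logging stand-ins. */
    static List<String> traceOneFrame() {
        List<String> log = new ArrayList<>();
        Context ctx = new Context() {
            public void makeCurrent() { log.add("makeCurrent"); }
            public void release() { log.add("release"); }
        };
        renderFrames(ctx, () -> log.add("swapBuffers"),
                     () -> log.add("draw"), () -> log.add("appLogic"), 1);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(traceOneFrame());
    }
}
```

With this ordering the application logic sits after the release rather than between the release and the swap.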

mithrandir · May 10, 2006, 4:39pm

On nVidia cards, on win32, it locks the application solid if you keep the context current. You need to release it. Basically the same sort of behaviour as Ken is describing for X11 (AWT event thread becomes locked, resulting in no rendering at all). In any event, we’re building a Java application, not a Windows or X11 application. We write one set of code that runs on everything without needing to do platform-specific workarounds.

Also, our pipeline and rendering is setup to deal with multithreaded rendering. We keep the swap call separate so that we can sync the various different rendering pipeline threads to all swap at the same time. With the lack of any genlock API in JOGL, that’s the best we can do to get everything in sync.
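The "everyone swaps at the same time" idea can be sketched with a plain `java.util.concurrent.CyclicBarrier`. This is only an illustration of the synchronisation pattern, not the actual pipeline code: the class name `SyncedSwap` is made up, and the "render" and "swap" log entries are placeholders for the real GL work and the `swapBuffers()` call.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;

class SyncedSwap {
    /** Runs one frame on each pipeline thread and returns the order of events. */
    static List<String> runOneFrame(int pipelines) {
        List<String> log = new CopyOnWriteArrayList<>();
        CyclicBarrier allRendered = new CyclicBarrier(pipelines);
        Thread[] threads = new Thread[pipelines];
        for (int i = 0; i < pipelines; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                log.add("render " + id);      // placeholder for drawing the frame
                try {
                    allRendered.await();      // wait until every pipeline has finished drawing
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
                log.add("swap " + id);        // placeholder for swapBuffers()
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runOneFrame(3));
    }
}
```

No pipeline records a swap until every pipeline has recorded its render, which is the property the separate swap call is meant to provide.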

mithrandir · May 10, 2006, 8:55pm

Just did another check through the code. We’re definitely hitting both calls from the same thread class instance and ID. Something else is up internally with the way JOGL is playing with the contexts I think. Time to dig into that area. A delay of a millisecond or two should not be causing what we’re seeing on screen. ???

mithrandir · May 10, 2006, 9:30pm

Ah ha! Found the problem with JOGL.


GLCanvas canvas = new GLCanvas();
GLContext ctx = canvas.createContext(null);

// Prints "failed" when the context's drawable is not the canvas it was created from.
if (ctx.getGLDrawable() != canvas)
    System.out.println("failed");

IOW, the drawable that you created the context from is not the drawable that the GLContext returns. How crazy and completely unintuitive is that! The upshot of this is that somewhere in the bowels of JOGL, the canvas.swapBuffers() call is being shuffled off into a different thread from the one it was called from. Stated another way, JOGL is breaking its own contract about the drawing threads. It’s deliberately shuffling the swap code into a separate thread, despite the end user’s wishes. If I change my canvas.swapBuffers() call to glContext.getGLDrawable().swapBuffers() there is no flashing anymore. ARRGH! If you’re going to label a class as extending an interface, it should do everything it can to observe that interface’s contract to the end user. That’s not happening right now.

kbr · May 10, 2006, 11:45pm

We obey the contract of the interface while also providing thread safety. We also provide the GLDrawable and GLContext abstractions so people can build their own OpenGL widgets without using the GLCanvas. If you don’t like the behavior of the GLCanvas, don’t use it; you have all of the tools available to write your own.

mithrandir · May 11, 2006, 6:13am

Well it’s obviously not providing thread safety as it’s changing the thread I’m calling it from! If I’m doing the right thing by the contract (i.e. keeping and calling everything from the one thread all the time), I would expect the callee to also obey its end of the bargain. That is not happening and is a very bad implementation. Why should the user not expect to get the same drawable back that they created the context from? Why should they not expect the implementation to honour the fact that everything is being called from the one thread and just swap on that thread without creating a new thread that the user never asked for? If the user lives up to their end of the bargain, so should the implementation.

Here’s the wording from the Javadoc for createContext():

“Creates a new context for drawing to this drawable that will optionally share display lists and other server-side OpenGL objects with the specified GLContext.”

Then over on GLContext.getGLDrawable():

“Returns the GLDrawable to which this context may be used to draw.”

Obviously the RI code is not following that contract because it is treating the context and its drawable as something completely different to the drawable that it was created from - as evidenced by the lack of reference equality between the two (or even a .equals() would be good enough). They are not functionally equivalent because the code is treating them as two different objects on two different threads when they should not be.

As for fixing it, I have: the SWT code obeys the contracts and performs as expected by the javadoc. The drawable you created the context from is the drawable you get back when you ask the context for its drawable. Nothing gets shuffled off to a thread that the user never created or asked for, nor does it do anything completely unexpected like the RI code does.

kbr · May 11, 2006, 6:21am

The existence of the AWT event queue thread implies that current applications using the AWT and Swing will have to deal with a certain amount of multithreading-related issues. Thus the necessity of EventQueue.invokeLater(), EventQueue.invokeAndWait(), and the SwingUtilities class. The SWT does not use an internal multithreaded model which means that things can be simpler. Personally I would like to see the AWT migrate to a single-threaded model, but the current reality on all platforms is that it is multithreaded.
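As a small illustration of the `EventQueue` methods mentioned here, the sketch below hands a task to the AWT event queue thread with `EventQueue.invokeAndWait()` and checks where it actually ran. The class and method names are made up for the example; only the `java.awt.EventQueue` calls are real API.

```java
import java.awt.EventQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class EventQueueHop {
    /** Runs a task via the AWT event queue and reports whether it ran on the event thread. */
    static boolean ranOnEventThread() {
        AtomicBoolean onEventThread = new AtomicBoolean(false);
        try {
            // Blocks the caller until the AWT event thread has executed the task.
            EventQueue.invokeAndWait(
                () -> onEventThread.set(EventQueue.isDispatchThread()));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return onEventThread.get();
    }

    public static void main(String[] args) {
        System.out.println("caller on event thread: " + EventQueue.isDispatchThread());
        System.out.println("task on event thread:   " + ranOnEventThread());
    }
}
```

The caller's thread and the thread that executes the task are different, which is the kind of hand-off being discussed in this thread.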

When dealing with the AWT and OpenGL it is an absolute requirement to do some amount of serialization of the OpenGL work in order to achieve robustness on a range of platforms and graphics cards. Thus the internal redispatching of work in e.g. the GLCanvas class. The Threading class is explicitly provided in JSR-231 to allow the end user a way to interact with this redispatching if so desired. Your application ignored the Threading class and instead performed GLContext-related work on whatever thread happened to be current. This is why your application was not functioning correctly, not because the RI is a “very bad implementation”, in my opinion.
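To make "redispatching" concrete, here is a generic sketch of funnelling work onto one designated thread. It is not the RI's implementation: `SingleThreadDispatch`, `invoke()` and `isWorkerThread()` are hypothetical names that only loosely mirror the roles of `Threading.invokeOnOpenGLThread()` and `Threading.isOpenGLThread()` in JSR-231, and a plain single-thread executor stands in for whatever thread JOGL actually uses.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SingleThreadDispatch {
    // The one thread all dispatched work runs on (set when the executor creates it).
    private static volatile Thread worker;

    private static final ExecutorService EXEC = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "gl-worker");
        t.setDaemon(true);               // do not keep the JVM alive for this sketch
        worker = t;
        return t;
    });

    /** True when the caller is already on the designated thread. */
    static boolean isWorkerThread() {
        return Thread.currentThread() == worker;
    }

    /** Runs the task on the designated thread, waiting for it to finish. */
    static void invoke(Runnable task) {
        if (isWorkerThread()) {
            task.run();                  // already there: run directly, no re-queueing
            return;
        }
        try {
            EXEC.submit(task).get();     // otherwise hand it over and block until done
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the name of the thread a (nested) dispatched task actually ran on. */
    static String dispatchedThreadName() {
        String[] name = new String[1];
        invoke(() -> invoke(() -> name[0] = Thread.currentThread().getName()));
        return name[0];
    }

    public static void main(String[] args) {
        System.out.println("caller: " + Thread.currentThread().getName());
        System.out.println("task ran on: " + dispatchedThreadName());
    }
}
```

A caller that believes it is "the OpenGL thread" still sees its work executed elsewhere unless it is the designated thread, which matches the symptom described earlier in the thread.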

Markus_Persson · May 11, 2006, 8:13am

In reply to Ken Russell (post 4):
Aaargh! Could you please try to fix that? There should be no need to synchronize awt with opengl, except if you want to use the (rather silly, if you ask me) animatedcanvas class.
In a singlethreaded application, the context should be bound once, and only once.

Having the RI deadlock on just some platforms when not releasing the context is NOT thread safety. Not even remotely close.

mithrandir · May 11, 2006, 2:09pm

SWT has an event queue concept as well. Have a look at the Display class and its syncExec()/asyncExec() calls. These are functionally equivalent to the AWT EventQueue methods. Nowhere in the SWT code have I needed to make any calls onto that thread. Just like writing a C application with OpenGL, I never have to sync or shift anything onto the underlying platform-specific window API’s event thread. I can’t see why the RI code thinks it must follow some other set of rules that are just not used or followed in any other system.

This just does not make sense at all. If I call glContext.getGLDrawable().swapBuffers() and then immediately swap the code over to use glCanvas.swapBuffers() using exactly the same timing (i.e. substitute one line of code for another), then I get two entirely different sets of visual behaviour. One call throws things off to a different thread, another doesn’t. That is completely opposite to what you are claiming should be correct behaviour. Either the basic assumptions you are making are wrong (the need for the Threading class at all) or the implementation is wrong (two different behaviours based on which drawable object instance is used).

The other incorrect behaviour assumption that is being made by the RI code is which thread is the “OpenGL thread”: If I did the makeCurrent() on a thread which happens to be mine, then that is the current “OpenGL thread”. Why is one call saying that it is (glContext.getGLDrawable()) and another call saying that it isn’t (GLCanvas)? I’m not debating which behaviour is correct, but that the behaviour is inconsistent between two different code paths that should, by the javadoc definition, behave identically. The fact that they are not behaving the same, yet one is working as expected and one is not, is the problem here.

kbr · May 11, 2006, 11:10pm

There is no way to fix this in the general case. There are currently two reasons why this pattern wouldn’t work on X11 platforms. The first is if you’re using Mesa or otherwise have an indirect rendering context, in which GL calls turn into GLX tokens. It is an absolute requirement that the AWT lock be held while the OpenGL context is current in this case to prevent Xlib sync errors. The second is if you’re using ATI’s proprietary drivers, where apparently even if you have a direct rendering context some GL calls cause GLX tokens to be sent over the wire. JOGL detects this and falls back to using the heavier weight synchronization. If you only care about your app working with direct rendering contexts then complain to ATI and get them to fix their drivers. Personally I don’t see the big deal with releasing the context periodically.

Markus_Persson · May 12, 2006, 8:47am

It just makes NO SENSE AT ALL that you have to lock the thread delivering input events to be able to render to the screen.
Unless C++ apps are also forced to release the context frequently, I just don’t buy your explanations.

You might not see the big deal with releasing context, but that doesn’t mean nobody else does.

kbr · May 12, 2006, 4:42pm

If the C++ app interacts with the X server in a multithreaded fashion like the AWT does then it would have to perform the same sort of locking. The issue is that the AWT communicates with the X server from multiple threads, synchronizing its communication with the global AWT lock, and in order to work at all JOGL has to follow the same synchronization pattern.

Markus_Persson · May 13, 2006, 8:29am

Right. But SWT doesn’t have this limitation?

I guess I should look into changing wurm to SWT+JOGL, or possibly LWJGL, then.

kbr · May 13, 2006, 1:00pm

I’m not sure. I think it’s possible to run multiple event loops in the SWT which would imply it needs to do the same kind of locking. When using JOGL with the SWT via GLDrawableFactory.createExternalGLContext() JOGL does no additional locking. I have no idea what the Aviatrix3D SWT / JSR-231 support looks like.

mithrandir · May 13, 2006, 1:31pm

Well it started life as the JOGL RI modified lightly to take SWT components. However, I found that the tight coupling to the AWT event queue in all sorts of completely unexpected places was creating havoc with the rendering. A lot of this was because of all the Java2D references internally, which then led to the Threading class and all its access to the AWT event queue. In particular the Mac port has a lot of problems with that. For example, the pbuffer code was synchronizing itself to the AWT event loop (eh?). I had to strip out most of the guts of GLContextImpl and GLPbufferImpl just so that it wouldn’t lock up on OSX. The result was I had to strip the code back to a very simple implementation that did no locking anywhere, nor used any threading. There’s still some RI code in there, but most of it has been greatly reduced in complexity. At least on all the platforms we’ve tried it on, which is quite an extensive list, there haven’t been any rendering issues at all.

The only thing left preventing a new release is waiting on some gluegen changes I sent to Ken a couple of weeks ago. I need these to make it back to the core gluegen lib before I push out the next version. These changes allow me to generate full AGL bindings (needed because AWT works on Cocoa, but SWT works on Carbon, so I need to generate different API bindings).