JStella - Atari 2600 emulator in Java - performance issues

I manage the JStella project at SourceForge (at http://jstella.sourceforge.net). JStella is an Atari 2600 emulator written in Java. It is based on the open source Stella software, which is written in C++. I translated Stella into Java mainly to prove wrong the people who said that it would not work well in Java. And I think I have been largely successful in this…it currently runs just as well as the C++ on my computer. But I really don’t think it’s optimized…I’m not a Java2D expert, or a Java performance person, and I think there is a lot of stuff relating to graphics in the code that causes it to not be as fast as it should. (For example, I just found out about the createCompatibleImage method…that improved performance on my machine dramatically, but apparently not so much on other machines.)

I use clipping, so the main problems are when things on different parts of the screen get changed. There is some slowdown…it slows down the virtual CPU (of the Atari) as well, which seems to me like somehow the code in the “calculation” thread is blocking while waiting for the thread that does the painting to finish…is this normal? Of course, I may be completely wrong. But if any Java2D pros out there want to look at it, contribute, etc., I (et al.) would be grateful. (You can go to the JStella project page on Sourceforge and check out the CVS repository, where all the most recent source code is…the downloadable source code is a few weeks old.)
JLA

Hi,

I did some profiling; here are the results. I hope they are useful to you - I don’t have the time to get everything up and running:

1.) Since the software your emulator runs seems to render directly to virtual “RAM”, there is no way to get hardware acceleration.
You set the pixels of your backBuffer using:

 yBackBuffer.setRGB(x, y, zNewPaintedColor); 

This is very sub-optimal if a large quantity of pixels has to be set (as in your case).

A better way is to grab the pixel array directly:

 byte[] data = ((DataBufferByte) tex.getRaster().getDataBuffer()).getData();

You can do this once at backbuffer initialization.
This usually destroys hardware acceleration, but because that is not possible at all in your case, don’t worry ^^
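
A minimal sketch of the direct-access idea above, assuming a TYPE_INT_RGB back buffer (in which case the data buffer is a DataBufferInt with one int per pixel, not a DataBufferByte). The class name, the 160x200 size and the setPixel helper are made up for illustration and are not JStella's actual code:

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

// Hypothetical back buffer wrapper; names and size are assumptions, not JStella code.
class DirectBackBuffer {
    static final int WIDTH = 160;
    static final int HEIGHT = 200;

    final BufferedImage image =
            new BufferedImage(WIDTH, HEIGHT, BufferedImage.TYPE_INT_RGB);

    // Grabbed once at initialization; writes to this array change the image directly.
    final int[] pixels =
            ((DataBufferInt) image.getRaster().getDataBuffer()).getData();

    // Stands in for image.setRGB(x, y, rgb) on a TYPE_INT_RGB image.
    void setPixel(int x, int y, int rgb) {
        pixels[y * WIDTH + x] = rgb; // row-major index
    }
}
```

The array is fetched once and reused, so the per-pixel cost is a single array store instead of a call through the color model.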

2.) A lot of time is lost in your paint-method when you paint the backBuffer to “real” hardware:

  public void paint(Graphics g) {
      //super.paint(g);
      Graphics2D z2D = (Graphics2D) g;
      if (myImage != null) z2D.drawImage(myImage, myTransform, null);
  }

Could you try to draw just the parts of the image you really need, using another drawImage method?
I am not sure about this, maybe Java2D misses some optimizations, maybe not.
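
One drawImage overload that fits this suggestion takes explicit destination and source corners. A rough sketch, assuming the dirty region is known in source-image coordinates and the window shows the image at an integer scale factor; PartialBlit, drawDirty and the scale parameter are hypothetical names, not part of JStella:

```java
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper: copies only the dirty part of src to the target graphics.
class PartialBlit {
    static void drawDirty(Graphics2D g, BufferedImage src, Rectangle dirty, int scale) {
        int sx1 = dirty.x;
        int sy1 = dirty.y;
        int sx2 = dirty.x + dirty.width;
        int sy2 = dirty.y + dirty.height;
        // Destination corners first, then source corners; the region is scaled to fit.
        g.drawImage(src,
                sx1 * scale, sy1 * scale, sx2 * scale, sy2 * scale,
                sx1, sy1, sx2, sy2,
                null);
    }
}
```

Whether this is faster than drawing the whole image with a clip set is something only profiling on the affected machines can tell.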

Hope that helps, Good Luck!

lg Clemens

Thanks for looking at the code.

I’m not familiar with how Java does hardware/software rendering…as it is now, it checks to see if a pixel in a virtual buffer is different than the new one, and only if it is does it call setRGB. So this means that some frames will do this for every pixel, while others will only do it for 10 or so…would the DataBuffer technique be faster (or equal) in both cases? And should the emulator do hardware acceleration? Everything about the emulator is flexible…well, except for the caveat (my own preference really) that it be pure Java.

As far as the painting of the back buffer goes, it does use clipping/dirty-rect technique, but does so through the repaint command…
e.g.

repaint(rectangleThatNeedsToBeRepainted);

I assume this works as I think it does, because when there is a busy frame (with different parts of the frame needing repainting), it takes a lot longer to paint.
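
For illustration, a dirty rectangle of this kind could be computed as the bounding box of all changed pixels. This is only a sketch of the general technique with made-up names (DirtyRect, changedBounds); JStella's own clipping code may work differently:

```java
import java.awt.Rectangle;

// Hypothetical helper: bounding box of the pixels that differ between two frames.
class DirtyRect {
    static Rectangle changedBounds(int[] oldFrame, int[] newFrame, int width) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = -1, maxY = -1;
        for (int i = 0; i < newFrame.length; i++) {
            if (oldFrame[i] != newFrame[i]) {
                int x = i % width;
                int y = i / width;
                minX = Math.min(minX, x);
                maxX = Math.max(maxX, x);
                minY = Math.min(minY, y);
                maxY = Math.max(maxY, y);
            }
        }
        if (maxX < 0) {
            return null; // nothing changed, nothing to repaint
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }
}
```

The result could be handed to repaint(Rectangle). With a single bounding box, two small changes in opposite corners produce a rectangle covering the whole frame, which would be consistent with busy frames taking much longer to paint.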

And while I’m at it, what do you know about Toolkit.sync()? Someone seemed to suggest it was necessary for Linux animations. I think the current version uses it, but I don’t know if it slows stuff down, is unnecessary, etc…

By the way, if you want to check out the performance of the JStella applet, there is currently an applet set up at http://www.ataritimes.com/jstella/index.php.

Thanks again
JLA

Hi again,

I would not waste too much time tinkering around with this; simply use the DataBuffer approach (you’ll get an int[] where the colors are masked) and try it out.
Because you use clipping it should be faster in almost all cases :slight_smile:

Well, this is quite a bit harder to achieve. “Hardware acceleration” in this context means that the emulator sends calls like drawLine() or fillRect() to Java, Java passes those commands to the OS, the OS to the driver and the driver to the card. However, this only has benefits if the commands touch larger areas; for 1-20 pixels it’s slower because of the additional overhead. I guess generating such call sequences efficiently from your emulator isn’t easily possible, if at all.
However, this shouldn’t be a real problem: as far as I understand, rendering is done by the emulated application itself, so your emulator only converts indexed colors to RGB and paints the image.

Well, good to know. What you see is the process of migrating the image from RAM to video RAM. You most likely won’t be able to get rid of this, but I am quite sure the DataBuffer approach will do way better.

Well, it basically does a sync of all buffered stuff. If it does not hurt, leave it in.

Good luck, lg Clemens

Thanks again for your help.
As far as the getDataBuffer() stuff goes, what do I do if I create my BufferedImages via createCompatibleImage method in the toolkit? I really have no way of knowing what the data type of the image is going to be (byte, int)…although I’m pretty sure my system generates ints. Would I have to forgo the createCompatibleImage() method?

Thanks
JLA

There’s a good explanation of toolkit.sync() here: http://www.java-gaming.org/forums/index.php?topic=15000.0

Yes, use INT_RGB as data-type. Why don’t you simply try it and post the results?

lg Clemens

And, what results did you get? I am quite interested :slight_smile:

I implemented something with the createCompatibleImage thing that checks whether the type is an integer type with the equivalent RGB format and, if so, uses the method you suggested. For those that get a non-equivalent type of image, it uses the old way of doing it. I can’t see a change on my system, but I never had a problem before, so there isn’t much to see. From profiling using the nanoTime feature, though, it appeared to me that the drawing to the back buffer wasn’t necessarily the bottleneck; instead the bottleneck appeared to be the calculations part. I think that was because it was blocking on the part that paints the back buffer to the screen…but I have no idea why it would block, because they are in separate threads. This bottleneck and delay cleared up on my system when I originally did the createCompatibleImage thing, but it apparently hasn’t on some other systems. This strongly suggested to me that the problem is in the graphics/threads.
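
The kind of check described here could look roughly like the sketch below. It is an illustration with hypothetical names (RasterAccess, directPixelsOrNull), not the code that went into JStella, and it assumes the third integer type meant is TYPE_INT_ARGB_PRE:

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferInt;

// Hypothetical helper: hands out the backing int[] only for packed-int RGB formats.
class RasterAccess {
    static int[] directPixelsOrNull(BufferedImage img) {
        int type = img.getType();
        boolean packedRgb = type == BufferedImage.TYPE_INT_RGB
                || type == BufferedImage.TYPE_INT_ARGB
                || type == BufferedImage.TYPE_INT_ARGB_PRE;
        DataBuffer buffer = img.getRaster().getDataBuffer();
        if (packedRgb && buffer instanceof DataBufferInt) {
            return ((DataBufferInt) buffer).getData();
        }
        return null; // caller falls back to setRGB for every other format
    }
}
```

TYPE_INT_BGR is left out on purpose because its channel order differs, so writing 0xRRGGBB values into it would swap red and blue.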

I haven’t been able to spend much time on it because of medical school, but if you (or whoever) want to become a developer on JStella, you would be more than welcome. And you could experiment around with it. There is a sort of fan-base for the software over at AtariAge.com, but they are more arcane assembly language than Java over there, so that’s why I came here.

JStella has had about 600 downloads over the past 8 or so weeks since its introduction. (This doesn’t include people who play the JStella applet on websites) . The parent program Stella (in C++) averages about that many downloads in a day. So a goal for anyone interested in promoting Java for desktop applications etc. (and helping prove Stevie Jobs wrong) would be for JStella to become competitive with its parent in popularity. Let me know if you or anyone here is interested in becoming a developer on this SourceForge project…
Thanks again
JLA

Whoops…hadn’t updated the CVS when I made that last post…now it is updated. I hope it doesn’t mess things up…I am assuming that TYPE_INT_RGB, TYPE_INT_ARGB, and TYPE_INT_ARGB_(forgot what was here) are all the same general format, in terms of what color byte is located where. If not, then some users may not be able to use the program…
JLA
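
One way to check the assumption about where each color byte sits is to ask the image's color model for its bit masks. A small sketch, assuming the three types in question are TYPE_INT_RGB, TYPE_INT_ARGB and TYPE_INT_ARGB_PRE; the class and method names are made up:

```java
import java.awt.image.BufferedImage;
import java.awt.image.DirectColorModel;

// Hypothetical helper: reports the red, green, blue and alpha bit masks of a packed-int type.
class PackedIntLayout {
    static int[] masks(int imageType) {
        BufferedImage img = new BufferedImage(1, 1, imageType);
        // The packed-int BufferedImage types use a DirectColorModel.
        DirectColorModel cm = (DirectColorModel) img.getColorModel();
        return new int[] {
            cm.getRedMask(), cm.getGreenMask(), cm.getBlueMask(), cm.getAlphaMask()
        };
    }
}
```

For these three types the red, green and blue masks are the same (0xFF0000, 0xFF00, 0xFF). The difference is the top byte: the two ARGB types treat it as alpha, so a value written directly as 0x00RRGGBB is fully transparent there unless 0xFF000000 is OR-ed in.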

Hi again,

I may know why you could experience blocking between the CPU and the Java2D rendering thread.
As far as I understand you do the setRGB from one thread (is this also the thread the virtual CPU runs on?), whereas you render the image with drawImage on another thread.
Well as long as the image is rendered (which will take quite a lot of time, because it needs to be scaled and a vram upload has to be done), you can’t call setRGB of course because Java2D synchronizes on the image to avoid corruption.

1.) If it’s really the case that one thread may render the image you are actually painting to, I would create two images: one which is currently drawn to and one which is rendered.

2.) Please don’t overrate the compatibleImage stuff. For the operations you do with the image it makes almost no difference whether your image has a compatible format or not - it is directed to the software rendering loops anyway. You can read almost everywhere that setRGB is not recommended when setting many pixels (and because you are clipping anyway, it does not have an advantage for you if only a few pixels are changed).

3.) Don’t forget to synchronize when accessing the buffer of the image when you are in another thread than the painting thread, like:
[source]
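synchronized (image)
{
    // backing array of a TYPE_INT_RGB image, one int per pixel
    int[] raster = ((DataBufferInt) image.getRaster().getDataBuffer()).getData();
    raster[x + y * image.getWidth()] = …  // row-major index: x + y * width
}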
[/source]
All in all I did not see many synchronization statements in your code. I don’t know your code well, so it could be that everything is fine, but since you use more than one thread - have you thought about potential multi-threading problems?

lg Clemens

So you suggest to just manually use TYPE_INT_RGB? (This is what is returned as a “compatible image” with my setup.)

I used to have a lot of stuff synchronized, but I started eliminating them because I felt many of them were unneeded. Every once in a while I’ll get an exception thrown due to threads, but it isn’t that often…I know thread handling needs to be tightened, but I figure I need to find where the blocking is going on first.

I originally thought that the setRGB was blocking, but upon manual profiling, the delay seemed to be in a previous method call (in the same thread–the processFrame() method in JSTIA, called from JSConsole doFrame() method). Maybe I’m totally wrong about the thread thing, but when the screen is large (i.e. the stand-alone window maximized), the paint-to-the-screen method would often take around 54 milliseconds (very long), and the processFrame() part in a different thread that immediately followed (well, actually called concurrently) would take the same amount, which is WAY too long…and when I did the createCompatibleImage thing, neither of them would last that long and so the problem was fixed for MY setup. The processFrame() is just calculation stuff and shouldn’t be dependent on the size of the emulator window or the rendering methods. Or should it…?

Thanks
JLA

I just got word from one of the people who still had the slowdown, and he says that the newest version (0.8 - implementing the aforementioned changes) no longer slows down…that’s good news. Of course, there is still a lot of optimization that needs to be done, but I think one of the big hurdles has been jumped. Thanks for your help.
JLA

Hi again,

Good to hear, glad that my hints have helped :slight_smile:

Yes. Using an image format different from what createCompatibleImage returns will not make a difference for you, because your operations cannot be done in hardware anyway. I really would replace the setRGB/compatible image path and always create an INT_RGB and access the raster-data directly. Trust me :wink:

Well, operating on unsynchronized, thread-shared data with multiple threads is really dangerous. On some systems it may work all the time, on some it may work with occasional exceptions, and on some it won’t even run. The problem is that I can’t give you one small hint that will just work, as with the slowdowns - it really depends. In some situations clever tricks are needed to do as little synchronization as possible, because it can be quite an expensive operation.

To be honest I don’t have any spare time, but I’ll have a look to fix the most problematic parts.

lg Clemens

I went through your code a bit and it seems that it’s harder to fix the concurrency problems than I thought.

If you’re interested in the topic I really can recommend the following article: http://www.ibm.com/developerworks/java/library/j-threads1.html
It’s a series of 3 articles about threading with Java, and although it’s quite old (2001) it explains everything that is important to know.
It mentions the dangers, how and when synchronization is slow and so on…

Just a few more suggestions:

1.) In the CPU thread, which starts the painting, only access an int[]. This is a separate copy of the image buffer, not the raster of the BufferedImage.
Don’t touch the BufferedImage itself in the CPU thread; better do this in the event thread in paint().

Every time you access the local int[], synchronize on it, but around the whole loop rather than inside it, e.g.
synchronized (localBuffer)
{
    for (int i = 0; i < localBuffer.length; i++) …
}

In the event thread do:
synchronized (localBuffer)
{
    System.arraycopy(localBuffer, 0, bufferedImageRasterBuffer, 0, localBuffer.length);
}

with the local int[] and the BufferedImage’s raster-buffer.
This way a BufferedImage that is currently being painted can’t block your CPU thread, and the BufferedImage itself is never accessed outside the AWT thread.
Only copying from the localBuffer to the BufferedImage’s raster can block, but only briefly, as the copy is really fast.
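
Put together, suggestion 1 might look something like the following sketch. It assumes a TYPE_INT_RGB image and invents the class and method names (FrameHandOff, writeFrame, latestImage); here the emulator side publishes a finished frame with one arraycopy, which is a simplification of writing pixels in a loop inside the lock:

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

// Hypothetical hand-off between the emulator (CPU) thread and the event thread.
class FrameHandOff {
    private final int[] localBuffer;   // shared frame data, also used as the monitor
    private final BufferedImage image; // only touched on the event thread
    private final int[] imagePixels;   // backing array of image

    FrameHandOff(int width, int height) {
        localBuffer = new int[width * height];
        image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        imagePixels = ((DataBufferInt) image.getRaster().getDataBuffer()).getData();
    }

    // CPU thread: one lock acquisition per frame, not one per pixel.
    void writeFrame(int[] frame) {
        synchronized (localBuffer) {
            System.arraycopy(frame, 0, localBuffer, 0, localBuffer.length);
        }
    }

    // Event thread, e.g. at the start of paint(): copy the latest frame, then draw image.
    BufferedImage latestImage() {
        synchronized (localBuffer) {
            System.arraycopy(localBuffer, 0, imagePixels, 0, localBuffer.length);
        }
        return image;
    }
}
```

The lock is held only for the duration of the copy, so the slow part (scaling and drawing the image) happens outside it.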

2.) Really problematic parts are where two threads read and write, like the clipping stuff:
private void setClippingRectangle(int aClipX, int aClipY, int aClipWidth, int aClipHeight)
is called by the CPU thread, but the results are later read on the event-thread.

Both reads and writes to such variables need to be synchronized on the same object. (A method declared public synchronized synchronizes on “this”.)
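
As a sketch of what that could mean for the clipping values: ClipState is a made-up holder class, and the setter only borrows the parameter names from the JStella method quoted above.

```java
import java.awt.Rectangle;

// Hypothetical holder for the clip values shared by the CPU thread and the event thread.
class ClipState {
    private int clipX, clipY, clipWidth, clipHeight;

    // Called by the CPU thread; locks on "this".
    synchronized void setClippingRectangle(int aClipX, int aClipY,
                                           int aClipWidth, int aClipHeight) {
        clipX = aClipX;
        clipY = aClipY;
        clipWidth = aClipWidth;
        clipHeight = aClipHeight;
    }

    // Called by the event thread; locks on the same object, so it never sees
    // a half-updated rectangle.
    synchronized Rectangle getClippingRectangle() {
        return new Rectangle(clipX, clipY, clipWidth, clipHeight);
    }
}
```

Returning a fresh Rectangle means the reader gets all four values from one consistent update.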

3.) I don’t know how you interact when AWT events modify stuff on the CPU thread, but here it’s the same.

PS: I hope I have not demotivated you, your project is great!

Good luck, lg Clemens

I might slightly correct the synchronization blocks, because there are some tricks to avoid falling into a deadlock.
First, you may be blocked if the two synchronized() {} blocks don’t notify() each other, just as when a thread goes into sleep mode while the lock is active and the other waits forever.
Second, the localBuffer variable might come to refer to a new object, which is likely when buffering picture data (the buffer is cleared or replaced), and then the threads won’t lock on the same reference once the variable changes. Therefore a fixed monitor object reference must be used; that can be an instance of a simple Monitor class you define with one member variable in it, or even an ordinary Integer.
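
The fixed-monitor idea could be sketched like this, with made-up names (PictureBuffer, replace, copy). A plain final Object is used as the monitor instead of an Integer, since boxed Integer instances can be shared with unrelated code:

```java
// Hypothetical buffer holder whose data array may be replaced at any time.
class PictureBuffer {
    private final Object monitor = new Object(); // never reassigned
    private int[] localBuffer = new int[0];      // may point to a new array later

    // Swapping the array is safe because the lock is not the array itself.
    void replace(int[] fresh) {
        synchronized (monitor) {
            localBuffer = fresh;
        }
    }

    // Readers lock on the same monitor and get a private copy.
    int[] copy() {
        synchronized (monitor) {
            return localBuffer.clone();
        }
    }
}
```

If the buffer is allocated once and kept in a final field, locking on the buffer itself gives the same guarantee.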

Well, my comments were made with an eye to JStella’s source, so they should not be seen as universal and perfect :wink:

Maybe I am missing something in my threading knowledge… If both threads never use wait and also don’t use functions based on wait (like blocking IO, …), is there really a possibility that a deadlock may occur?

Well, in the JStella case localBuffer was meant to be allocated in JSVideo and re-used all the time, so as not to stress the GC by allocating large arrays every frame.

lg Clemens

Well, it is much harder to explain in general than to show on one concrete example.
When I said “do not miss notify() calls”, I meant it as the kind of rule a programmer sticks to when he cannot keep the whole of his code in view… gently meant.
I put some on the shared code forum : http://www.java-gaming.org/forums/index.php?topic=17460.0 :smiley:

I read your post but I still don’t see a need for notify calls if synchronized is used on a monitor which is never used for wait() but only to disallow concurrent access and guarantee data accessibility.
And if another monitor blocks and keeps the thread Waiting, a call to notify on the wrong monitor also does nothing…
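
A small sketch of that point, with made-up names: two threads share one monitor purely for mutual exclusion, with no wait() or notify() anywhere. A thread that finds the monitor taken simply resumes once the owner leaves the synchronized block.

```java
// Hypothetical counter guarded by a single monitor that is never used for wait().
class LockOnlyCounter {
    private final Object lock = new Object();
    private int count;

    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // Runs two threads that contend for the lock and returns the final count.
    static int runTwoThreads(int perThread) {
        LockOnlyCounter counter = new LockOnlyCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

Both threads finish and no increment is lost. With synchronized alone, a deadlock needs something more, such as two different locks taken in opposite orders by the two threads.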

I guess I did not understand your idea :wink:

lg Clemens