Performance and best practices

Hi forum!

I’ve recently started using Jogl for a 2D graphics engine I’m writing. It previously used Java2D, but I switched to Jogl because of performance issues, especially when drawing alpha-translucent images. Luckily, my alpha-related performance problems disappeared with the switch, but I’m still not seeing anything like the performance I was hoping for (and, honestly, expecting from a somewhat modern graphics card).

I suspect this is just because I’m new to OpenGL in general and don’t know the best practices regarding the time cost of various operations. I’m turning to you here on the forum in the hope that someone can share some best practices, or web links to them. I’m guessing it would help if I describe my drawing loop in at least a little detail. The main time consumer is the loop that draws an isometric tile-based map to the lowest layer of the screen. The screen is 800x600 pixels, and I’m drawing somewhere around 20x60 tiles per frame, the tiles being 46x23 pixels (overlapping by half a tile in the vertical direction for the isometricity). Basically, I’m looping from the top to the bottom of the screen (to make sure that the tiles overlap properly), drawing textured quads. There are some screen shots here if you want to see what kind of graphics it results in.

Some performance numbers: When I’m drawing this map to the screen, it takes around 15 ms, so the map alone can be drawn continuously at around 50-60 FPS. However I do the math on that, it adds up to 20 * 60 * 46 * 23 * 50 = 63 MPixel/sec. Sources on the web indicate that my graphics card (a Mobility Radeon X300) should have a fill rate of more than one GPixel/sec, so I should be more than an order of magnitude away from hitting that limit. Likewise, it involves drawing around 60000 quads per second, which should be virtually nothing, as far as I can tell. By the way, the performance seems to be about the same on both Windows XP and Linux with X.org. Using ATI’s proprietary drivers on Linux raised the speed by a few frames per second, but nothing more fundamental than that. I’ve profiled the loops in several ways, and it very much seems that all the significant time is being spent in various OpenGL native routines rather than in my own program logic.
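For anyone who wants to check the arithmetic above, here is a minimal sketch in Java. The class and method names are made up for illustration, and it assumes each tile is counted as its full 46x23 bounding quad at 50 frames per second.

```java
// Back-of-the-envelope check of the numbers quoted in the post.
final class FillRateEstimate {
    // Pixels touched per second: tiles per frame * pixels per tile * frames per second.
    static long pixelsPerSecond(int tilesX, int tilesY, int tileW, int tileH, int fps) {
        return (long) tilesX * tilesY * tileW * tileH * fps;
    }

    // Quads submitted per second: one quad per tile, per frame.
    static long quadsPerSecond(int tilesX, int tilesY, int fps) {
        return (long) tilesX * tilesY * fps;
    }

    public static void main(String[] args) {
        long pixels = pixelsPerSecond(20, 60, 46, 23, 50);
        long quads = quadsPerSecond(20, 60, 50);
        System.out.println(pixels + " pixels/s"); // 63480000 pixels/s
        System.out.println(quads + " quads/s");   // 60000 quads/s
    }
}
```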

So, does anyone have any idea what I’m missing here? I suspect it’s some really simple n00b problem, but have not been able to find anything about it even after Googling around a lot. I’d be really grateful for any helpful replies.

If you’re really simply drawing left-to-right, top-to-bottom, you must be binding a new texture for virtually every quad. That will really slow you down. Try doing/googling state-sorting.
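To illustrate what state sorting buys you, here is a rough sketch in plain Java with no Jogl calls. The Quad class and both helper methods are hypothetical, and the comments mark where a real renderer would call glBindTexture and emit vertices. It only counts how many binds a draw list needs before and after sorting by texture.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class StateSortSketch {
    // Hypothetical draw command: which texture to bind and where the quad goes.
    static final class Quad {
        final int textureId;
        final int x, y;
        Quad(int textureId, int x, int y) {
            this.textureId = textureId;
            this.x = x;
            this.y = y;
        }
    }

    // Counts the texture binds needed if we rebind only when the texture changes.
    static int countBinds(List<Quad> quads) {
        int binds = 0;
        int bound = -1; // assumption: real texture ids are never -1
        for (Quad q : quads) {
            if (q.textureId != bound) {
                bound = q.textureId; // a real renderer would call glBindTexture here
                binds++;
            }
            // a real renderer would emit the quad's four vertices here
        }
        return binds;
    }

    // Returns a copy ordered by texture id. List.sort is stable, so quads that
    // share a texture keep their original relative order.
    static List<Quad> sortByTexture(List<Quad> quads) {
        List<Quad> sorted = new ArrayList<>(quads);
        sorted.sort(Comparator.comparingInt((Quad q) -> q.textureId));
        return sorted;
    }

    public static void main(String[] args) {
        List<Quad> quads = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            quads.add(new Quad(i % 2, i * 46, 0)); // textures alternate 0,1,0,1,0,1
        }
        System.out.println(countBinds(quads));                // 6
        System.out.println(countBinds(sortByTexture(quads))); // 2
    }
}
```

Keep in mind that sorting changes the draw order, so on its own it is only safe where quads do not overlap or where something else (such as the depth buffer) decides which one ends up on top.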

Non-power-of-two sized textures tend to be much slower to draw than power-of-two sized ones, so if each tile is in its own unique texture at 46x23 then you’re really not doing yourself any favours.

You’ll want to pack multiple sprites into a single sprite sheet / texture atlas, which can be a nice round 512x512 (or similar). That’ll also help with your state sorting, as you’ll have to switch textures less often.
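As a rough idea of how the lookup into such a sheet could work, here is a small Java sketch that computes texture coordinates for a tile in a grid-packed atlas. The simple row-by-row layout and the tileUV name are assumptions made for illustration, not something a particular library provides.

```java
final class AtlasSketch {
    // Returns {u0, v0, u1, v1} for tile number `index`, assuming tiles are packed
    // left to right, top to bottom, with no padding between them.
    static float[] tileUV(int index, int tileW, int tileH, int atlasW, int atlasH) {
        int columns = atlasW / tileW; // whole tiles that fit in one atlas row
        int col = index % columns;
        int row = index / columns;
        float u0 = (col * tileW) / (float) atlasW;
        float v0 = (row * tileH) / (float) atlasH;
        float u1 = ((col + 1) * tileW) / (float) atlasW;
        float v1 = ((row + 1) * tileH) / (float) atlasH;
        return new float[] { u0, v0, u1, v1 };
    }

    public static void main(String[] args) {
        // 46x23 tiles in a 512x512 atlas: 11 columns by 22 rows, so 242 tiles fit.
        float[] uv = tileUV(12, 46, 23, 512, 512); // tile 12 sits at column 1, row 1
        System.out.println(uv[0] + " " + uv[1] + " " + uv[2] + " " + uv[3]);
    }
}
```

If the texture is drawn with linear filtering, neighbouring tiles can bleed into each other at the edges, so a pixel or two of padding around each tile is a common precaution.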

And if you find you’re getting a constant 60fps, it could well be that you’ve got vsync on (and therefore won’t be going any faster than the refresh rate, regardless of how fast your drawing is). For benchmarking always turn vsync off.

Thank you for that tip! I’ve been doing some research on the subject now, but there is one minor and one major thing that are unclear to me:

First of all: Since I’m drawing 2D stuff, I’m not sure how to handle overdraw when drawing quads outside their normal drawing order. However, I am guessing that it should be possible to explicitly set the “drawing order” by using the third coordinate passed to glVertex and enabling depth testing. Since I’m using an orthographic projection, the distance from the camera shouldn’t matter otherwise, right?

Even so, I am unclear about what happens to alpha-transparent pixels when I use depth testing. To take an example: What if I have drawn one pixel with alpha 100% at depth 20, then another pixel with alpha 50% at depth 10 on the same screen pixel, and then a third pixel on the same screen pixel with alpha 50% at depth 15? What will the final output be? I imagine that this should be covered in the OpenGL spec somewhere, but I have been unable to find it.
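The example above can be traced by hand with a tiny single-pixel simulation in Java. This is only a sketch under stated assumptions: a smaller depth value means nearer, the depth test behaves like GL_LESS with depth writes enabled, and blending uses GL_SRC_ALPHA / GL_ONE_MINUS_SRC_ALPHA. The class itself is made up, it tracks a single colour channel, and it keeps the post's depth units rather than OpenGL's normalized 0..1 range.

```java
final class DepthBlendSketch {
    double color;                            // one colour channel of the pixel, 0..1
    double depth = Double.POSITIVE_INFINITY; // cleared depth: farther than anything

    DepthBlendSketch(double background) {
        this.color = background;
    }

    // Processes one fragment; returns true if it passed the depth test.
    boolean draw(double srcColor, double srcAlpha, double srcDepth) {
        if (!(srcDepth < depth)) {
            return false; // depth test failed: fragment is discarded, nothing is blended
        }
        // Blend the incoming fragment over whatever colour is already stored.
        color = srcAlpha * srcColor + (1.0 - srcAlpha) * color;
        depth = srcDepth; // depth is written regardless of the fragment's alpha
        return true;
    }

    public static void main(String[] args) {
        DepthBlendSketch px = new DepthBlendSketch(0.0);
        System.out.println(px.draw(1.0, 1.0, 20)); // true: opaque pixel at depth 20
        System.out.println(px.draw(0.0, 0.5, 10)); // true: 50% pixel blends over the first
        System.out.println(px.draw(1.0, 0.5, 15)); // false: 15 is behind 10, so it is dropped
        System.out.println(px.color);              // 0.5
    }
}
```

Under those assumptions the third pixel never reaches the screen, even though it "should" show through the 50% pixel in front of it. The depth buffer stores only one depth per pixel and does not sort translucent fragments, which is why translucent geometry is usually drawn back to front after the opaque geometry.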

That is actually not the problem, because I always allocate each texture at the next power of two up from its actual size.
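For reference, rounding a size up to a power of two can be sketched like this in Java. The helper name is hypothetical and is not part of Jogl.

```java
final class PowerOfTwoSketch {
    // Smallest power of two that is greater than or equal to n.
    static int nextPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("size out of range: " + n);
        }
        int p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    public static void main(String[] args) {
        // A 46x23 tile ends up in a 64x32 texture.
        System.out.println(nextPowerOfTwo(46) + "x" + nextPowerOfTwo(23)); // 64x32
    }
}
```

A 46x23 tile stored this way uses 1058 of the 2048 texels in its 64x32 texture, so roughly half of each texture is padding.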

That’s a great idea, though! I will be sure to implement at least a slightly simplified version of it.