Texture Packing, is there any point in doing it manually?

Edit: Disclaimer - I just tried running the test on my year-old gaming rig, and it performs at 280 fps. Even with 8000 items, it runs smoothly. This is the kind of performance I like :)
So I suppose there’s much blame to be passed to mobile graphics cards, or drivers, or Vista, or what have you. But still, if you have any advice to give, shoot! I’d really like this to run well on any decent computer.

Hello,

I just spent much of my Sunday implementing a seamless (somewhat) texture-packing utility into my texture-manager and renderer,
thinking it could improve performance significantly since I’m doing 2D graphics, and draw thousands of small billboards while binding textures continuously.

The idea is of course to stuff many images into the same 1024x1024 texture and point to different coordinates at render.
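
To make the "point to different coordinates" part concrete, here is a minimal sketch of turning a sub-image's pixel rectangle into normalized texture coordinates. The class name `AtlasRegion` and its fields are hypothetical (they are not from the post's `PackedTexture`), and it assumes a square atlas such as the 1024x1024 one mentioned above.

```java
/**
 * Hypothetical helper: converts a pixel rectangle inside a square atlas
 * into normalized texture coordinates.
 */
final class AtlasRegion {
    final float u0, v0, u1, v1;

    AtlasRegion(int x, int y, int width, int height, int atlasSize) {
        // Texture coordinates run from 0.0 to 1.0 across the whole atlas,
        // so each pixel position is divided by the atlas edge length.
        this.u0 = (float) x / atlasSize;
        this.v0 = (float) y / atlasSize;
        this.u1 = (float) (x + width) / atlasSize;
        this.v1 = (float) (y + height) / atlasSize;
    }
}
```

For example, a 64x128 image stored at pixel (256, 512) of a 1024x1024 atlas would be `new AtlasRegion(256, 512, 64, 128, 1024)`, and its four corners are the combinations of `u0`/`u1` with `v0`/`v1`.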

But putting it to work, I don’t notice any difference in performance at all. Is this a legacy idea that binding is expensive? Or is it already taken care of by the JOGL API?

I’m using com.sun.opengl.util.texture.Texture by the way, which does say “For best performance, try to avoid calling enable() / bind() / disable() any more than necessary.” in the API… perhaps it’s still insignificant compared to matrix-operating on RGBA images over and over. Who knows :)

Slightly disappointing, but this is how it goes sometimes,

0gleth0rpe.

Here’s a screenshot and a description to give an idea of what I’m doing.
In the following shot I get ~50 fps with about 1400 vegetation sprites rendered.
The computer is a Precision M90, so it’s a fairly recent laptop.

My rendering method looks like this


public void drawPackedTexture(float x, float y, float w, float h, PackedTexture texture,
		Alignment align, BlendMode mode, float[] rgb, TransformMatrix pMatrix) {
	// Flip y so that the caller's origin is at the top of the screen.
	y = height - y;
	float[][] c = getAlignmentOffset(align);
	// Sub-rectangle of this image inside the packed texture.
	Rectangle2D.Float bounds = texture.getBounds();

	bindTexture(texture.packedTextureId);
	setBlendMode(mode);

	// Per-sprite transform: translate, scale, then the optional extra matrix.
	gl.glPushMatrix();
	gl.glTranslatef(x, y, 0.0f);
	gl.glScalef(w, h, 1.0f);
	// gl.glRotatef(0, 0.0f, 0.0f, 1.0f);

	if (pMatrix != null) {
		gl.glMultMatrixf(pMatrix.getMatrix(), 0);
	}

	// One immediate-mode quad; rgb holds four components (index 3 is alpha).
	gl.glBegin(GL.GL_POLYGON);
	gl.glColor4f(rgb[0], rgb[1], rgb[2], rgb[3]);
	gl.glTexCoord2f(bounds.x, bounds.y);
	gl.glVertex2f(c[0][0], c[1][0]);
	gl.glTexCoord2f(bounds.x + bounds.width, bounds.y);
	gl.glVertex2f(c[0][1], c[1][0]);
	gl.glTexCoord2f(bounds.x + bounds.width, bounds.y + bounds.height);
	gl.glVertex2f(c[0][1], c[1][1]);
	gl.glTexCoord2f(bounds.x, bounds.y + bounds.height);
	gl.glVertex2f(c[0][0], c[1][1]);
	gl.glEnd();

	gl.glPopMatrix();
}

Here bindTexture skips the bind when the requested textureId is already bound. The transform matrix is used to shear the top of the grass, so that it moves in the wind.
Right now, it only needs to bind one texture for all the vegetation.
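
The post does not show bindTexture itself, so the following is only a sketch of the "skip the bind if it is already bound" idea. The class `TextureBindCache` is hypothetical, and the actual GL call is replaced by an `IntConsumer` stand-in so the logic can be run without a GL context.

```java
import java.util.function.IntConsumer;

/** Hypothetical cache that forwards a bind only when the texture id changes. */
final class TextureBindCache {
    private final IntConsumer binder; // stand-in for the real GL bind call
    private int boundId = -1;         // assumes -1 is never a valid texture id
    private int bindCount = 0;

    TextureBindCache(IntConsumer binder) {
        this.binder = binder;
    }

    void bind(int textureId) {
        if (textureId == boundId) {
            return; // already bound, nothing to send to the driver
        }
        binder.accept(textureId);
        boundId = textureId;
        bindCount++;
    }

    /** Number of binds that were actually forwarded. */
    int bindCount() {
        return bindCount;
    }
}
```

With a single packed texture for all the vegetation, a cache like this would forward exactly one bind per frame for those sprites, which may be why packing made no visible difference on top of it.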

Anyway, that’s about what I can think of.

Thank you for reading.

http://dl-client.getdropbox.com/u/63422/Posted/packedtextures_091101.jpg

http://dl-client.getdropbox.com/u/63422/Posted/screenshot_090111.jpg

While I haven’t spent the time to benchmark it, I’d guess bind cost is still a relevant issue, as there’s an extension to help reduce the amount of binding needed: http://www.opengl.org/registry/specs/EXT/texture_array.txt

It was even promoted to core in OpenGL 3.0.

If you are having performance issues I would also look at trying to convert your immediate-mode rendering into vertex arrays or vertex buffers. Also I would get rid of that GL_POLYGON and make it a triangle strip instead.

Immediate mode is slower, full stop, but right now you are sending down to OpenGL, for each polygon:

  • 20 floats for the color, texture coordinates and vertices (4 + 8 + 8)
  • 6 for translate/scale
  • 16 for the shear matrix

So right now, ignoring everything else, you are sending down 42 floats per polygon, plus some additional overhead for pushing and popping matrices, setting polygon mode, etc…

Now just sending down floats isn’t necessarily the big deal but it is an indicator.

First off you can combine the translate/scale/shear into a single matrix so you only need 16 floats to specify the modelview matrix transform.
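
As a rough sketch of that combination: the function below folds translate, scale and an extra 4x4 matrix into one column-major 16-float array, in the same order the posted method applies them (translate, then scale, then the extra matrix). The names `SpriteTransform`, `combine` and `shearX` are made up for illustration, and the simple x-by-y shear is an assumption, since the post does not show what its TransformMatrix contains.

```java
/** Hypothetical helper that builds one modelview matrix per sprite on the CPU. */
final class SpriteTransform {

    /** Column-major shear that offsets x by k * y (assumed form of the wind shear). */
    static float[] shearX(float k) {
        float[] m = new float[16];
        m[0] = 1f; m[5] = 1f; m[10] = 1f; m[15] = 1f; // identity diagonal
        m[4] = k;                                     // column 1, row 0
        return m;
    }

    /** Returns translate(x, y) * scale(w, h) * p as a column-major 4x4 matrix. */
    static float[] combine(float x, float y, float w, float h, float[] p) {
        float[] m = new float[16];
        for (int col = 0; col < 4; col++) {
            float p0 = p[col * 4];
            float p1 = p[col * 4 + 1];
            float p2 = p[col * 4 + 2];
            float p3 = p[col * 4 + 3];
            m[col * 4]     = w * p0 + x * p3; // row 0: scaled x plus translation
            m[col * 4 + 1] = h * p1 + y * p3; // row 1: scaled y plus translation
            m[col * 4 + 2] = p2;              // row 2: z is untouched
            m[col * 4 + 3] = p3;              // row 3: homogeneous row
        }
        return m;
    }
}
```

The resulting array is what you would hand to a single glLoadMatrixf (or glMultMatrixf) call instead of issuing glTranslatef, glScalef and glMultMatrixf separately.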

Secondly if you go back to using 1 texture per image, then you can make a single vertex buffer object that works for all your stuff:

uv=(0,1) uv=(1,1)
vt=(0,1) vt=(1,1)

uv=(0,0) uv=(1,0)
vt=(0,0) vt=(1,0)

Set it up like this:
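
A minimal sketch of the client-side half of that setup, assuming a unit quad laid out in triangle-strip order. The class `UnitQuad` is hypothetical, and the upload into actual buffer objects (glGenBuffers / glBindBuffer / glBufferData) is left out because it needs a live GL context.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical holder for the shared unit-quad data. */
final class UnitQuad {
    // Triangle-strip order: bottom-left, bottom-right, top-left, top-right.
    // Positions and texture coordinates are the same values for a unit quad.
    private static final float[] CORNERS = {
        0f, 0f,
        1f, 0f,
        0f, 1f,
        1f, 1f
    };

    /** Copies the data into a direct, native-order buffer as GL bindings expect. */
    private static FloatBuffer directBuffer(float[] data) {
        FloatBuffer fb = ByteBuffer.allocateDirect(data.length * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        fb.put(data);
        fb.flip(); // rewind so the buffer is ready to be read
        return fb;
    }

    static FloatBuffer vertices() {
        return directBuffer(CORNERS);
    }

    static FloatBuffer texCoords() {
        return directBuffer(CORNERS);
    }
}
```

Each buffer holds 8 floats (4 corners, 2 components each), which matches the size of 2 and count of 4 used in the pointer and draw calls.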

Then render it like this:



Do this once per frame:

glBindBuffer(GL_ARRAY_BUFFER, floatBufferOfVerts);
glVertexPointer(2, GL_FLOAT, 0, 0);
glEnableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, floatBufferOfUvs);
glActiveTexture(GL_TEXTURE0);
glClientActiveTexture(GL_TEXTURE0);
glTexCoordPointer(2, GL_FLOAT, 0, 0);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);

for each of your visible textures, do this:

  glBindTexture(GL_TEXTURE_2D, textureId);

  for each of the quads of that texture, do this:

    glLoadMatrixf(...);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);


That way you end up with a far more optimized render strategy. You only send down 16 floats per quad, + the glDrawArrays() command.

Furthermore, if your vegetables are all blowing the same way in the wind (the shear matrix is identical), you could try loading that matrix at the beginning of the frame and then just translating/scaling against it for each quad.

This is all just pseudocode off the top of my head but I think it is a more performant way to render what you are doing. An important feature of what I wrote is that you have sorted all your renders by texture in advance so you only have as many texture binds as you have textures, no wasted effort there. Hope that helps.
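
To illustrate the sorting point, here is a small sketch that orders sprites by texture id and counts how many binds a given draw order would need. `SpriteBatcher` and `Sprite` are hypothetical names, not types from the original renderer, and the bind counting just simulates the "only bind on change" rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper for ordering draws so each texture is bound once. */
final class SpriteBatcher {

    static final class Sprite {
        final int textureId;
        final float x, y;

        Sprite(int textureId, float x, float y) {
            this.textureId = textureId;
            this.x = x;
            this.y = y;
        }
    }

    /** Returns a copy of the list ordered by texture id (stable sort). */
    static List<Sprite> sortedByTexture(List<Sprite> sprites) {
        List<Sprite> sorted = new ArrayList<>(sprites);
        sorted.sort(Comparator.comparingInt((Sprite s) -> s.textureId));
        return sorted;
    }

    /** Counts how many texture binds a draw order needs if binds happen only on change. */
    static int countBinds(List<Sprite> drawOrder) {
        int binds = 0;
        int bound = -1; // assumes -1 is never a valid texture id
        for (Sprite s : drawOrder) {
            if (s.textureId != bound) {
                bound = s.textureId;
                binds++;
            }
        }
        return binds;
    }
}
```

One caveat with this kind of sorting: if the sprites are alpha-blended and overlap, changing the draw order can change the result, so it may only be safe within a layer.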

At the very least try just converting your immediate-mode renders to vertex buffers and see if that helps.

JW