Basic SpriteBatcher With LWJGL

So after spending countless hours on the interwebs and pushing my google-fu skills to their limit, I got a simple but fast SpriteBatcher working. I am going to explain the process of making a SpriteBatcher and then give you the code if you would just like to modify it for your own needs.

Note: I hate it when mathematics books give you the simplest example problem and then tell you to do all sorts of tricky problems. So I will be showing you something a little more complicated than it needs to be, so you will know what to do if you want to change something.

Let's get started.

What you need: a computer, an IDE/Notepad, to have gotten something more than glBegin/glEnd working, and to have done at least some basic things with Vertex Arrays/VBOs in LWJGL before.
Tip: http://www.java-gaming.org/topics/introduction-to-vertex-arrays-and-vertex-buffer-objects-opengl/24272/view.html
A big plus is knowing what a Texture Atlas is.

We need to understand what a SpriteBatcher is. A huge slowdown when programming stuff in OpenGL is draw calls. By lowering the number of draw calls you reduce the load on the CPU. By batching as many sprites as we can into one draw call, we reduce the CPU load and thus improve performance. We can do this with Vertex Arrays. Why not VBOs? Because sprites are very dynamic little buggers and can possibly change every frame (and then some). VBOs can be faster when the data is not so dynamic, which is not the case for a sprite batcher. Now that I have blabbered on for a while, let's look at some actual code.
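For contrast, here is a rough sketch of the per-sprite immediate-mode drawing that a batcher replaces. Sprite is a made-up class for this example; the point is simply that every sprite costs its own glBegin/glEnd pair (one draw call each), which is exactly the CPU overhead we want to get rid of.

// One draw call per sprite: this is what we are trying to avoid.
// Sprite is a hypothetical example class with x, y, width, height and texID fields.
for (Sprite s : sprites) {
	glBindTexture(GL_TEXTURE_2D, s.texID);
	glBegin(GL_QUADS);
	glTexCoord2f(0, 0); glVertex2f(s.x,           s.y);
	glTexCoord2f(1, 0); glVertex2f(s.x + s.width, s.y);
	glTexCoord2f(1, 1); glVertex2f(s.x + s.width, s.y + s.height);
	glTexCoord2f(0, 1); glVertex2f(s.x,           s.y + s.height);
	glEnd();
}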

Quick note: this is my batcher and I use a class called TexRegion, which is what it sounds like: a texture region. This is to show you how to set it up for working with texture atlases.


public class SpriteBatcher {
	private static float[] empty = new float[8];                // default tex coords: the whole texture
	private static Vector4f empty1 = new Vector4f(1, 1, 1, 1);  // default color: opaque white, so the sprite is untinted under GL_MODULATE
	
	private float[] vertArray;
	private byte[] colorArray;
	private float[] texArray;
	private int draws;
	private int maxDraws = 1000;
	private int vertIndex;
	private int colIndex;
	private int texIndex;
	private int currentTex;
	private FloatBuffer vertBuff, texBuff;
	private ByteBuffer colBuff;
	
	static{
		empty[0] = 0;
		empty[1] = 0;
		empty[2] = 1;
		empty[3] = 0;
		empty[4] = 1;
		empty[5] = 1;
		empty[6] = 0;
		empty[7] = 1;
	}

So what is all this jazz? The two static fields are for when you may want to have the sprite batcher draw something without specifying a color or texture region. It is better not to create these every time we need them via new Vector4f or new float[].

We have three arrays: a float array for vertex coords, a float array for tex coords, and a byte array for colors. You can guess that the three index ints (vertIndex, colIndex, texIndex) are what we use to keep track of where we are while filling up the batcher. Then we also have an int to keep track of what texture we are working with.

We have two float buffers, for vertex and texture coords, and one byte buffer for color. Why a byte buffer? Since we want sprites that can change transparency every frame, we need RGBA. If we used floats, that would be 4 components * 4 bytes = 16 bytes per vertex (64 bytes per sprite) instead of 4 bytes per vertex (16 per sprite) with unsigned bytes. By reducing the bytes we send to the GPU, we can increase performance slightly. If you would like the more accurate float colors, simply drop the byte buffer and add another float buffer.

Last we have max draws and current draws. Why do we have these? There is an optimal size for VBOs and vertex arrays; that is to say, you want to give things to the GPU in bite-sized chunks. The most optimal size I have found for this batcher is between 1000 and 1500 sprites at a time. So let's make a constructor for this class.

public SpriteBatcher()
	{
		this(1000);
	}
	
	public SpriteBatcher(int size)
	{
		vertArray = new float[size*2*4];
		vertBuff = BufferUtils.createFloatBuffer(vertArray.length);
		colorArray = new byte[size*4*4];
		colBuff = BufferUtils.createByteBuffer(colorArray.length);
		texArray = new float[size*2*4];
		texBuff = BufferUtils.createFloatBuffer(texArray.length);
		vertIndex = 0;
		colIndex = 0;
		texIndex = 0;
		maxDraws = size;
		draws = 0; 
	}

The default constructor sets the size to 1000, but we also let people choose what they want the size to be.
Most things here are straightforward. vertArray needs size * 2 (coordinates per corner) * 4 (corners per quad) elements, and vertBuff gets vertArray's length; you could also just write size*8. The same goes for the other arrays. The only thing to note is that the color array uses a multiplier of 4*4, because there are 4 RGBA components at each of the 4 corners. For the default size of 1000, that works out to 8000 floats for vertices, 8000 floats for tex coords, and 16000 bytes for colors. Set all indexes to 0, draws to 0, and maxDraws to size.

Let's keep things in OpenGL style and create two methods that will be used to start and end rendering with the batcher: begin() and end().


public void begin()
	{
		glEnableClientState(GL11.GL_VERTEX_ARRAY);
		glEnableClientState(GL11.GL_TEXTURE_COORD_ARRAY);
		glEnableClientState(GL11.GL_COLOR_ARRAY);
	}
	
	public void end()
	{
		render();
	        
		glDisableClientState(GL11.GL_VERTEX_ARRAY);
		glDisableClientState(GL11.GL_TEXTURE_COORD_ARRAY);
		glDisableClientState(GL11.GL_COLOR_ARRAY);
	}

Very simple. Enable the client states in begin(); then render() and disable the client states in end(). Now let's look at render().


private void render()
	{
		// Bind whatever texture this batch is using.
		glBindTexture(GL11.GL_TEXTURE_2D, currentTex);

		// Fill the buffers and flip them so they are ready to be read.
		vertBuff.put(vertArray);
		vertBuff.flip();
		colBuff.put(colorArray);
		colBuff.flip();
		texBuff.put(texArray);
		texBuff.flip();

		// Point OpenGL at our data. The color pointer uses unsigned bytes.
		glVertexPointer(2, 0, vertBuff);
		glColorPointer(4, true, 0, colBuff);
		glTexCoordPointer(2, 0, texBuff);

		// 4 vertices per sprite.
		glDrawArrays(GL_QUADS, 0, draws*4);

		// Reset everything for the next batch.
		vertBuff.clear();
		colBuff.clear();
		texBuff.clear();
		vertIndex = 0;
		colIndex = 0;
		texIndex = 0;
		draws = 0;
	}

Still very simple. Bind whatever texture is being used. Fill the buffers. Flip the buffers (never forget that). Specify the pointers; note the color pointer: we are using bytes and telling OpenGL that they are unsigned. Then we draw using draws*4, because there are 4 vertices for each sprite. Why are we not using the index-buffer trick with drawElements or drawRangeElements? Due to their dynamic nature, sprites will rarely share vertices, so you would actually lose 1-2 fps by adding an index buffer. If you do not know what I mean when I say index buffer, do not fret! Use the google-fu! Or just ignore it and continue on.

Finally we clear the buffers, set the indexes back to 0, and set draws to 0. Whoa! Really simple! Well... no, now comes the complex part: actually filling the arrays with useful information, such as where our sprite is, what size it is, what texture it is using, what color it is (if any at all), and yes... whether it is rotated at all.

So here is the scariest-looking method in the whole class: draw(blah blah blah sprite stuff).


public void draw(int texID, float[] region, float x, float y, float width, float height, float rotation, Vector4f col )
	{
		if(texID != currentTex)
		{
			render();
			currentTex = texID; 
		}
		if(draws == maxDraws)
		{
			render();
		}

		final float p1x = -width/2;
		final float p1y = -height/2;
		final float p2x = width/2;
		final float p2y = -height/2;
		final float p3x = width/2;
		final float p3y = height/2;
		final float p4x = -width/2;
		final float p4y = height/2;

		float x1;
		float y1;
		float x2;
		float y2;
		float x3;
		float y3;
		float x4;
		float y4;

		// rotate
		if (rotation != 0) {
			final float cos = (float) FastMath.cosDeg(rotation);
			final float sin = (float) FastMath.sinDeg(rotation);

			x1 = cos * p1x - sin * p1y;
			y1 = sin * p1x + cos * p1y;

			x2 = cos * p2x - sin * p2y;
			y2 = sin * p2x + cos * p2y;

			x3 = cos * p3x - sin * p3y;
			y3 = sin * p3x + cos * p3y;

			x4 = cos * p4x - sin * p4y;
			y4 = sin * p4x + cos * p4y;
		} else {
			x1 = p1x;
			y1 = p1y;

			x2 = p2x;
			y2 = p2y;

			x3 = p3x;
			y3 = p3y;

			x4 = p4x;
			y4 = p4y;
		}
		x1+=x;
		x2+=x;
		x3+=x;
		x4+=x;
		y1+=y;
		y2+=y;
		y3+=y;
		y4+=y;
		
		vertArray[vertIndex] 	= x1;
		texArray[texIndex] 		= region[0];
		vertArray[vertIndex+1] 	= y1;
		texArray[texIndex+1] 	= region[1];
		
		vertArray[vertIndex+2] 	= x2;
		texArray[texIndex+2] 	= region[2];
		vertArray[vertIndex+3] 	= y2;
		texArray[texIndex+3] 	= region[3];
		
		vertArray[vertIndex+4] 	= x3;
		texArray[texIndex+4] 	= region[4];
		vertArray[vertIndex+5] 	= y3;
		texArray[texIndex+5] 	= region[5];
		
		vertArray[vertIndex+6] 	= x4;
		texArray[texIndex+6] 	= region[6];
		vertArray[vertIndex+7] 	= y4;
		texArray[texIndex+7] 	= region[7];
		
		colorArray[colIndex]  	= getColor(col.x);
		colorArray[colIndex+1] 	= getColor(col.y);
		colorArray[colIndex+2] 	= getColor(col.z);
		colorArray[colIndex+3] 	= getColor(col.w);
		
		colorArray[colIndex+4] 	=  getColor(col.x);
		colorArray[colIndex+5] 	=  getColor(col.y);
		colorArray[colIndex+6] 	=  getColor(col.z);
		colorArray[colIndex+7] 	=  getColor(col.w);
		
		colorArray[colIndex+8] 	=  getColor(col.x);
		colorArray[colIndex+9] 	=  getColor(col.y);
		colorArray[colIndex+10] =  getColor(col.z);
		colorArray[colIndex+11] =  getColor(col.w);
		
		colorArray[colIndex+12] =  getColor(col.x);
		colorArray[colIndex+13] =  getColor(col.y);
		colorArray[colIndex+14] =  getColor(col.z);
		colorArray[colIndex+15] =  getColor(col.w);
		
		
		vertIndex+=8;
		texIndex+=8;
		colIndex += 16;
		draws++; 
	}

Whoa! Lots of stuff happening here. Let's explain. First we check if the texture is different from the one we are currently using; if it is, we render() what we have and then set the new texture as the current one. Then we check whether we have hit the max draw count and, again, if we have, render().

Now comes the fun part: rotation. If you would like, you can skip this, but I think you should read on.

We are going to render the quad with its center at the given x and y. This means that we divide the width/height by 2 and add or subtract it depending on which corner of the quad we are specifying. We could just draw like we would in Java2D, using x and y as the top-left corner, but by making it the center we greatly simplify what the user has to manage. Why are we not using x and y here yet? These coordinates are not in screen space, because we are going to rotate them around the origin, which we assume is (0,0).

Next we set up variables for our final coordinates after we rotate and translate into screen space. But WAIT!! What if we don't need to rotate? That is what the if statement checks: if we don't need to rotate, we just set the final coords to the p1x/p1y values and translate them into screen space by adding x or y. Now for the rotation.

We store the sin and cos of the rotation angle so we only have to calculate them once. Then we use them as a standard 2D rotation matrix:

R = [ cos(deg)  -sin(deg) ]
    [ sin(deg)   cos(deg) ]

so each corner (px, py) becomes (cos*px - sin*py, sin*px + cos*py), which is exactly what the code above does. To see where the sin and cos multiplication and addition come from, go here:
http://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions
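A quick note on FastMath: cosDeg/sinDeg are not part of LWJGL. If you don't have a math helper class of your own, a minimal stand-in (this is just an assumption of what such a helper does; a real one might use lookup tables for speed) could look like this:

// Hypothetical stand-in for the FastMath helper used above.
// It simply converts degrees to radians and defers to java.lang.Math;
// the draw() code casts the result to float.
public final class FastMath {
	public static double cosDeg(double degrees) {
		return Math.cos(Math.toRadians(degrees));
	}

	public static double sinDeg(double degrees) {
		return Math.sin(Math.toRadians(degrees));
	}
}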

Now that we have rotated coordinates, we can translate them into screen space by adding x to the x coords and y to the y coords.

Now that we have all the information we need, we can fill up the arrays with the new data. We use the current index and add 1 to it for each subsequent placement into the array. The texture array gets its tex coordinates from the float[] passed into the method, so we can specify just a partial region of a texture (i.e. a texture atlas region).
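Since I keep mentioning TexRegion, here is a rough sketch of what such a helper might look like. This is only an illustration (the names and layout are assumptions, not necessarily the exact class I use): it turns a pixel rectangle inside an atlas into the float[8] of normalized tex coords that draw() expects, in the same corner order as the default empty array: (0,0), (1,0), (1,1), (0,1).

// Hypothetical TexRegion-style helper: converts a pixel rectangle inside a
// texture atlas into the float[8] of normalized tex coords used by draw(),
// in the corner order (0,0), (1,0), (1,1), (0,1).
public class TexRegion {
	private final float[] coords = new float[8];

	public TexRegion(float atlasWidth, float atlasHeight,
			float x, float y, float width, float height) {
		float u1 = x / atlasWidth;
		float v1 = y / atlasHeight;
		float u2 = (x + width) / atlasWidth;
		float v2 = (y + height) / atlasHeight;
		coords[0] = u1; coords[1] = v1;
		coords[2] = u2; coords[3] = v1;
		coords[4] = u2; coords[5] = v2;
		coords[6] = u1; coords[7] = v2;
	}

	public float[] getCoords() {
		return coords;
	}
}

You would then call something like batcher.draw(atlasTexID, region.getCoords(), x, y, w, h, rotation, color). Then we add the same color for each corner of our quad; this is done with the getColor() method.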


private byte getColor(float f)
	{
		// Convert a 0-1 float component into an unsigned byte (0-255).
		// glColorPointer(..., true, ...) tells OpenGL to treat it as unsigned.
		return (byte) (f*255);
	}

Now that everything is filled up, we increase the indexes and add 1 to the draw count.

Here are some convenience methods for rendering when you don't want to specify a color or a texture region:

public void draw(int texID, float x, float y, float sizex, float sizey )
	{
		draw(texID, empty, x, y, sizex, sizey, 0, empty1);
	}

	public void draw(int texID, float x, float y, float sizex, float sizey,float rotation, Vector4f col )
	{
		draw(texID, empty, x, y, sizex, sizey, rotation, col);
	}
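To tie everything together, here is a rough sketch of how a frame might look when using the batcher. The orthographic projection/blending setup is assumed (set it up however you already do), and MySprite is a made-up example class with the obvious fields.

// One-time setup (assuming a 2D orthographic projection is already in place):
glEnable(GL_TEXTURE_2D);
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
SpriteBatcher batcher = new SpriteBatcher(1000);

// Every frame:
glClear(GL_COLOR_BUFFER_BIT);
batcher.begin();
for (MySprite s : sprites) {          // MySprite is a made-up example class
	batcher.draw(s.texID, s.region,   // texture id + float[8] region (e.g. from a TexRegion)
			s.x, s.y,                 // center position
			s.width, s.height,
			s.rotation,               // degrees
			s.color);                 // Vector4f RGBA in the 0-1 range
}
batcher.end();                        // flushes whatever is left in the batch
Display.update();

Remember that the batcher flushes itself every time the texture changes, so sorting your sprites by texture before drawing keeps the number of real draw calls down.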

And here is the whole class.

http://pastebin.java-gaming.org/f5814300d3e

This will work on just about every system out there (even on mobile devices, although I don't know why you would EVER not use libGDX there).

Now if you want to improve things even more for the many proud owners of a graphics card supporting OpenGL 3.0 or better, you can use geometry shaders, which take even more stress off the CPU. I will add this some time in the future, in such a way that you do not have to change anything about how you ask the SpriteBatcher to render things (just plug it in and it works).

Quick performance specs:
On an integrated chip, you will be fillrate limited before you come anywhere close to a CPU bottleneck.
On a dedicated GPU, you will hit a CPU bottleneck first, but it is still much faster than immediate-mode glBegin/glEnd.

On my 6-year-old computer (quad core @ 2.6 GHz, GeForce 250 with 1 GB VRAM, 4 GB RAM, 3 GB effective) I can do 50k sprites at 60 fps, no problem.
On my 2-year-old laptop (i5 @ 2.8 GHz, GeForce 420M that never gets used, 4 GB RAM) I can do 50k at 30 fps on the integrated chip: fillrate limited.

Have a nice day,
Stumpy ;D

Nice guide. :) A few nitpicks:

- You don't need arrays unless you're targeting Android; you might get better speeds with ByteBuffers directly.
- You can interleave the data to make things simpler and potentially more efficient (see the sketch below).
- It relies on deprecated code, like GL_QUADS and glTexCoordPointer. Might want to use shaders and custom attributes for a more modern approach.
- Since your color is passed as a Vector4f, you'd be better off using a FloatBuffer to store and specify your color attribute.
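For anyone wondering what interleaving means here: instead of separate vertex/tex/color arrays, all the attributes of a vertex sit next to each other in one buffer and you point OpenGL at it with a stride. A rough sketch, with made-up buffer names and counts:

// Rough illustration of interleaving (not the batcher's actual code): each vertex
// is laid out as x, y, u, v, r, g, b, a in a single FloatBuffer, and the pointers
// use a byte stride. 'maxSprites' and 'spriteCount' are made-up names.
int floatsPerVertex = 8;
int strideInBytes = floatsPerVertex * 4;
FloatBuffer data = BufferUtils.createFloatBuffer(maxSprites * 4 * floatsPerVertex);

// ... fill 'data' with x,y,u,v,r,g,b,a for every vertex, then flip() it ...

data.position(0);
glVertexPointer(2, strideInBytes, data);     // x, y
data.position(2);
glTexCoordPointer(2, strideInBytes, data);   // u, v
data.position(4);
glColorPointer(4, strideInBytes, data);      // r, g, b, a as floats
glDrawArrays(GL_QUADS, 0, spriteCount * 4);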

I published a small library recently which demonstrates a bare-bones implementation of a SpriteBatcher using the programmable pipeline (i.e. custom matrices, shaders, attributes, etc). You can see it here:

Color should be a byte, but I think internally it acts as a float. In my geometry batcher I ran into the same issue, and I am not sure you would get much of a boost from using bytes vs floats.

Using byte buffers on my desktop and laptop, performance drops when I use putFloat() directly in the actual drawSprite() code, but I get the same result as arrays if I use a for loop calling putFloat() from the float[]. I don't know why.

Interleaving is more complex for most people just getting into this, which was the target audience. Even if I interleaved it, I am not sure there would be a substantial performance gain. Interleaving seems more like something you do with VBOs, but then again you should eventually use VBOs for something like this, I suppose.

Yeah, I am still not good at the "modern" way of doing things. Often the modern way is the more complex route, and you would still need to understand things such as quads/triangles to use it, so again, this is meant for people new to OpenGL.

You have to remember that what seems blatantly simple to experienced programmers is often the part that new people do not understand. I think the bigger problem for people is extrapolating from simple tutorials. That is what I struggle with the most. So what if I can get a bunch of triangles to render on the screen with an interleaved VBO? How the hell am I supposed to use that to replace all my glBegin/glEnd stuff?