Fastest way of rendering lots of textured cubes?

Right now I’m using OpenGL immediate-mode calls like GL11.glVertex3f, but I’m pretty sure this is not the fastest way.

What would be the fastest alternative?

In order of ascending difficulty:

Display lists:

  • Dead easy to use - you’ll just have to wrap your existing rendering code with glNewList and glEndList
  • Can’t be changed after they’re defined, so not great for constantly-changing geometry

Vertex Arrays

  • You’ll have to redo your rendering to pack vertex data into Buffers, which are then handed to OpenGL in one go
  • You send the data every frame which is fine for dynamic data, a bit wasteful for static
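
As a rough illustration of the "pack vertex data into Buffers" step, here is a minimal sketch that uses only java.nio. The names (QuadPacking, createDirectFloatBuffer, packUnitQuad) and the interleaved x, y, z, u, v layout are assumptions made for the example and do not come from this thread. LWJGL's BufferUtils.createFloatBuffer should give you an equivalent direct, native-order buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class QuadPacking {
    // Assumed layout for this sketch: position (x, y, z) followed by texture coords (u, v).
    static final int FLOATS_PER_VERTEX = 5;
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * Float.BYTES;

    /** Allocates a direct, native-order buffer, the kind OpenGL bindings expect. */
    static FloatBuffer createDirectFloatBuffer(int floats) {
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Packs one unit quad in the z = 0 plane, mapped to the full texture. */
    static FloatBuffer packUnitQuad() {
        float[] quad = {
            // x,  y,  z,  u,  v
            0f, 0f, 0f, 0f, 0f,
            1f, 0f, 0f, 1f, 0f,
            1f, 1f, 0f, 1f, 1f,
            0f, 1f, 0f, 0f, 1f,
        };
        FloatBuffer buffer = createDirectFloatBuffer(quad.length);
        buffer.put(quad); // one bulk copy from the Java array
        buffer.flip();    // position back to 0, limit at the end of the data
        return buffer;
    }

    public static void main(String[] args) {
        FloatBuffer b = packUnitQuad();
        System.out.println("floats=" + b.remaining()
                + " strideBytes=" + STRIDE_BYTES
                + " direct=" + b.isDirect());
    }
}
```

The flipped buffer is what you would hand to the vertex pointer call each frame; with an interleaved layout like this one, the stride in bytes is what tells OpenGL how far apart consecutive vertices are.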

Vertex Buffer Objects

  • A soupçon of extra code over Vertex Arrays; this stores the vertex data on the graphics card, so there are no wasteful transfers
  • VBOs are managed and updated much as you would update textures
  • I suspect there’s a tipping point where, if you’re updating all the data every frame, you don’t gain anything over vertex arrays. I have not investigated this

What kind of thing are you rendering?

Thanks, the man formerly known as bleb.

I’m rendering lots of textured cubes, apart from my MD2 models.
I’m using Vertex Arrays to render the models, but not for the cubes. I must confess I just guessed that vertex arrays for hundreds of objects (cubes) would not be a good idea, but I think I’ll just give it a shot and benchmark it.

Just a warning - don’t treat FloatBuffers etc. like you would C arrays. Putting data into them one value at a time seems to be very, very slow. Instead, you probably want to keep one buffer and one Java array hanging around, copy everything one by one into the Java array, then bulk copy the whole array into the buffer. I actually found I had worse performance than drawing non-textured quads individually, so I dropped this code.

It isn’t so bad if you use FloatBuffer.put(int index, float value).
FloatBuffer.put(float) is almost an order of magnitude slower – last time I checked.

Hm, so if you track the index you would be inserting at (just do the same thing you’d do with an array) it’ll work a lot more quickly? That seems weird because it’s not like Sun couldn’t have just put their own count integer in there for doing this, but whatever…

I’m guessing:


//After I already put every object in array
buffer.put(array);

is faster than


//For every object
buffer.put(index, myJunk);

which is faster than


//For every object
buffer.put(myJunk);

Is this true, even with copying everything into a Java array before dumping it into the FloatBuffer? Or is it faster to do the FloatBuffer item-by-item as long as you know the position to put it?
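
For anyone who wants to measure this on their own machine, the three approaches in question can be sketched with plain java.nio as below. The class and method names are made up for the example, the timing loop is a crude micro-benchmark rather than a careful one, and the numbers will depend heavily on the VM and hardware, so treat it as a starting point only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class BufferFillStrategies {
    static FloatBuffer direct(int floats) {
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Write value by value into a Java array, then do a single bulk put. */
    static void fillBulk(FloatBuffer buffer, float[] staging, float[] source) {
        for (int i = 0; i < source.length; i++) {
            staging[i] = source[i];
        }
        buffer.clear();
        buffer.put(staging, 0, source.length);
        buffer.flip();
    }

    /** Absolute put: the caller tracks the index, the buffer's position never moves. */
    static void fillAbsolute(FloatBuffer buffer, float[] source) {
        buffer.clear();
        for (int i = 0; i < source.length; i++) {
            buffer.put(i, source[i]);
        }
        // Position is still 0 here, so only the limit has to be set by hand.
        buffer.limit(source.length);
    }

    /** Relative put: the buffer advances its own position on every call. */
    static void fillRelative(FloatBuffer buffer, float[] source) {
        buffer.clear();
        for (int i = 0; i < source.length; i++) {
            buffer.put(source[i]);
        }
        buffer.flip();
    }

    public static void main(String[] args) {
        final int n = 400_000;
        float[] source = new float[n];
        for (int i = 0; i < n; i++) {
            source[i] = i;
        }
        float[] staging = new float[n];
        FloatBuffer buffer = direct(n);

        // The early rounds only serve to let the JIT warm up; the last five are printed.
        for (int round = 0; round < 20; round++) {
            long t0 = System.nanoTime();
            fillBulk(buffer, staging, source);
            long t1 = System.nanoTime();
            fillAbsolute(buffer, source);
            long t2 = System.nanoTime();
            fillRelative(buffer, source);
            long t3 = System.nanoTime();
            if (round >= 15) {
                System.out.printf("bulk %d us, absolute %d us, relative %d us%n",
                        (t1 - t0) / 1000, (t2 - t1) / 1000, (t3 - t2) / 1000);
            }
        }
    }
}
```

All three methods leave the buffer in the same readable state (position 0, limit at the end of the data), so they are interchangeable from the caller's point of view.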

Last time I tried it, in a real world scenario, not some benchmark, float[] was a lot faster than FloatBuffer (like 30%). The ‘hilarious’ part was that shoving everything in a float[] and doing a bulk put() on the FloatBuffer took almost exactly the same amount of time as using put(int index, float value) everywhere :)

It’s fine if you batch up objects into the same array. The point of vertex arrays is to minimize calls to OpenGL, so ram as many cubes into one array as you can. This is where texture atlases come in handy, as you don’t need to split your rendering as often just to switch texture.
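
To make the batching idea concrete, here is a minimal sketch that appends many cubes to one float array and picks each cube's texture from a grid-style atlas. Everything in it is an assumption for illustration: the CubeBatcher name, the x, y, z, u, v layout, quads rather than triangles, and an atlas made of equally sized square tiles numbered row by row.

```java
final class CubeBatcher {
    static final int FLOATS_PER_VERTEX = 5;  // x, y, z, u, v
    static final int VERTS_PER_CUBE = 24;    // 6 faces x 4 corners (quads)
    static final int FLOATS_PER_CUBE = VERTS_PER_CUBE * FLOATS_PER_VERTEX;

    // Unit-cube corners per face, listed counter-clockwise as seen from outside.
    private static final float[][] FACES = {
        {0, 0, 1,  1, 0, 1,  1, 1, 1,  0, 1, 1}, // +Z
        {1, 0, 0,  0, 0, 0,  0, 1, 0,  1, 1, 0}, // -Z
        {1, 0, 1,  1, 0, 0,  1, 1, 0,  1, 1, 1}, // +X
        {0, 0, 0,  0, 0, 1,  0, 1, 1,  0, 1, 0}, // -X
        {0, 1, 1,  1, 1, 1,  1, 1, 0,  0, 1, 0}, // +Y
        {0, 0, 0,  1, 0, 0,  1, 0, 1,  0, 0, 1}, // -Y
    };

    /** Returns {u0, v0, u1, v1} for a tile in a square atlas of tilesPerRow x tilesPerRow tiles. */
    static float[] atlasRect(int tile, int tilesPerRow) {
        float size = 1f / tilesPerRow;
        float u0 = (tile % tilesPerRow) * size;
        float v0 = (tile / tilesPerRow) * size;
        return new float[] {u0, v0, u0 + size, v0 + size};
    }

    /** Appends one cube at (x, y, z) and returns the offset where the next cube starts. */
    static int appendCube(float[] out, int offset, float x, float y, float z,
                          float size, int tile, int tilesPerRow) {
        float[] r = atlasRect(tile, tilesPerRow);
        for (float[] face : FACES) {
            for (int c = 0; c < 4; c++) {
                out[offset++] = x + face[c * 3] * size;
                out[offset++] = y + face[c * 3 + 1] * size;
                out[offset++] = z + face[c * 3 + 2] * size;
                // Corner order is (u0,v0), (u1,v0), (u1,v1), (u0,v1).
                out[offset++] = (c == 1 || c == 2) ? r[2] : r[0];
                out[offset++] = (c >= 2) ? r[3] : r[1];
            }
        }
        return offset;
    }

    public static void main(String[] args) {
        int cubes = 1000;
        float[] batch = new float[cubes * FLOATS_PER_CUBE];
        int offset = 0;
        for (int i = 0; i < cubes; i++) {
            // Lay the cubes out in a row, cycling through 16 atlas tiles.
            offset = appendCube(batch, offset, i * 2f, 0f, 0f, 1f, i % 16, 4);
        }
        System.out.println(cubes + " cubes packed into " + offset + " floats");
    }
}
```

Because every cube reads from the same atlas texture, the whole array can go to the buffer with a single bulk put and be drawn with a single draw call, instead of one texture bind and one draw per cube.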

I’m confused by this answer. You say that putting it all in an array then copying it to the FloatBuffer is 30% faster, but then you say that doing put() with indices is the same speed?

I’m really curious about this. My personal experience seems to contradict the common wisdom that immediate mode sucks.

I’ve tried using vertex arrays and display lists several times while trying to optimize, and haven’t seen an appreciable improvement; often I’ve seen a decrease in performance instead.

The latest example was a dynamic particle system - several thousand quads - where switching to vertex arrays was a tiny bit slower than immediate mode. Tested on several machines, with good/bad/atrocious video cards, and just didn’t see any improvement…

It doesn’t sound like VBOs would be much better for this case either, since the vertices aren’t static. It also seems like those would be more likely to perform poorly on a lower-end card, which is where you’d really need it - but that’s just a guess with nothing to back it up.

Depends on how many triangles you are batching up. If you only draw one triangle then immediate mode will most likely be the fastest. At some point, could be 10, 100 or 1000 triangles, vertex arrays are faster.

Also, you have to be geometry limited to see the difference. If you’re fillrate limited then it doesn’t matter that much which method you use to draw the triangles. Immediate mode might even have an advantage because it can start drawing sooner.

You need to understand how VBOs work before you grok how they make things so much faster. It’ll just “click” when you realise. In the meantime, what should be the most efficient way of rendering dynamic geometry is probably a tradeoff between writing bits of data into float arrays and bulk putting, or writing data directly to a VBO, depending on how easy you make it for the VM to bounds check the putting. Memory bandwidth and the effects of cache pollution suggest that writing directly to VBOs should be much more efficient for larger datasets. This is how we use them at Puppygames anyway.

Cas :)

Well, I ended up writing some test code to compare these.

It renders 100k 1x1 quads, in combinations from 1 batch of 100k to 10k batches of 10 quads, and prints out the frame rate every second. To toggle between those, press 1/2/3/4/5. To toggle between immediate mode/vertex arrays/dynamic VBO/static VBO, press space. For dynamic VBO, it will update all vertices every frame, while for static it’ll just send the data over once.

To add a single immediate-mode quad to the mix every frame, press Q (Q again to turn it off). The reason for that option is this thread.

Some results:


FPS: 30 (immediate mode, 100000 quads x1 iteration(s), without quad)
FPS: 30 (immediate mode, 100000 quads x1 iteration(s), without quad)
FPS: 30 (immediate mode, 100000 quads x1 iteration(s), without quad)
Switching to vertex arrays
FPS: 79 (vertex arrays, 100000 quads x1 iteration(s), without quad)
FPS: 81 (vertex arrays, 100000 quads x1 iteration(s), without quad)
FPS: 82 (vertex arrays, 100000 quads x1 iteration(s), without quad)
Switching to dynamic VBO
FPS: 65 (dynamic VBO, 100000 quads x1 iteration(s), without quad)
FPS: 65 (dynamic VBO, 100000 quads x1 iteration(s), without quad)
FPS: 65 (dynamic VBO, 100000 quads x1 iteration(s), without quad)
Switching to static VBO
FPS: 278 (static VBO, 100000 quads x1 iteration(s), without quad)
FPS: 276 (static VBO, 100000 quads x1 iteration(s), without quad)
FPS: 276 (static VBO, 100000 quads x1 iteration(s), without quad)

Switching to immediate mode
FPS: 23 (immediate mode, 100000 quads x1 iteration(s), without quad)
FPS: 29 (immediate mode, 10 quads x10000 iteration(s), without quad)
FPS: 28 (immediate mode, 10 quads x10000 iteration(s), without quad)
Switching to vertex arrays
FPS: 55 (vertex arrays, 10 quads x10000 iteration(s), without quad)
FPS: 55 (vertex arrays, 10 quads x10000 iteration(s), without quad)
FPS: 55 (vertex arrays, 10 quads x10000 iteration(s), without quad)
Switching to dynamic VBO
FPS: 16 (dynamic VBO, 10 quads x10000 iteration(s), without quad)
FPS: 17 (dynamic VBO, 10 quads x10000 iteration(s), without quad)
FPS: 16 (dynamic VBO, 10 quads x10000 iteration(s), without quad)
Switching to static VBO
FPS: 100 (static VBO, 10 quads x10000 iteration(s), without quad)
FPS: 99 (static VBO, 10 quads x10000 iteration(s), without quad)
FPS: 99 (static VBO, 10 quads x10000 iteration(s), without quad)
FPS: 102 (static VBO, 10 quads x10000 iteration(s), with quad)
FPS: 100 (static VBO, 10 quads x10000 iteration(s), with quad)
FPS: 100 (static VBO, 10 quads x10000 iteration(s), with quad)

Things that seem noteworthy:

  • adding an extra immediate-mode quad didn’t seem to mess up VBO performance
  • vertex arrays are faster than immediate mode at a fairly low number of vertices
  • dynamic VBOs seem slower than vertex arrays if you’re updating all vertices every frame
  • static VBOs are great. A bit of a duh there.

I wonder why I was seeing different results when comparing immediate mode to vertex arrays in a real-life particle system. I’ll go back to it again and see how it goes, perhaps I just mucked up something simple. Hmm.

A question about VBOs - do you guys know roughly how common it is for a video card not to support them? They seem very useful, but if you need to include fallback code anyway, which will probably be used in a case where you really would need all the performance you can get (i.e., an old or just cheap/on-board video card), then it kind of defeats the purpose. On the other hand, the box I’m on right now has a crappy video card that does support them :D

And here’s the code. Sorry for the double post, it didn’t quite fit into the 10k limit.


import java.nio.FloatBuffer;

import org.lwjgl.BufferUtils;
import org.lwjgl.Sys;
import org.lwjgl.input.Keyboard;
import org.lwjgl.input.Mouse;
import org.lwjgl.opengl.ARBVertexBufferObject;
import org.lwjgl.opengl.Display;
import org.lwjgl.opengl.DisplayMode;
import org.lwjgl.opengl.GL11;
import org.lwjgl.opengl.PixelFormat;

public class PerformanceTest {
	
	public static final int SCREEN_WIDTH = 640;
	public static final int SCREEN_HEIGHT = 480;
	
	public static class QuadData {
		float x1, y1, x2, y2; // lower left, upper right
	}
	
	public interface RenderTest {
		void init(int numQuads);
		void render();
	}
	
	public static class ImmediateTest implements RenderTest {
		private QuadData [] data;

		public ImmediateTest() { 
			init(10000); 
		}
		
		@Override
		public void init(int numQuads) {
			data = generateQuads(numQuads);
		}

		@Override
		public void render() {
			GL11.glBegin(GL11.GL_QUADS);
			for (int i = 0; i < data.length; i++) {
				GL11.glVertex2f(data[i].x1, data[i].y1);
				GL11.glVertex2f(data[i].x1, data[i].y2);
				GL11.glVertex2f(data[i].x2, data[i].y2);
				GL11.glVertex2f(data[i].x2, data[i].y1);
			}
			GL11.glEnd();
		}

		@Override
		public String toString() {
			return "immediate mode";
		}
		
	}
	
	public static class VertexArrayTest implements RenderTest {
		private QuadData [] data;
		private FloatBuffer vertices;
		public VertexArrayTest() { 
			init(10000); 
		}
		
		@Override
		public void init(int numQuads) {
			data = generateQuads(numQuads);
			vertices = BufferUtils.createFloatBuffer(2 * 4 * numQuads);
		}

		@Override
		public void render() {
			vertices.clear();
			for (int i = 0; i < data.length; i++) {
				vertices.put(data[i].x1).put(data[i].y1);
				vertices.put(data[i].x1).put(data[i].y2);
				vertices.put(data[i].x2).put(data[i].y2);
				vertices.put(data[i].x2).put(data[i].y1);
			}
			vertices.flip();

			GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
			GL11.glVertexPointer(2, 0, vertices);
			GL11.glDrawArrays(GL11.GL_QUADS, 0, vertices.limit()/2);
			GL11.glDisableClientState(GL11.GL_VERTEX_ARRAY);			
		}

		@Override
		public String toString() {
			return "vertex arrays";
		}
	}
	
	public static class VBOTest implements RenderTest {
		private QuadData [] data;
		private int id = -1;
		private FloatBuffer vertices;
		private boolean dynamic = false;
		public VBOTest(boolean dynamic) {
			this.dynamic = dynamic;
			init(10000); 
		}
		
		@Override
		public void init(int numQuads) {
			data = generateQuads(numQuads);
			if (id != -1) ARBVertexBufferObject.glDeleteBuffersARB(id);
			id = ARBVertexBufferObject.glGenBuffersARB();
			vertices = BufferUtils.createFloatBuffer(2 * 4 * numQuads);
			updateVBO();
		}
		
		private void updateVBO() {
			vertices.clear();
			for (int i = 0; i < data.length; i++) {
				vertices.put(data[i].x1).put(data[i].y1);
				vertices.put(data[i].x1).put(data[i].y2);
				vertices.put(data[i].x2).put(data[i].y2);
				vertices.put(data[i].x2).put(data[i].y1);
			}
			vertices.flip();
			ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, id);
			int mode = ARBVertexBufferObject.GL_DYNAMIC_DRAW_ARB;
			if (!dynamic) mode = ARBVertexBufferObject.GL_STATIC_DRAW_ARB;
			ARBVertexBufferObject.glBufferDataARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB,
                                				  vertices,
                                				  mode);
			ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, 0);
		}

		@Override
		public void render() {
			if (dynamic) {
				updateVBO();
			}
			ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, id);
			GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
			GL11.glVertexPointer(2, GL11.GL_FLOAT, 0, 0);
			GL11.glDrawArrays(GL11.GL_QUADS, 0, vertices.limit()/2);
			GL11.glDisableClientState(GL11.GL_VERTEX_ARRAY);
			ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, 0);
		}

		@Override
		public String toString() {
			return dynamic ? "dynamic VBO" : "static VBO";
		}
	}	
	
	public static QuadData [] generateQuads(int num) {
		QuadData [] data = new QuadData[num];
		for (int i = 0; i < data.length; i++) {
			data[i] = new QuadData();
			data[i].x1 = (float)Math.random() * SCREEN_WIDTH;
			data[i].y1 = (float)Math.random() * SCREEN_HEIGHT; // y must stay within the ortho height
			data[i].x2 = data[i].x1 + 1; // keep quads small
			data[i].y2 = data[i].y1 + 1; // to keep fillrate low 
		}
		return data;
	}
	
	private static RenderTest currTest;
	private static int iterations;
	private static int num;
	private static boolean renderQuad;
	public static void main(String[] args) throws Exception {
		setDisplayMode();
		Display.setTitle("PerformanceTest");
		Display.setFullscreen(false);
		Display.setVSyncEnabled(false);
		Display.create(new PixelFormat(32, 0, 24, 8, 0));
		Mouse.setGrabbed(false);

		String extensions = GL11.glGetString(GL11.GL_EXTENSIONS);
		if (!extensions.contains("GL_ARB_vertex_buffer_object")) {
			System.out.println("GL_ARB_vertex_buffer_object not available");
		}
		
		GL11.glDisable(GL11.GL_TEXTURE_2D);
		GL11.glDisable(GL11.GL_DEPTH_TEST);
		GL11.glDisable(GL11.GL_LIGHTING);

		GL11.glMatrixMode(GL11.GL_PROJECTION);
		GL11.glPushMatrix();
		GL11.glOrtho(0, SCREEN_WIDTH, 0, SCREEN_HEIGHT, -1, 1);

		RenderTest [] tests = new RenderTest [] { new ImmediateTest(), 
												  new VertexArrayTest(),
												  new VBOTest(true),
												  new VBOTest(false),
												};
		int testIndex = 0;
		renderQuad = false;
		currTest = tests[testIndex];
		
		prev = getTime();
		iterations = 1;
		num = 100000;
		
		for (RenderTest test : tests) {
			test.init(num);
		}
		GL11.glMatrixMode(GL11.GL_MODELVIEW);
		GL11.glColor4ub((byte)55, (byte)55, (byte)55, (byte)255);
		while (true) {
			GL11.glPushMatrix();
			GL11.glLoadIdentity();
			GL11.glClear(GL11.GL_COLOR_BUFFER_BIT);
			
			if (renderQuad) {
				GL11.glBegin(GL11.GL_QUADS);
				GL11.glVertex2f(10, 10);
				GL11.glVertex2f(10, 20);
				GL11.glVertex2f(20, 20);
				GL11.glVertex2f(20, 10);
				GL11.glEnd();
			}
			
			for (int i = 0; i < iterations; i++) {
				currTest.render();
			}
			
			GL11.glPopMatrix();
			
			if (Display.isCloseRequested() || Keyboard.isKeyDown(Keyboard.KEY_ESCAPE)) {
				break;
			}
			Display.update();
			
			updateFPS();
			
			// process input to switch tests etc
			boolean reinit = false;
			Keyboard.poll();
			while (Keyboard.next()) {
				if (Keyboard.getEventKeyState() == true) {
					switch (Keyboard.getEventKey()) {
						case Keyboard.KEY_SPACE:
							testIndex = (testIndex + 1) % tests.length;
							currTest = tests[testIndex];
							resetFPS();
							System.out.println("Switching to " + currTest);
							break;
						case Keyboard.KEY_Q:
							renderQuad = !renderQuad;
							resetFPS();
							break;
						case Keyboard.KEY_1:
							iterations = 1;
							num = 100000;
							reinit = true;
							break;
						case Keyboard.KEY_2:
							iterations = 10;
							num = 10000;
							reinit = true;
							break;
						case Keyboard.KEY_3:
							iterations = 100;
							num = 1000;
							reinit = true;
							break;
						case Keyboard.KEY_4:
							iterations = 1000;
							num = 100;
							reinit = true;
							break;							
						case Keyboard.KEY_5:
							iterations = 10000;
							num = 10;
							reinit = true;
							break;							
					}
				}
			}
			if (reinit) {
				for (RenderTest test : tests) {
					test.init(num);
				}
				resetFPS();
			}
		}
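		// The matrix pushed before the loop was the projection matrix, so pop it in that mode
		GL11.glMatrixMode(GL11.GL_PROJECTION);
		GL11.glPopMatrix();
		Display.destroy();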
	}
	
	private static long prev;
	private static long elapsed = 0;
	private static int frames = 0;
	private static void updateFPS() {
		// update FPS, print out every second
		frames++;
		long now = getTime();
		elapsed += (now - prev);
		prev = now;
		if (elapsed > 1000) {
			String extra = renderQuad ? "with quad" : "without quad";
			System.out.println("FPS: " + frames + " (" + currTest +
							   ", " + num + " quads x" + iterations +" iteration(s), " + extra + ")");
			elapsed -= 1000;
			frames = 0;
		}		
	}
	
	private static void resetFPS() {
		elapsed = 0;
		frames = 0;
	}
	
	private static long getTime() {
		return (long)((double)Sys.getTime() / (double)Sys.getTimerResolution() * 1000.0);
	}
	
	private static void setDisplayMode() throws Exception
	{
		DisplayMode[] dm = org.lwjgl.util.Display.getAvailableDisplayModes(SCREEN_WIDTH, SCREEN_HEIGHT, -1, -1, -1, -1, 60, 60);
		org.lwjgl.util.Display.setDisplayMode(dm, new String[] {
				"width=" + SCREEN_WIDTH,
				"height=" + SCREEN_HEIGHT,
				"freq=" + 60,
				"bpp=" + org.lwjgl.opengl.Display.getDisplayMode().getBitsPerPixel()
         });
	}
}

Textured cubes… you’re working on Minecraft, aren’t you?