Tutorial: stutter-free texture streaming with LWJGL

Introduction

Texture streaming is a very common feature in 3D engines today, and for good reasons. Texture streaming can provide some clear advantages for AAA games which have massive amounts of textures to deal with. The two most well-known advantages of streaming are:

  • A streaming system can automatically keep only the necessary textures in memory, which lowers the minimum VRAM requirement of any given scene, while still using any leftover VRAM as a cache by unloading textures only when necessary.
  • Reduced load times, since the game can be played while the textures are still being loaded in.

Why should I care? I’m not an AAA developer!

It’s obvious that these two advantages alone are reason enough for a game with 20 GB of texture data to use streaming, but admittedly indie games have far fewer resources to invest in texture variety. However, there are some less well-known advantages of streaming that can help you regardless of how many or how big your textures are. For example, reduced load times aren’t a big deal if you only need to load 10 different textures per in-game level, but what you might be overlooking is how texture streaming simplifies this loading. Texture streaming completely eliminates manual texture management. Loading a new level? No need to check which textures are needed! Just change the level and your streaming system will automatically unload textures that are no longer used and stream in all required textures! Implementing a simple streaming system probably even requires less code than the manual texture management it replaces!

A first attempt

Let’s start by “implementing” a simple streamer. We’ll basically make a small class holding a File pointing to an image file and an OpenGL texture ID. Each texture will have a bind() function which loads a texture if necessary before binding it as usual. It also keeps track of when the texture was last used so we know when we can unload the texture.


import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;

import javax.imageio.ImageIO;

import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL30.*;

public class StreamedTexture {
	
	private static final int UNLOAD_TIME = 10_000; //in milliseconds, so 10 seconds
	
	private File file;
	private int internalFormat, format;
	
	private int textureID;
	private long lastUsed;
	
	public StreamedTexture(File file, int internalFormat, int format) {
		this.file = file;
		this.internalFormat = internalFormat;
		this.format = format;
		
		textureID = -1;
	}
	
	public void bind(){
		if(textureID == -1){
			//Texture isn't loaded, so we need to load it.
			loadTexture();
		}
		glBindTexture(GL_TEXTURE_2D, textureID);
		lastUsed = System.currentTimeMillis();
	}

	private void loadTexture() {
		
		try {
			BufferedImage image = ImageIO.read(file);
			
			ByteBuffer textureData = ...; //Create byte buffer with pixel data
			textureData.flip();
			
			textureID = glGenTextures();
			glBindTexture(GL_TEXTURE_2D, textureID);
			glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, image.getWidth(), image.getHeight(), 0, format, GL_UNSIGNED_BYTE, textureData);
			glGenerateMipmap(GL_TEXTURE_2D);
			
			//Set up filtering, wrap mode, etc...
			
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	public void unloadIfOld(){
		if(textureID != -1 && System.currentTimeMillis() - lastUsed > UNLOAD_TIME){
			glDeleteTextures(textureID);
			textureID = -1;
		}
	}
}

Use:
When the game is first started we create a StreamedTexture object for every single one of our texture files. This doesn’t actually read the texture files, so keeping these objects in memory 24/7 is not a problem at all. A texture is only loaded when it is first bound. unloadIfOld() is called on all textures at regular intervals (once per frame or once per second or so) to find and unload textures that haven’t been used for some time.
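The timing part of that sweep can be tried out without any OpenGL at all. Here is a minimal sketch, assuming a hypothetical IdleTracker helper (not part of the class above) with an injectable clock so the 10 second timeout can be tested without actually waiting:

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: remembers the last use and decides when a resource is old enough to unload. */
final class IdleTracker {
    private final long unloadAfterMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in the real game
    private long lastUsed;
    private boolean loaded;

    IdleTracker(long unloadAfterMillis, LongSupplier clock) {
        this.unloadAfterMillis = unloadAfterMillis;
        this.clock = clock;
    }

    /** Call from bind(): marks the resource as loaded and just used. */
    void markUsed() {
        loaded = true;
        lastUsed = clock.getAsLong();
    }

    /** Call from the sweep: returns true exactly once when the caller should delete the texture. */
    boolean shouldUnload() {
        if (loaded && clock.getAsLong() - lastUsed > unloadAfterMillis) {
            loaded = false;
            return true;
        }
        return false;
    }
}
```

In the game you would pass System::currentTimeMillis as the clock and call glDeleteTextures() whenever shouldUnload() returns true.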

How well did we do? Well, we’ve managed to simplify our texture management significantly, but at what cost? We have numerous big problems that (may) need to be solved.

    1. Whenever a texture isn’t already in memory, the game freezes up until the texture has been loaded. When a new level is loaded, numerous textures might be missing. We’ve essentially just automated our texture loading, but we’re not actually streaming in data!
    2. We’re loading in images using ImageIO, which generates a huge temporary BufferedImage object. Then we generate a second huge ByteBuffer for our texture’s pixel data. The huge amount of garbage generated isn’t exactly good if we want to avoid GC pauses.
    3. glGenerateMipmap() is a very heavy command which downsamples the texture multiple times on the GPU. We can get rid of the CPU related freezes pretty easily, but if we want to get rid of all stutter, this method has to go.

How do we solve these problems then? The solutions are pretty intertwined.

    1. As always when you want to do something asynchronous, we’re going to use a separate thread for loading textures from files. We can’t afford waiting for a slow hard drive to read data, then uncompress the image file and pack it into a ByteBuffer. However, loading in the texture asynchronously means that we don’t have anything to display while the texture is still being loaded. A generic black, gray or white texture isn’t exactly pleasing to look at.
    2. To avoid having to use ImageIO, we can develop our own image file format.
    3. If we have our own image format, we can precompute all mipmap levels and store them in the file so we don’t have to recompute them each time we load the texture. This has an added advantage of allowing us to quickly load in the lower mipmap levels.
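Precomputing the mipmaps means the converter has to downsample on the CPU. A minimal sketch of one way to do that, assuming tightly packed RGBA data with 8 bits per channel and even dimensions; MipmapDownsampler is a hypothetical name, and the plain 2x2 box filter used here is a stand-in that will not necessarily match what glGenerateMipmap() produces on your driver (it also ignores gamma):

```java
/** Hypothetical offline helper for a texture converter: builds the next mipmap level on the CPU. */
final class MipmapDownsampler {

    /**
     * Halves an RGBA8 image (4 bytes per pixel, rows stored one after another)
     * by averaging each 2x2 block. Assumes width and height are even.
     */
    static byte[] halve(byte[] rgba, int width, int height) {
        int outW = width / 2;
        int outH = height / 2;
        byte[] out = new byte[outW * outH * 4];
        for (int y = 0; y < outH; y++) {
            for (int x = 0; x < outW; x++) {
                for (int c = 0; c < 4; c++) {
                    int sum = 0;
                    for (int dy = 0; dy < 2; dy++) {
                        for (int dx = 0; dx < 2; dx++) {
                            int src = ((2 * y + dy) * width + (2 * x + dx)) * 4 + c;
                            sum += rgba[src] & 0xFF; // bytes are signed in Java
                        }
                    }
                    out[(y * outW + x) * 4 + c] = (byte) ((sum + 2) / 4); // rounded average
                }
            }
        }
        return out;
    }
}
```

Calling halve() repeatedly on its own output gives you the whole mipmap chain, which the converter can then write to the file.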

Multithreaded OpenGL?!

Yes, it’s doable! It’s actually not even hard to do! The problem is that having multiple OpenGL contexts can be slow due to the synchronization the driver does behind the scenes, generally leading to a big loss of performance. However, one of the main reasons that multiple OpenGL contexts exist is for the purpose of doing stutterless texture streaming, and drivers have been optimized extensively for this purpose.

We need to set up a shared OpenGL context on a separate texture streaming thread which will create and load textures for the main game thread. Using LWJGL, it’s extremely easy to set up a shared context in the run() method of your streaming thread:


	public void run(){
		SharedDrawable drawable = null;
		try{
			drawable = new SharedDrawable(Display.getDrawable());
			drawable.makeCurrent();
			
			//Call OpenGL functions! Woooh!
			
			drawable.releaseContext();
			
		}catch(LWJGLException ex){
			ex.printStackTrace();
			exception = ex;
		}finally{
			if(drawable != null){
				drawable.destroy();
			}
		}
	}

The cleanup code is there for reference, but isn’t actually required in our case since the streaming thread will loop forever waiting for streaming jobs to process.

I mentioned above that a problem when doing asynchronous streaming of textures is that while the texture is being loaded we have nothing to display. To solve this, I added a second texture handle to each StreamedTexture object which contains the 4x4 mipmap of the texture. It is loaded when the game starts and is permanently kept in memory. The permanent texture can then be used until the full texture has been loaded. The permanent 4x4 texture uses only 64 bytes of VRAM per texture (16 pixels at 4 bytes each), which is nothing to worry about.

PBOs?

To further reduce the stuttering of texture loading, we should be using a Pixel Buffer Object (PBO). A PBO is kind of a VBO but filled with texture data instead of vertex data. The advantage of using them is that we can map them directly for better performance, and the PBO can be asynchronously uploaded by the driver to the GPU’s VRAM which avoids stuttering. Setting up a PBO is extremely easy:


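		pbo = glGenBuffers();
		glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
		//Allocate the storage up front (no data is uploaded) to prevent a stall later.
		//GL_STREAM_DRAW is the usage hint for data written by the application and read by OpenGL.
		glBufferData(GL_PIXEL_UNPACK_BUFFER, pboSize, GL_STREAM_DRAW);
		glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);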

It can then be mapped just like any other buffer with glMapBuffer(). If your textures are large, you may need an extremely big PBO. For example, a 4096x4096 RGBA texture requires 64 MB of memory. If your textures are that large, you may want to allocate a significantly smaller PBO, say 16x4096 bytes (64 KB, which holds exactly four rows of that texture), and load in the texture with glTexSubImage2D() four rows at a time.
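The arithmetic for splitting an upload into bands of rows is easy to get wrong by one, so here is a small sketch of it. UploadChunks and its method names are hypothetical, and it assumes tightly packed rows with no padding:

```java
/** Hypothetical helper: splits a texture upload into bands of rows that fit a fixed-size PBO. */
final class UploadChunks {

    /** Number of whole rows that fit in the PBO. At least one row has to fit. */
    static int rowsPerChunk(int width, int bytesPerPixel, int pboBytes) {
        int rowBytes = width * bytesPerPixel;
        if (rowBytes > pboBytes) {
            throw new IllegalArgumentException("PBO is smaller than a single row");
        }
        return pboBytes / rowBytes;
    }

    /** Number of glTexSubImage2D() calls needed to cover the whole texture. */
    static int chunkCount(int height, int rowsPerChunk) {
        return (height + rowsPerChunk - 1) / rowsPerChunk; // ceiling division
    }

    /** Rows in the given chunk. The last chunk may be shorter than the others. */
    static int rowsInChunk(int chunkIndex, int height, int rowsPerChunk) {
        int firstRow = chunkIndex * rowsPerChunk;
        return Math.min(rowsPerChunk, height - firstRow);
    }
}
```

Chunk i would then be uploaded with a y offset of i * rowsPerChunk and a height of rowsInChunk(i, ...). For the 4096x4096 RGBA example this works out to 1024 uploads of four rows each.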

Designing a file format

Although there are file formats designed specifically for games, like DDS, that support both compression and mipmaps, I decided to roll my own format. Your binary file format for your texture images can be as simple or as complicated as you want. My format ended up being pretty simple. First I have a small header which contains the number of mipmap levels in the file, their sizes and their byte offsets in the file. After that I simply dump the raw uncompressed texture data for each mipmap in order. When the game is started, the header is read and stored in the StreamedTexture object. When the texture needs to be loaded, I can quickly load in a certain mipmap without having to go through the whole file by skipping directly to the byte offset of that mipmap. A big advantage of this design is that you can easily add texture compression. Simply compress the texture using OpenGL and dump the precompressed data to the file instead. Then load it with glCompressedTexSubImage2D() instead of glTexSubImage2D(). Creating a converter and a loader for your file format is probably the most time consuming part of the whole process, but eliminating glGenerateMipmap() is a must unless you’re okay with >100ms freezes!
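To make the header idea more concrete, here is one possible layout as a rough sketch. This is not my actual format; the MipHeader class, the field order (level count, then width, height and byte offset per level, largest level first) and the assumption of uncompressed RGBA data at 4 bytes per pixel are all made up for illustration:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical header: level count, then (width, height, byte offset) for each level. */
final class MipHeader {
    final int[] widths;
    final int[] heights;
    final long[] offsets; // absolute file offset of each level's raw pixel data

    MipHeader(int[] widths, int[] heights, long[] offsets) {
        this.widths = widths;
        this.heights = heights;
        this.offsets = offsets;
    }

    /** Builds the header for a full mipmap chain down to 1x1, assuming 4 bytes per pixel. */
    static MipHeader forTexture(int width, int height) {
        int levels = 32 - Integer.numberOfLeadingZeros(Math.max(width, height)); // floor(log2) + 1
        int[] w = new int[levels];
        int[] h = new int[levels];
        long[] off = new long[levels];
        long offset = 4 + levels * 16L; // header size: one int, then (int, int, long) per level
        for (int i = 0; i < levels; i++) {
            w[i] = Math.max(1, width >> i);
            h[i] = Math.max(1, height >> i);
            off[i] = offset;
            offset += (long) w[i] * h[i] * 4;
        }
        return new MipHeader(w, h, off);
    }

    void write(DataOutputStream out) throws IOException {
        out.writeInt(widths.length);
        for (int i = 0; i < widths.length; i++) {
            out.writeInt(widths[i]);
            out.writeInt(heights[i]);
            out.writeLong(offsets[i]);
        }
    }

    static MipHeader read(DataInputStream in) throws IOException {
        int levels = in.readInt();
        int[] w = new int[levels];
        int[] h = new int[levels];
        long[] off = new long[levels];
        for (int i = 0; i < levels; i++) {
            w[i] = in.readInt();
            h[i] = in.readInt();
            off[i] = in.readLong();
        }
        return new MipHeader(w, h, off);
    }
}
```

With the offsets stored as absolute file positions, the loader can seek straight to offsets[i] to read mipmap level i without touching the rest of the file.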

Putting it all together

Our original StreamedTexture class is a good starting point for the above improvements.

  • We need to load in the permanent 4x4 texture when the StreamedTexture is created.
  • We need to modify the bind() method so that instead of loading the texture directly, it off-loads the loading to our streaming thread.
  • We need to add a variable which tracks the current state of the texture: unloaded, being loaded or loaded.

Here’s some pseudo code for such a StreamedTexture class:


public class StreamedTexture {

	private static final int STATE_UNLOADED = 0;
	private static final int STATE_LOADING = 1;
	private static final int STATE_LOADED = 2;

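	private static final int UNLOAD_TIME = 10_000; //in milliseconds, same as in the first version

	private volatile int state; //Written by both the game thread and the streaming thread
	private long lastUsed; //Only touched by the game thread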
	
	public StreamedTexture(File file) {
		loadHeader(file);
		loadPermanent4x4Texture(file);
		state = STATE_UNLOADED;
	}
	
	public void bind(){
		if(state == STATE_UNLOADED){
			state = STATE_LOADING;
			queueStreamingJob(new LoadJob());
		}
		if(state == STATE_LOADED){
			bindLoadedTexture();
		}else{
			bindPermanent4x4Texture();
		}
		lastUsed = System.currentTimeMillis();
	}
	
	public void unloadIfOld(){
		if(state == STATE_LOADED && System.currentTimeMillis() - lastUsed > UNLOAD_TIME){
			deleteLoadedTexture();
			state = STATE_UNLOADED;
		}
	}


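	private class LoadJob extends StreamingJob{

		public void process(int pbo, ByteBuffer temporaryByteBuffer){
			createAndAllocateTexture(width, height); //Already read from header in constructor
			openTextureFile(file); //Open the file once, not once per mipmap level
			for(int i = lastMipMapLevel; i >= 0; i--){ //Smallest mipmap first
				skipToMipMap(i); //Seek to the byte offset stored in the header
				readMipMapToBuffer(temporaryByteBuffer);
				uploadToPBO(pbo, temporaryByteBuffer);
				glTexSubImage2D(...); //Copy from PBO to the already allocated texture
			}
			closeTextureFile();
			glFinish(); //Make sure the upload is done before the game thread uses the texture
			state = STATE_LOADED;
		}
	}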
}
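The pseudo code above relies on bind() and unloadIfOld() only ever being called from the game thread. If that might not hold in your engine, the state transitions could be made atomic instead. A minimal sketch of that variation, assuming a hypothetical LoadStateMachine class (this is not what my streamer does, just one way to guard against queueing the same load twice):

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical variation of the state variable: every transition is a compare-and-set. */
final class LoadStateMachine {
    enum State { UNLOADED, LOADING, LOADED }

    private final AtomicReference<State> state = new AtomicReference<>(State.UNLOADED);

    /** Returns true for exactly one caller, which should then queue the load job. */
    boolean tryBeginLoading() {
        return state.compareAndSet(State.UNLOADED, State.LOADING);
    }

    /** Called by the streaming thread once the upload has finished. */
    void finishLoading() {
        if (!state.compareAndSet(State.LOADING, State.LOADED)) {
            throw new IllegalStateException("finishLoading() without a load in progress");
        }
    }

    /** Only a fully loaded texture may be unloaded; returns true if the caller should delete it. */
    boolean tryUnload() {
        return state.compareAndSet(State.LOADED, State.UNLOADED);
    }

    /** True if the full texture can be bound; otherwise bind the permanent 4x4 texture. */
    boolean isLoaded() {
        return state.get() == State.LOADED;
    }
}
```

bind() would call tryBeginLoading() and queue a job only when it returns true, then pick the texture to bind based on isLoaded().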

Pseudo code for a streaming thread’s run() method:


	private PriorityBlockingQueue<StreamingJob> queue;

	...

	public void run() {
		
		createOpenGLSharedDrawable();
		createPBO();
		createTemporaryByteBuffer();
		
		while(true){
			StreamingJob job = null;
			try{
				job = queue.take();
				job.process(pbo, temporaryByteBuffer);
			}catch(InterruptedException ex){
				continue;
			}
		}
	}
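Two details are easy to miss in that loop: a PriorityBlockingQueue needs its elements to be Comparable (or a Comparator), and swallowing the InterruptedException means the thread can never be stopped. Here is a minimal OpenGL-free sketch of the queue handling. PrioritizedJob and JobWorker are hypothetical names, and "lower number is processed first" is just an assumption I picked:

```java
import java.util.concurrent.PriorityBlockingQueue;

/** Hypothetical job base class: the queue orders jobs by their priority value. */
abstract class PrioritizedJob implements Comparable<PrioritizedJob> {
    final int priority; // assumption: lower values are processed first

    PrioritizedJob(int priority) {
        this.priority = priority;
    }

    abstract void process();

    @Override
    public int compareTo(PrioritizedJob other) {
        return Integer.compare(priority, other.priority);
    }
}

/** Hypothetical worker: takes jobs in priority order until the thread is interrupted. */
final class JobWorker implements Runnable {
    private final PriorityBlockingQueue<PrioritizedJob> queue = new PriorityBlockingQueue<>();

    void submit(PrioritizedJob job) {
        queue.add(job);
    }

    @Override
    public void run() {
        try {
            while (true) {
                queue.take().process(); // blocks while the queue is empty
            }
        } catch (InterruptedException ex) {
            // Treat an interrupt as a shutdown request and restore the flag for the caller.
            Thread.currentThread().interrupt();
        }
    }
}
```

In the real streaming thread, the shared drawable, the PBO and the temporary buffer would be created before the loop, and a priority could be something like the distance to the camera.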

I’ll leave creating a file format as an exercise to the reader. If you have any questions, or if something is unclear and needs more explaining, please post and I will try to correct the article.

Screenshot of texture streaming in action! The spikes are only around 1-2ms and they are pretty much unavoidable as they are caused by the driver allocating memory for the textures on the GPU.

That’s interesting to read!
Thanks for that tutorial :)