[GLSL] Terrain multitexturing shader is too slow

Hey guys,
I came up with a terrain multitexturing technique that works great, but it doesn’t run fast enough on Intel / older graphics cards. I don’t know if there’s anything I can really do about this. My game runs at 60fps on my ATI card but hovers at around 8-10fps on my Intel HD 4000. I was wondering if anyone could look at my technique / GLSL code and see if there’s anything significant I could do to improve it?
My other solution is to just have an option to disable multitexturing on slower PCs, where they would instead just use a single texture with coloured polygons. This kind of sucks because the map/level will look a lot worse but maybe it’s the only other way.

Because some PCs only support 4 texture units, I am using a 2048x2048 “virtual texture” instead (see image below).

This allows me to have as many textures as I want (up to 32) as long as they fit inside the virtual texture (as “tiles” like in a spritesheet), with a maximum of 4 textures blended per pixel.

My technique requires 4 textures.

  1. Colour map (just for painting colours on the terrain, omitted from the code)
  2. Virtual texture (described above)
  3. Tile map (contains 4 tile indices for each pixel, one per channel), where R=0 is tile 0, R=1 is tile 1, and so on (up to 32 tiles). E.g. RGBA=(0,1,2,3) selects the first 4 tiles
  4. Mix map (contains opacity of the 4 tile textures for each pixel, for blending them together)

I also pass an array of tiles (x,y,width,height) so that I can calculate the location on the virtual texture for each pixel.

I send 2 sets of UV coords to the shader: worldUV is the true polygon UV coordinates, and mixmapUV is the polygon’s UV coordinates relative to the map bounds (from 0 to 1).

I multiply the UV coordinates by the tile size so that smaller textures don’t repeat more often than larger (higher quality) textures.

If something isn’t clear, please let me know! :)


uniform vec4 tiles[MAX_MAP_TEXTURES]; //contains the x,y,width,height of each tile in the virtual texture
uniform sampler2D tileTexture; //contains 4 texture indexes used to get the texture tile from the virtual texture
uniform sampler2D virtualTexture; //2048x2048 packed texture containing all individual terrain textures
uniform sampler2D mixmapTexture; //contains opacity to blend a maximum of 4 textures together

vec4 calculateColour(vec2 worldUV, vec2 mixmapUV)
{
	vec4 textureLocations = texture2D(tileTexture, mixmapUV);
	vec4 mixmapColour = texture2D(mixmapTexture, mixmapUV);
	
	vec4 textureColours[4]; //samples from the 4 tiles blended at this pixel
	vec4 finalColour = vec4(0.0,0.0,0.0,1.0);
	for(int i=0;i<4;i++)
	{
		int tileLoc = int(textureLocations[i]*256.0);
		
		vec4 tile = tiles[tileLoc];
		float minTextureSize = min(tile.p, tile.q);
		vec2 wrapped = vec2(tile.s + abs(mod(worldUV.x * minTextureSize, tile.p)), tile.t + abs(mod(worldUV.y * minTextureSize, tile.q)));
		textureColours[i] = texture2D(virtualTexture, wrapped);
		
		finalColour.rgb = mix(finalColour.rgb, textureColours[i].rgb, mixmapColour[i]);
	}
	
	return finalColour;
}

Try unrolling your loop and see if that helps performance. I’ll look for an article I read that cautions you to stay away from things like:

int(textureLocations[i]*256.0);

OMG. Thanks so much thedanisaur!

I thought a loop only caused a big slowdown when the iteration count was variable.
E.g. for (int i = 0; i < someVariable; i++).

I removed the loop and set i to 0 like so:


int i = 0;
//for(int i=0;i<4;i++)
{
	int tileLoc = int(textureLocations[i]*256.0);
		
	vec4 tile = tiles[tileLoc];
	float minTextureSize = min(tile.p, tile.q);
	vec2 wrapped = vec2(tile.s + abs(mod(worldUV.x * minTextureSize, tile.p)), tile.t + abs(mod(worldUV.y * minTextureSize, tile.q)));
	textureColours[i] = texture2D(virtualTexture, wrapped);
		
	finalColour.rgb = mix(finalColour.rgb, textureColours[i].rgb, mixmapColour[i]);
}


This caused the code to run at ~30fps. Definitely not fast enough, since I was only doing 1/4 of the work (so no real speed-up at all).

I removed the int i = 0; variable completely and replaced i with 0 in all places.

Boom 60FPS!

So now I have a big block of duplicated code but it runs at 60fps no problems! :) :) :)
I guess I can move some things around and put some code in a separate function so there isn’t so much duplication.


	{
		int tileLoc = int(textureLocations[0]*256.0);
		
		vec4 tile = tiles[tileLoc];
		float minTextureSize = min(tile.p, tile.q);
		vec2 wrapped = vec2(tile.s + abs(mod(worldUV.x * minTextureSize, tile.p)), tile.t + abs(mod(worldUV.y * minTextureSize, tile.q)));
		textureColours[0] = texture2D(virtualTexture, wrapped);
		
		finalColour.rgb = mix(finalColour.rgb, textureColours[0].rgb, mixmapColour[0]);
	}
	{
		int tileLoc = int(textureLocations[1]*256.0);
		
		vec4 tile = tiles[tileLoc];
		float minTextureSize = min(tile.p, tile.q);
		vec2 wrapped = vec2(tile.s + abs(mod(worldUV.x * minTextureSize, tile.p)), tile.t + abs(mod(worldUV.y * minTextureSize, tile.q)));
		textureColours[1] = texture2D(virtualTexture, wrapped);
		
		finalColour.rgb = mix(finalColour.rgb, textureColours[1].rgb, mixmapColour[1]);
	}
	{
		int tileLoc = int(textureLocations[2]*256.0);
		
		vec4 tile = tiles[tileLoc];
		float minTextureSize = min(tile.p, tile.q);
		vec2 wrapped = vec2(tile.s + abs(mod(worldUV.x * minTextureSize, tile.p)), tile.t + abs(mod(worldUV.y * minTextureSize, tile.q)));
		textureColours[2] = texture2D(virtualTexture, wrapped);
		
		finalColour.rgb = mix(finalColour.rgb, textureColours[2].rgb, mixmapColour[2]);
	}
	{
		int tileLoc = int(textureLocations[3]*256.0);
		
		vec4 tile = tiles[tileLoc];
		float minTextureSize = min(tile.p, tile.q);
		vec2 wrapped = vec2(tile.s + abs(mod(worldUV.x * minTextureSize, tile.p)), tile.t + abs(mod(worldUV.y * minTextureSize, tile.q)));
		textureColours[3] = texture2D(virtualTexture, wrapped);
		
		finalColour.rgb = mix(finalColour.rgb, textureColours[3].rgb, mixmapColour[3]);
	}
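
One possible way to cut the duplication mentioned above, as a rough sketch rather than a tested fix: move the per-tile work into a helper and call it four times with constant swizzles (.r, .g, .b, .a), so neither the vec4s nor a local array are indexed with a variable. The helper name sampleTile and the value of MAX_MAP_TEXTURES are assumptions for illustration; the uniforms are the ones from the original post. Whether a given driver handles this as well as the hand-unrolled version would need to be measured.

```glsl
#define MAX_MAP_TEXTURES 32 // assumed value, the post only says "up to 32"

uniform vec4 tiles[MAX_MAP_TEXTURES]; // x, y, width, height of each tile
uniform sampler2D tileTexture;        // 4 tile indices per pixel
uniform sampler2D virtualTexture;     // packed terrain textures
uniform sampler2D mixmapTexture;      // 4 blend opacities per pixel

// Hypothetical helper: samples one tile of the virtual texture.
// tileIndex is a single channel of the tile map (normalized 0..1).
vec3 sampleTile(float tileIndex, vec2 worldUV)
{
	int tileLoc = int(tileIndex * 256.0);
	vec4 tile = tiles[tileLoc];
	float minTextureSize = min(tile.p, tile.q);
	// Same wrapping as the original code, written with vec2 operations.
	vec2 wrapped = tile.st + abs(mod(worldUV * minTextureSize, tile.pq));
	return texture2D(virtualTexture, wrapped).rgb;
}

vec4 calculateColour(vec2 worldUV, vec2 mixmapUV)
{
	vec4 textureLocations = texture2D(tileTexture, mixmapUV);
	vec4 mixmapColour = texture2D(mixmapTexture, mixmapUV);

	// Constant swizzles instead of [i], so nothing here is indexed dynamically.
	vec3 colour = vec3(0.0);
	colour = mix(colour, sampleTile(textureLocations.r, worldUV), mixmapColour.r);
	colour = mix(colour, sampleTile(textureLocations.g, worldUV), mixmapColour.g);
	colour = mix(colour, sampleTile(textureLocations.b, worldUV), mixmapColour.b);
	colour = mix(colour, sampleTile(textureLocations.a, worldUV), mixmapColour.a);
	return vec4(colour, 1.0);
}
```

Note that tiles[tileLoc] is still a variable index into the uniform array; only the indexing of the vec4 channels and the local textureColours array has been removed.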

Cool! Happy to help.

I don’t remember all the details and can’t find the article (branching vs. discard, or max sampler2D? idk). Anyway, apparently fragment shaders hate dynamic indexing.

That’s ok. I’ve definitely learned my lesson anyway :)
Thanks again

How does performance go if you replace the uniform array with a 1D texture LUT? This way you get away from all integer-based dynamic indexing.

Replace uniform vec4 tiles[MAX_MAP_TEXTURES]; with a 1D texture? And replace int tileLoc = int(textureLocations[ i ]*256.0); with just having the x, y, width, height in the 1D texture?

Or, instead of using the tile texture to store indices into the array, do I somehow store the tiles themselves in the tile texture and use the 1D texture to fetch them from there?

I can see it somehow working and being faster, but it’s just out of reach :) Could you give me some more details?
I want to stick with 4 textures maximum (colourmap, virtual texture, tilemap and mixmap) although the alpha channel of the colourmap isn’t being used.
Thanks

I meant textureLocations uniform array.

textureLocations is a vec4

Ofc I meant uniform vec4 tiles.

I still don’t understand how to do this. :’(

He’s saying replace the uniform vec4 array with a 1D texture used as a lookup table, to remove the dynamic indexing in your shader: the [i]*256.0 (or [0]*256.0, [1]*256.0… etc.). This would (I think) allow for more than four virtual textures (scalability) and reduce the amount of code you have.
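
To make the lookup-table suggestion a bit more concrete, here is a minimal sketch of how it might look. Everything named tileLUT and LUT_WIDTH is hypothetical and not from the thread. It assumes the tile rectangles are uploaded once into a LUT_WIDTH x 1 texture (a 2D texture with height 1 stands in for a 1D texture), with NEAREST filtering and a format precise enough to hold x, y, width, height (for example a float texture), and that the tile map stores tile k as the byte value k. It also needs a fifth sampler and adds a dependent texture read per tile, so it may or may not beat the uniform array on a given GPU.

```glsl
#define LUT_WIDTH 256.0 // assumed LUT size: one texel per possible byte value

uniform sampler2D tileLUT;        // hypothetical: texel k = (x, y, width, height) of tile k
uniform sampler2D tileTexture;    // 4 tile indices per pixel
uniform sampler2D virtualTexture; // packed terrain textures
uniform sampler2D mixmapTexture;  // 4 blend opacities per pixel

// Fetches a tile rectangle from the LUT instead of indexing a uniform array.
vec3 sampleTile(float tileIndex, vec2 worldUV)
{
	// tileIndex is k/255 for tile k; recover k and aim at the centre of texel k.
	float k = floor(tileIndex * 255.0 + 0.5);
	vec4 tile = texture2D(tileLUT, vec2((k + 0.5) / LUT_WIDTH, 0.5));

	float minTextureSize = min(tile.p, tile.q);
	vec2 wrapped = tile.st + abs(mod(worldUV * minTextureSize, tile.pq));
	return texture2D(virtualTexture, wrapped).rgb;
}

vec4 calculateColour(vec2 worldUV, vec2 mixmapUV)
{
	vec4 textureLocations = texture2D(tileTexture, mixmapUV);
	vec4 mixmapColour = texture2D(mixmapTexture, mixmapUV);

	vec3 colour = vec3(0.0);
	colour = mix(colour, sampleTile(textureLocations.r, worldUV), mixmapColour.r);
	colour = mix(colour, sampleTile(textureLocations.g, worldUV), mixmapColour.g);
	colour = mix(colour, sampleTile(textureLocations.b, worldUV), mixmapColour.b);
	colour = mix(colour, sampleTile(textureLocations.a, worldUV), mixmapColour.a);
	return vec4(colour, 1.0);
}
```

With this layout there is no integer conversion and no array index left in the shader; the tile map channel is turned directly into a texture coordinate.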