Rendering tiles fast (isometric too)

In a recent thread I came up with a method of drawing tiles with a fragment shader instead of a quad per tile, which avoids the vertex bottleneck. I've now implemented it.


http://img140.imageshack.us/img140/7241/tilerendering1.png

The whole map in this test program is 2048x2048 tiles, and they are all drawn as a single quad (no culling on the CPU). Zooming out so that all tiles are visible, FPS drops to around 450-500 as the texture cache won’t be able to do its magic. However, in that case tiles are smaller than screen pixels, and the whole map just looks like randomly colored pixels.
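
For reference, here's roughly what drawing the whole map as one quad looks like; a minimal immediate-mode sketch (the actual test program may do this differently). The trick is that the texture coordinates run from (0, 0) to (mapWidth, mapHeight), so the fragment shader below can derive both the map cell and the local tile coordinate from them:

//Sketch only: one quad for the entire map, assuming static imports of GL11.
glBegin(GL_QUADS);
glTexCoord2f(0, 0);                glVertex2f(0, 0);
glTexCoord2f(mapWidth, 0);         glVertex2f(mapWidth * TILE_WIDTH, 0);
glTexCoord2f(mapWidth, mapHeight); glVertex2f(mapWidth * TILE_WIDTH, mapHeight * TILE_HEIGHT);
glTexCoord2f(0, mapHeight);        glVertex2f(0, mapHeight * TILE_HEIGHT);
glEnd();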

The renderer is not limited to pure 2D rendering in any way. It’s possible to just rotate and scale the rendered quad to achieve lots of effects. For example, isometric tiles:

http://img194.imageshack.us/img194/506/tilerendering2.png

It uses an RGB texture to store the tile index of each map cell, and a 2D texture array for the tileset to prevent bleeding between tiles when linear filtering is used. The most important part is the fragment shader:

#extension GL_EXT_texture_array : enable

uniform vec2 mapSize;

uniform sampler2D tileTexture;
uniform sampler2DArray tilesetTexture;

void main()
{
	vec2 texCoords = gl_TexCoord[0].st;
	vec2 tileResult = texture2D(tileTexture, texCoords / mapSize).rg;
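	//NOTE: 65535 here is a small bug, it should be 65536.0 (see the correction further down in the thread)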
	float tile = dot(tileResult, vec2(65535, 256.0));
	gl_FragColor = texture2DArray(tilesetTexture, vec3(fract(texCoords), tile));
}

which calculates the tile index from two color channels and then looks up the tile in the texture array. This could be implemented in OpenGL 2 too by using a plain 2D tileset texture, but that would suffer from severe bleeding between bordering tiles when filtering is enabled. However, mipmaps do not work yet, as they produce weird 2-pixel seams between tiles for me.
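
For reference, a rough sketch of how the tileset texture array can be set up with LWJGL; the layer count, formats and the tileData buffer are assumptions, not taken from the actual program:

//Sketch: one array layer per tile, so filtering can never bleed across tiles.
//Assumes static imports of GL11, GL12 and GL30 and a prepared ByteBuffer.
int tilesetTexture = glGenTextures();
glBindTexture(GL_TEXTURE_2D_ARRAY, tilesetTexture);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8, tileWidth, tileHeight, tileCount,
		0, GL_RGBA, GL_UNSIGNED_BYTE, tileData);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_LINEAR);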

Here is the Java and GLSL shader source, plus a test tileset: http://www.mediafire.com/?flc36t8uq6floaw You need the LWJGL jars on the classpath and the natives directory passed as a VM argument. It also needs OpenGL 3 to run, but it is possible to work around this.
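
Something like this, with placeholder paths and a made-up main class name (on Windows, use ; instead of : in the classpath):

java -cp lwjgl.jar:lwjgl_util.jar:. -Djava.library.path=natives/ TileTest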

I hope someone finds it useful!

VERY nice 8)

libtcod’s shader renderer uses this trick too, though your code is a lot easier to follow. The libtcod code uses some nasty tricks to deal with correcting for NPOT map dimensions, whereas I think just starting with a POT texture in the first place is a better idea (I guess it might get a bit more expensive if you had a 257x257 map, but eh, edge case).

For a true isometric view you really need a custom tileset designed for it, but it’s still nice to be able to pull off arbitrary scaling.

That tileset looks familiar, but I can't place what game it's from. Care to enlighten me?

haha, to me it looks like Heroes of Might and Magic, but I strongly doubt that's it

Cool.
The tile set might be Final Fantasy. I see a little tent icon from the game.

Hahaha, the tileset is from Chrono Trigger. You can see parts of a few houses from the starting town.

I’m sure you can modify the texture loader and fragment shader to sample from a special isometric tile texture. =D

EDIT: Why would NPOT map dimensions be a problem? Non-power-of-two textures have been supported for a long time now.

Nice! I will implement it as an option in my next game for sure if it works as expected :smiley: For my current project I just don’t have the time :o

I would like to see a test where I have a map and walk through it with a camera or something!

Edit:
Argh, forget it, just saw that I can move around and such xD

NPOT textures are indeed supported, but operations on them can be considerably slower. Maybe the libtcod guys were coding for Nvidia 5xxx cards, which claimed to implement GL 2.0 but didn’t support NPOT textures, I dunno. At any rate, it’s nothing anyone needs to worry about when already requiring GL 3.0.

The only thing that actually requires GL 3.0 is texture arrays. I use them to prevent bleeding between tiles, especially when using mipmaps, but for some reason I get weird seams between tiles that seem to take a random color from the tile I’m drawing, and it only happens when I enable mipmaps, regardless of whether I use GL_LINEAR or GL_NEAREST.

That means you could get the exact same result with a single huge 2D tileset texture instead of a 2D texture array, as long as you eliminate bleeding by enabling GL_NEAREST or adding a border around each tile. The real problem is generating texture coordinates from the tile index in the fragment shader, and I suspect that performance could suffer a little because of this. With a texture array I can just pass the tile index to the sampler and get the correct layer, which lets me determine the tile index and fetch the tile texture in only 3 lines.
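
For the curious, that texture coordinate generation could look something like this with a plain 2D atlas; all names here are assumptions for the sketch, not code from the actual program, and it still bleeds under linear filtering:

uniform sampler2D atlasTexture;
uniform vec2 atlasSize; //atlas dimensions measured in tiles, e.g. vec2(16.0, 16.0)

vec4 sampleAtlasTile(float tile, vec2 localCoords)
{
	//Convert the linear tile index into a (column, row) cell in the atlas...
	vec2 cell = vec2(mod(tile, atlasSize.x), floor(tile / atlasSize.x));
	//...then squeeze the local 0-1 coordinates into that cell.
	return texture2D(atlasTexture, (cell + localCoords) / atlasSize);
}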

Anyway, it’s nice to see people interested in this. =D

EDIT: Facepalm! There’s a small “bug” in the shader. It’s supposed to be float tile = dot(tileResult, vec2(65536, 256.0)), not 65535!!! It’ll probably round to the right number as long as the index is under 32,000 though… xD

yeah, now that you said that I realize it’s true. wow, I wasted soooo many hours on Chrono Trigger… haha.

Impressive. I can see many uses for this.

This is very interesting!

How would you go about having “void” tiles, i.e. holes where other geometry can be shown without the need for alpha blending?

Holes should be easily doable with a designated tile id meaning “void tile” where the fragment shader calls discard. The “blending” for an alpha of 0 is pretty trivial though, and I wouldn’t be surprised if it was just as fast.

Since there’s absolutely no overdraw anywhere and we’re not wasting any vertices, a simple transparent / black tile would work. Transparency / blending also works perfectly fine of course.
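
A sketch of the discard variant; reserving index 0 as the void tile is an assumption for this example, not something the test program does:

//Inside main(), after the tile index has been fetched.
if (tile < 0.5) //index 0 reserved as the "void" tile (assumption)
	discard;    //drop the fragment entirely so geometry behind shows through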

Thanks for the info!

Time to resurrect this thread once again! I’ve fixed the seam problem once and for all!

The cause (skip this if you don’t care)

There were actually two different problems causing this. The implementation above had a very minor problem with “seams” when scaling veeeery slowly, but it was very difficult to notice and only visible with subpixel scaling or translation. The other problem was triggered by using mipmaps: the generated texture coordinates wreaked havoc on OpenGL’s built-in LOD selection (which mip level to use).

The universal seam problem

This was caused by me expecting floating point math to make sense. Rounding problems suck. There was a veeeeeery small chance that with extreme edge cases the texture filtering on the tile index lookup texture returned the index for one tile while the local texture coordinate generation calculated texture coordinates for a different tile.


http://img10.imageshack.us/img10/702/badytexturecoord.png

Notice how the white at the top of the tile also appears at the bottom seam of the tile. The tile index came from the center tile, but the local texture coordinates were calculated for the tile below.

I solved this by simply storing the X and Y coordinates of each tile in the tile index texture too. That way the local texture coordinates will always match the tile index fetched. The tile index texture is now a GL_RGB16 texture: tile indices are stored in the red channel, and map X and Y are stored in the green and blue channels.
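
In Java terms the upload might look something like this; a sketch with assumed names, packing three unsigned shorts per map cell:

//Sketch: packing (tileIndex, x, y) into a GL_RGB16 texture.
ShortBuffer data = BufferUtils.createShortBuffer(mapWidth * mapHeight * 3);
for(int y = 0; y < mapHeight; y++){
	for(int x = 0; x < mapWidth; x++){
		data.put((short)tileIndex[x][y]); //red: tile index
		data.put((short)x);               //green: map X
		data.put((short)y);               //blue: map Y
	}
}
data.flip();
glBindTexture(GL_TEXTURE_2D, tileTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB16, mapWidth, mapHeight, 0, GL_RGB, GL_UNSIGNED_SHORT, data);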

The mipmap seam problem

This was a lot harder to track down, but after working on a per-pixel distortion shader which also used dependent texture reads I realized the problem. It’s due to how OpenGL calculates which mip level to sample from. Basically, OpenGL picks the mip level by checking how the texture coordinates change over a 2x2 pixel area. This tells it how quickly the texture coordinates change, so it can pick a mip level based on the texture size; it’s also what makes anisotropic filtering work. However, it’s possible to confuse OpenGL into picking the wrong LOD value, and this is exactly what’s happening with my generated texture coordinates.


http://img341.imageshack.us/img341/2884/texturecoords.png

These are the local texture coordinates of each tile. The problem is the edges, because the texture coordinates aren’t continuous there. Since OpenGL checks the values over a 2x2 area, the rate of change might be calculated across 2 or even 4 different tiles, each having vastly different values (one close to 1, one close to 0). The result is that the shader samples from a very small mip level at the edges, usually the smallest one. I solved this by calculating the LOD value on the CPU (very easy), sending it as a uniform to the shader and sampling the texture with texture2DArrayLod() using the precalculated LOD value.
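
For reference, the implicit selection is roughly equivalent to the following (a simplification of what the spec describes, ignoring anisotropic filtering; textureSize is the texture's dimensions in texels):

//Roughly what the GPU computes per 2x2 pixel block when you call texture2D():
vec2 dx = dFdx(texCoords * textureSize);
vec2 dy = dFdy(texCoords * textureSize);
float lod = 0.5 * log2(max(dot(dx, dx), dot(dy, dy)));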

Code

First, here’s the new fragment shader.

#extension GL_EXT_texture_array : enable

uniform vec2 mapSize;

uniform sampler2D tileTexture;
uniform sampler2DArray tilesetTexture;

uniform float lod;

void main()
{
	vec2 texCoords = gl_TexCoord[0].st;
	
	//tileResult contains (tileIndex, x, y). Multiply by 65535 to convert the normalized values to shorts.
	vec3 tileResult = texture2D(tileTexture, texCoords / mapSize).rgb * 65535.0; 
	
	//Extract...
	float tile = tileResult.r;
	vec2 tilePosition = tileResult.gb;
	
	gl_FragColor = texture2DArrayLod(tilesetTexture, vec3(texCoords - tilePosition, tile), lod);
}

The important Java code changes include:

  • The tile index texture is now a GL_RGB16 texture which contains (tileIndex, x, y) per tile. The level generation and single-tile updating code has been updated (see the sketch further down).
  • Mipmaps are now generated and enabled (the code was already there, just commented out).
  • Texture LOD is calculated on the Java side and passed to the tile renderer; a GLSL uniform for it is updated each frame. The LOD is calculated with the following code:
//Texels covered by one screen pixel at the current zoom level...
double size = Math.min(TILE_WIDTH, TILE_HEIGHT) / currentScale;
//...and its base-2 logarithm, which is the mip level to sample from.
float lod = (float)(Math.log(size) / Math.log(2));
renderer.render(lod);

The renderer then supplies this to the shader before rendering each frame:

glUniform1f(lodLocation, lod);
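
Updating a single tile only touches one texel of the index texture. A sketch of what that can look like (names assumed, same (tileIndex, x, y) layout as above):

//Sketch: rewriting one map cell of the GL_RGB16 index texture.
ShortBuffer cell = BufferUtils.createShortBuffer(3);
cell.put((short)newTileIndex).put((short)x).put((short)y);
cell.flip();
glBindTexture(GL_TEXTURE_2D, tileTexture);
glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, 1, 1, GL_RGB, GL_UNSIGNED_SHORT, cell);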

Since MediaFire no longer likes me, I’ve uploaded the code to JGO’s pastebin:
Test Java program
Vertex shader
Fragment shader
Test tileset from Chrono Trigger

Performance

Mostly unchanged. 0.5 to 1.0 milliseconds for a fullscreen quad on mid-range hardware (1000-2000 FPS). The highest I’ve seen was just under 3 milliseconds (370 FPS) for extremely zoomed-out views (over 1 million tiles visible). Enabling mipmaps slightly improves performance for zoomed-out views since smaller textures are used.
EDIT: I enabled SLI on my GTX 295 for the test program and ran it at 1920x1080 in fullscreen. On the default zoom level I got 3000 FPS and a really scary whistling sound from my graphics card… High FPS = scary. o_O

Compatibility

I looked up the texture array extension, and it’s supported by OpenGL 2 level AMD cards, but not Nvidia cards. In other words, this program requires a DX9 AMD card or a DX10 Nvidia card, i.e. an AMD HD2000+ series card or an Nvidia 8000+ series card. It’s possible to ditch the texture array, but that requires some pretty big changes in the shader to pick tiles directly out of a normal 2D texture, and it breaks mipmap and bilinear interpolation support since you get bleeding between tiles. However, it would lower the requirement to any card supporting shaders.

Congratulations! You just read an insanely long post!

TL;DR: Seams completely eliminated and mipmaps are now supported!

Sigh. Every time someone mentions this article I end up adding something new to it. I think this’ll be the last new feature I add though. Basically I added bilinear filtering between tiles. Check out these comparisons:

http://screenshotcomparison.com/comparison/150422

Note that there are two different comparisons (the little tab at the top left). The first one is 100% sharp bilinear interpolation, and looks like quad-rendered tiles with 256x antialiasing. The second version imitates the filtering done by the GPU to make the whole tile world look like one continuous filtered image without any discontinuities between tiles, something that is impossible to achieve when drawing tiles with quads.

The new shader basically checks the closest 4 tiles instead of just one and does bilinear filtering between the 4 samples if the pixel is on the edge between them. This is all done in the shader of course. The only thing you need to add on the CPU side is a shader uniform (“scale”) set to how big a tile is onscreen in pixels. This was already available in the test program above, since it’s basically the current zoom level (currentScale), so I just passed that in. This value should be clamped so it isn’t below 1, since when tiles cover less than 1 pixel it’s just going to be shimmering anyway. To achieve the super smooth version without any discontinuities, just clamp the value to a maximum of 1. This all works with isometric tiles too. The result is perfect antialiasing and no shimmering when the camera moves or zooms. The difference is a LOT more noticeable in motion.
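
The linked shader below has the real details, but the core of the idea goes roughly like this. Everything here is a simplified sketch: sampleCell is a made-up helper that fetches the tile index of the given map cell and samples its layer with the pixel’s position relative to that cell (clamped to the tile’s edge); it is not a function from the actual shader:

//'scale' = tile size onscreen in pixels, clamped to at least 1.0 on the CPU.
vec2 p = texCoords - 0.5;
vec2 cell = floor(p);
//Bilinear weights between the 4 closest tiles, sharpened by the zoom level:
//a large scale gives an almost hard step (sharp tiles),
//scale = 1.0 blends fully (one continuous filtered image).
vec2 f = clamp((fract(p) - 0.5) * scale + 0.5, 0.0, 1.0);

vec4 c00 = sampleCell(cell,                  texCoords);
vec4 c10 = sampleCell(cell + vec2(1.0, 0.0), texCoords);
vec4 c01 = sampleCell(cell + vec2(0.0, 1.0), texCoords);
vec4 c11 = sampleCell(cell + vec2(1.0, 1.0), texCoords);

gl_FragColor = mix(mix(c00, c10, f.x), mix(c01, c11, f.x), f.y);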

New test program : http://www.java-gaming.org/?action=pastebin&id=276 (very minor changes)
New fragment shader: http://www.java-gaming.org/?action=pastebin&id=277 (mostly rewritten, see the old one for more info on what everything does)

The obvious drawback: doing everything four times. Performance is “worse” now. The following are values for a 1920x1080 screen at different zoom levels (each tile in the tileset is 16x16 pixels).

Normal vs Bilinear (FPS):

Each tile covers…
16x16 pixels: 2200 vs 1000
8x8 pixels: 2200 vs 900
4x4 pixels: 2000 vs 700
2x2 pixels: 1600 vs 750
1x1 pixels: 800 vs 290

Basically we went from 0.5-1.0 ms to 1.0-3.5 ms. This kind of performance is still very usable, especially since 2D games are almost always CPU limited, so it shouldn’t have any effect on FPS at all in a real game!