Faster tile drawing.

Mayhaw · May 11, 2013, 10:15pm

Hello, I’m programming a dungeon crawler in Slick2D and I have stumbled upon a big wall X.X Yes, it’s the FPS.
I am currently drawing (the visible part of) 4 tile layers + 1 lighting layer (there are also other things like GUI & entity systems). I have a stable 90-120 FPS, but considering the amount of logic that still needs to be added (AI, NPCs, pets, combat system, buff system, spell system, particle effects etc.) I need something around 500.
Now I have seen THIS AWESOME article http://www.java-gaming.org/topics/rendering-tiles-fast-isometric-too/25292/view.html but unfortunately my graphics card doesn’t support OpenGL 3.0 :C (everything is working besides the texture loading, which needs OpenGL 3.0). So my question is: is there any way to achieve the same performance boost with OpenGL 2.0? If there is, could you please explain? I’m kind of a newbie to OpenGL.

-Alex

HeroesGraveDev · May 11, 2013, 10:30pm

If it’s not broken, don’t fix it.

Worry about it when/if you hit performance troubles.

Jimmt · May 11, 2013, 11:45pm

What is wrong with 90-120 fps? Seems reasonable, maybe it’s different on Android (never developed for it). You never know how much fps is going to drop after adding new features, so like HeroesGrave said, wait until you add most/all features and then see about performance problems.

davedes · May 11, 2013, 11:56pm

theagentd’s technique is fast but not always practical and requires texture arrays, which are only present in about 58% of cards. And since the tiles are drawn on GPU, it is not practical to add particular game-specific features like changing a tile when a player walks on it.

A faster technique than what Slick offers is to use VBOs or vertex arrays to batch the data. Slick includes a basic vertex array mode (see here) but it isn’t highly optimized. A better alternative would be to use a batcher, e.g. the one from lwjgl-basics, or to write your own.
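To make the batching idea concrete, here is a minimal sketch of the CPU side only: it fills one interleaved array with a quad (position plus texture coordinates) per tile, so the whole grid could be handed to OpenGL in one go instead of one draw per tile. The class name `TileBatchSketch`, the `[x][y]` map layout and the sprite sheet parameters are assumptions made for illustration; this is not Slick’s or lwjgl-basics’ actual batcher, and the upload and draw calls are deliberately left out.

```java
/** Builds one interleaved vertex array (x, y, u, v) for a whole grid of tiles. */
final class TileBatchSketch {
    static final int FLOATS_PER_VERTEX = 4; // x, y, u, v
    static final int VERTICES_PER_TILE = 4; // one quad per tile

    // Quad corners in the order top-left, top-right, bottom-right, bottom-left.
    private static final float[][] CORNERS = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};

    /**
     * @param tileIds  tile IDs indexed [x][y] (hypothetical map layout)
     * @param tileSize on-screen tile size in pixels
     * @param columns  number of tile columns in the sprite sheet
     * @param rows     number of tile rows in the sprite sheet
     */
    static float[] build(int[][] tileIds, int tileSize, int columns, int rows) {
        int w = tileIds.length;
        int h = tileIds[0].length;
        float[] out = new float[w * h * VERTICES_PER_TILE * FLOATS_PER_VERTEX];
        float du = 1f / columns; // width of one tile in normalized texture space
        float dv = 1f / rows;    // height of one tile in normalized texture space
        int i = 0;
        for (int tx = 0; tx < w; tx++) {
            for (int ty = 0; ty < h; ty++) {
                int id = tileIds[tx][ty];
                float u0 = (id % columns) * du;
                float v0 = (id / columns) * dv;
                float x0 = tx * tileSize;
                float y0 = ty * tileSize;
                for (float[] c : CORNERS) {
                    out[i++] = x0 + c[0] * tileSize;
                    out[i++] = y0 + c[1] * tileSize;
                    out[i++] = u0 + c[0] * du;
                    out[i++] = v0 + c[1] * dv;
                }
            }
        }
        return out;
    }
}
```

The array only needs rebuilding when the visible tiles change; every other frame can redraw the same data with a single draw call.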

Furthermore, your “lighting layer” might require a significant amount of blending and fill rate.

FPS does not drop linearly with each new feature you add. You shouldn’t start worrying about FPS unless it drops under 60. Also note that small drops of high FPS should not be a huge concern, see here.

Ultimately if you are not happy with Slick’s performance, you should change libraries. Slick was not really designed for critical performance or memory usage (it uses glBegin/glEnd for christ’s sake!). Slick is dead tech and no longer maintained – nowadays LibGDX is a more powerful and more optimized alternative.

Mayhaw · May 12, 2013, 9:36am

Whoa, thanks for the replies! Unfortunately I didn’t gain any FPS by changing things to what davedes said (changed the renderer to the Vertex Array Renderer, started using spritesheets), but that doesn’t bother me now because of the article davedes posted xd (thanks!). Oh, and Jimmt, the game is not on Android.

-Alex

PS: After looking at LibGDX, I think I will switch to it in the next project.

theagentd · May 12, 2013, 10:09am

I understand that the problem was somewhat resolved, but I’d like to point out that, as I wrote in the article, the fragment shader can easily be reworked to work on normal 2D textures too (at the cost of bilinear filtering and mipmaps), which would eliminate the need for texture arrays and therefore OpenGL 3 too.

I wouldn’t say that the OpenGL capabilities database is very accurate at telling how many people actually have those cards, since it includes multiple entries for different driver versions and cards that are older than I am. The percentage isn’t weighted by how many people actually have those cards. Instead I’d refer you to the Steam hardware survey, which has much more accurate and up-to-date data (though biased towards gamers of course). There the share of OGL3-capable cards is 96.24% (XP is not a limit since OGL3 runs on XP too, in contrast to DX10).

Updating a tile is far from expensive. I’d say it’s actually even cheaper than changing a single tile in a batched tile renderer, since there you’d have to regenerate a full chunk each time a single tile is changed. In contrast, with my algorithm you only have to upload some new texture data without any preprocessing at all.

I wouldn’t worry about CPU-GPU memory performance. I have a particle renderer that uploads 22.9MBs per frame at 65FPS, which is 1 487.73MBs/sec (yes, almost 1.5GB/sec). Since each tile is stored as a 2-byte tile ID in the texture, 22.9MBs is enough to update 12 000 000 tiles per frame, assuming the area that’s changed is contiguous. You can’t upload the tiles one by one to the texture.

If you have a batched tile renderer, you’d need to first generate a quad for each tile, then upload 4 vertices for each tile. Each vertex needs at least short-precision positions and texture coordinates, so that’s 8 bytes per vertex times 4 vertices per tile. For 12 000 000 tiles that’s 366.2MBs, plus the CPU overhead of generating that data, which is a more realistic bottleneck. With my algorithm there’s also no need for manual CPU culling of batches/tiles, since the GPU takes care of that perfectly well for you.
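The figures above can be checked with a few lines of arithmetic. The sketch below assumes, as the numbers suggest, that “MB” here means 1024 × 1024 bytes; the class and method names are made up for illustration.

```java
/** Reproduces the upload-size arithmetic from the post (sizes in MiB). */
final class UploadCostSketch {
    static final double MIB = 1024.0 * 1024.0;

    /** Tile-texture approach: one 2-byte tile ID per tile. */
    static long tileTextureBytes(long tiles) {
        return tiles * 2;
    }

    /** Batched approach: 4 vertices per tile, 8 bytes per vertex. */
    static long batchedVertexBytes(long tiles) {
        return tiles * 4 * 8;
    }

    static double toMiB(long bytes) {
        return bytes / MIB;
    }

    public static void main(String[] args) {
        long tiles = 12_000_000L;
        double perFrame = toMiB(tileTextureBytes(tiles));
        System.out.printf("tile texture:   %.2f MiB per frame%n", perFrame);
        System.out.printf("at 65 FPS:      %.2f MiB per second%n", perFrame * 65);
        System.out.printf("batched quads:  %.2f MiB per frame%n", toMiB(batchedVertexBytes(tiles)));
    }
}
```

So for the same 12 000 000 tiles, the vertex data is 16 times larger than the tile-ID texture (32 bytes per tile versus 2), before counting the CPU time needed to generate it.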

In Mayhaw’s case, I’d recommend that you have 4 textures with tile data and sample all 4 in the same shader. That way you’ll only get 1 pass for the whole 4-layer map. You might even be able to sample the lighting layer too depending on how it works.

Mayhaw · May 12, 2013, 2:19pm

That’s one big ass post xd Well, I think I get what you mean, but there is one small problem: some tiles are animated, some are not. And as for lighting, yes, it’s possible, because all I do is draw a map of (smaller, thus more) colored tiles (mostly black) with different opacities. I started reading up on the sampling stuff but I still don’t know how to do it properly. Could you please show me how to do it or explain it to me? It would help me A LOT.

Alex

theagentd · May 12, 2013, 11:35pm

Exactly how are you rendering your tiles at the moment? And how are you animating them?

Mayhaw · May 13, 2013, 3:26pm

Well, I have a 3-dimensional array (layers*width*height) which contains 4 layers of tiles (each tile holds 4 variables: an array of Strings “properties”, a boolean “animated”, an int speed [of the animation], and an array of tileSprite IDs [e.g. 4,5,6]). The render function loops through the array and draws the tileSprites based on the tile’s current ID (the tiles don’t have any kind of time counter; the ID is chosen by a simple algorithm). The tiles also have a function, returnWithNextID(), which returns the tile with its next ID, if it has one (useful if you have, for example, 2 states of a door tile, opened and closed).

Here is the render function:
[s]


public void render(int x,int y,int startTileX,int startTileY,int width,int height,int layer,Graphics g){
		int twidth = getTileWidth();
		int theight = getTileHeight();
		long time = System.currentTimeMillis();
		Tile tile;
		int frame;
		
		//Reminder: to use this function call ImageBag.tileSheet.startUse(); before calling it and ImageBag.tileSheet.endUse(); after calling it.
		
		for(int tx = startTileX; tx < startTileX+width && tx < layers[0].length;tx++){
			for(int ty = startTileY; ty < startTileY+height && ty < layers[0][0].length;ty++){
				tile = getTile(tx,ty,layer);
				if(tile.animated){
					frame = (int)(  (time/(100*tile.speed))%tile.getAllTheIds().length);
					ImageBag.tiles[tile.getAllTheIds()[frame]].drawEmbedded(x+((tx-startTileX)*twidth), y+((ty-startTileY)*theight),Tile.WIDTH,Tile.HEIGHT);
				}else{
					ImageBag.tiles[tile.getId()].drawEmbedded(x+((tx-startTileX)*twidth), y+((ty-startTileY)*theight),Tile.WIDTH,Tile.HEIGHT);
				}
			}
		}
	}

[/s]

-Alex

PS: In the Map.class there is also a 2d array of booleans for collision detection (solid|passable).

theagentd · May 14, 2013, 3:32pm

Well, unless you’re drawing a shitload of tiles each frame, that probably won’t be worth optimizing with my algorithm, but if you want I can still help you implement it. There are only a few things that you’ll need to change to get it working with OpenGL 2.0.

You need to replace the 2D texture array I was using with a normal 2D texture that holds the complete tileset, i.e. all the tiles you’ll need for all layers. This should be easy. Just load in the tileset image you’re using and upload it to a 2D texture. You also have to disable bilinear filtering and mipmaps.
Here’s the tricky part. You need to modify the fragment shader to calculate texture coordinates for a given tile index. For that you need a new vec2 uniform variable containing how many columns and rows the tileset image has. I’ll refer to that uniform as tilesetSize from now on (it’s the size in tiles, not pixels!).
The offset (in tiles) of a given tile index can be calculated like this:
[icode]x = index % columnsOfTileset,
y = index / columnsOfTileset; //Rounded down to the nearest int[/icode]
Then we need to add the local tile texture coordinates to that to get the correct pixel inside a tile. Since we can’t use filtering anyway, we can just calculate the local tile texture coordinates from the texture coordinates we passed using a simple fract(texCoords) like I originally did.
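The two steps so far (tile offset from the index, plus the local coordinate inside the tile) can be tried out on the CPU first. This is only a Java illustration of the same arithmetic, not shader code; `TilesetCoordsSketch` and its parameter names are hypothetical, and the map coordinate is assumed to be given in tiles, like the texture coordinates passed to the shader.

```java
/** CPU-side illustration of the tile offset + local coordinate calculation. */
final class TilesetCoordsSketch {
    /**
     * @param index   tile index into the tileset (row-major, starting at 0)
     * @param columns number of tile columns in the tileset image
     * @param mapX    position on the map in tiles, e.g. 2.5 = middle of tile 2
     * @param mapY    position on the map in tiles
     * @return {x, y} inside the tileset, still measured in tiles (not normalized)
     */
    static float[] tileUnits(int index, int columns, float mapX, float mapY) {
        int col = index % columns;
        int row = index / columns; // integer division rounds down for non-negative indices
        float localX = mapX - (float) Math.floor(mapX); // same as GLSL fract()
        float localY = mapY - (float) Math.floor(mapY);
        return new float[] {col + localX, row + localY};
    }
}
```

For example, tile index 13 in an 8-column tileset sits at column 5, row 1, so a point halfway across that tile ends up at x = 5.5 in tile units.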

Now we end up with texture coordinates in tiles that are in the range (0 to columns, 0 to rows). To get normalized texture coordinates we have to divide them by the tilesetSize. The final GLSL code would look something like this:


uniform vec2 mapSize;
uniform vec2 tilesetSize;

uniform sampler2D tileTexture;
uniform sampler2D tilesetTexture;

void main()
{
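   //The texture contains tile indices. Multiply by 65535 to convert the normalized values to shorts.
   //Only one channel holds the index (.rgb is a vec3 and cannot be assigned to a float); round to absorb float error.
   float index = floor(texture2D(tileTexture, gl_TexCoord[0].st / mapSize).r * 65535.0 + 0.5);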

   vec2 texCoords = (vec2(mod(index, tilesetSize.x), floor(index / tilesetSize.x)) + fract(gl_TexCoord[0].st)) / tilesetSize;
   
   gl_FragColor = texture2D(tilesetTexture, texCoords);
}
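On the Java side, the tile map has to be turned into the texel data that tileTexture holds. The following is a rough sketch under a few assumptions: the map is an `int[][]` indexed `[x][y]`, every ID fits in 16 bits, and the buffer is later uploaded as a single-channel 16-bit texture (the upload call itself is not shown). `TileIndexTextureSketch` is a made-up name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

/** Packs a tile map into 16-bit texels, one tile ID per texel. */
final class TileIndexTextureSketch {
    /** Writes the IDs row by row, which is the order a texture upload expects. */
    static ShortBuffer pack(int[][] tileIds) {
        int w = tileIds.length;
        int h = tileIds[0].length;
        // Direct, native-order buffer: the kind LWJGL expects for uploads.
        ShortBuffer buf = ByteBuffer.allocateDirect(w * h * 2)
                .order(ByteOrder.nativeOrder())
                .asShortBuffer();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                buf.put((short) tileIds[x][y]); // keeps the low 16 bits (0..65535)
            }
        }
        buf.flip();
        return buf;
    }

    /** Mirrors what the shader does: normalized texel value times 65535, rounded. */
    static int decode(short texel) {
        float normalized = (texel & 0xFFFF) / 65535f;
        return Math.round(normalized * 65535f);
    }
}
```

Changing a tile then means rewriting one 2-byte entry and re-uploading the affected region of the texture.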

Extending that to handle 4 layers in one pass is easy. Just pack your 4 tile indices into an RGBA texture, and in the shader calculate 4 texture coordinates, sample all 4 tiles, blend the result together manually and write it out. Start by implementing the single layer version and I’ll help you (if you need it) with the 4-layer version later.
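As a rough illustration of the 4-layer idea, the sketch below packs one tile ID per layer into the four channels of a texel and shows one plausible way to blend four sampled colors by hand. It assumes 8-bit channels (so at most 256 distinct tiles per layer), non-premultiplied alpha, and ordinary “source over destination” blending with the bottom layer first; none of these details are specified in the post, and `FourLayerSketch` is a hypothetical name. The real blending would happen in the fragment shader; Java is used here only to show the arithmetic.

```java
/** Illustrates packing four layer indices per texel and blending four samples. */
final class FourLayerSketch {
    /** Packs one tile ID per layer (each 0..255) into the R, G, B, A bytes of a texel. */
    static byte[] packTexel(int layer0, int layer1, int layer2, int layer3) {
        return new byte[] {(byte) layer0, (byte) layer1, (byte) layer2, (byte) layer3};
    }

    /** "Source over destination" blend on {r, g, b, a} colors with components in 0..1. */
    static float[] over(float[] dst, float[] src) {
        float a = src[3];
        return new float[] {
            src[0] * a + dst[0] * (1 - a),
            src[1] * a + dst[1] * (1 - a),
            src[2] * a + dst[2] * (1 - a),
            a + dst[3] * (1 - a)
        };
    }

    /** Blends the sampled tile colors in order, bottom layer first. */
    static float[] blendLayers(float[][] samples) {
        float[] result = samples[0];
        for (int i = 1; i < samples.length; i++) {
            result = over(result, samples[i]);
        }
        return result;
    }
}
```

A fully transparent sample leaves the color below it untouched, so layers with empty tiles cost one texture sample but do not change the result.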