Slick Textures awefully slow

badlogicgames · November 24, 2012, 11:30pm

@theagentd, that’s a really neat scheme you have going there. What machine did you bench it on?

Riven · November 24, 2012, 11:46pm

[quote]array2[x1616 + y*16 + z]
[/quote]
That access pattern causes horrible cache trashing and thus poor performance.

http://www.java-gaming.org/topics/minecraft/20505/msg/168294/view.html#msg168294 (algorithm)
http://www.java-gaming.org/topics/minecraft/20505/msg/168314/view.html#msg168314 (benchmark)

(~3x speedup purely by choosing an intelligent, cache aware memory layout within the array. needless to say i was disappointed by the lack of any feedback :emo:)

theagentd · November 25, 2012, 1:19am

That access pattern causes horrible cache trashing and thus poor performance.

http://www.java-gaming.org/topics/minecraft/20505/msg/168294/view.html#msg168294 (algorithm)
http://www.java-gaming.org/topics/minecraft/20505/msg/168314/view.html#msg168314 (benchmark)

(~3x speedup purely by choosing an intelligent, cache aware memory layout within the array. needless to say i was disappointed by the lack of any feedback :emo:)
[/quote]
Wow, I didn’t even think about that! I’m in luck though, my data blocks are all 8x8x8. =D Very interesting though, I’ll try to keep that in mind since I have code that might suffer from that.

Not a mainstream PC. An i7 860 @ 3.52GHz and a GTX 295 (both GPUs in AFR). The performance is almost the same no matter the size of the world since it’s mostly GPU limited (around 850 000 vertices for my terrain). If I make the terrain hillier there’s a lot more geometry, so performance takes a hit even though the render block sizes are reduced automatically to improve culling precision.

skappler · November 25, 2012, 11:26am

That access pattern causes horrible cache trashing and thus poor performance.

http://www.java-gaming.org/topics/minecraft/20505/msg/168294/view.html#msg168294 (algorithm)
http://www.java-gaming.org/topics/minecraft/20505/msg/168314/view.html#msg168314 (benchmark)

(~3x speedup purely by choosing an intelligent, cache aware memory layout within the array. needless to say i was disappointed by the lack of any feedback :emo:)
[/quote]
Looks good. Even though I don’t understand it yet
But I know that you can do many things more efficient if you use bit operations. I’ll take a deeper look into it and try to understand it. Thank you!

matheus23 · November 25, 2012, 1:20pm

Heh… probably because no one actually understood this… me including

Could you please explain what you are doing there?

theagentd · November 25, 2012, 2:46pm

He’s generating indices in a way that improves cache coherence when reading neighbors. A 16x16x16 chunk is too large to fit into the CPU cache, so the same data is read multiple times from RAM into the cache and then immediately discarded when the next distant (for the CPU) neighbor is needed, sometimes for each block. In my code, getting x-1 would get a neighbor that 256 slots to the left in the array. That isn’t very good since there’s a big chance that that part of the array has been removed from the cache, and the CPU has to wait for it to be loaded from RAM which is a lot slower than the CPU cache. Like Riven showed in the benchmark, you get around 3x performance by using a more cache friendly way of generating indices into the array. If I got his code right, he splits up each chunk into 8x8x8 “sub-chunks” and store these in the array instead. That’s much better since the only time we might get a cache miss is when sampling over “sub-chunk” borders instead of for every single block.

But Riven, I think Marcus Persson was focusing on other things at the time. Those chunks were loaded from disk or downloaded from a server, so improving the CPU performance of generating the geometry probably wasn’t his first priority since the gain would be so small. It also reduces the readability of the code. I mean, no one seems to get what you’re doing there. I don’t know exactly what it’s doing there either, I just know what you’re trying to do. Marcus probably just saw it as a premature optimization.

skappler · November 25, 2012, 3:13pm

Omg that’s the original thread where notch posted minecraft? How awesome is that

I understand what he tries to do, but not how he does it But I have to care about other things at the moment anyway.
(Nevertheless I’m interested in how it works!)

badlogicgames · November 25, 2012, 3:24pm

That access pattern causes horrible cache trashing and thus poor performance.

http://www.java-gaming.org/topics/minecraft/20505/msg/168294/view.html#msg168294 (algorithm)
http://www.java-gaming.org/topics/minecraft/20505/msg/168314/view.html#msg168314 (benchmark)

(~3x speedup purely by choosing an intelligent, cache aware memory layout within the array. needless to say i was disappointed by the lack of any feedback :emo:)
[/quote]
That’s pretty neat. I ran into a similar problem at work recently, concerning sparse vectors. One implementation used Colt’s Map, the other was a sorted array of index/float pairs. The later outperformed the former by a factor of 30, without any big optimizations on the inserts/gets. This made a real big difference in a real-world scenario where we brought down the computation of some clustering solutions from 6 hours to 12 minutes.

skappler · November 27, 2012, 6:11pm

So I tried to implement the display lists but nothing is getting displayed. Here’s the chunk class:

public class Chunk {
	
	private Vector3f coordinates;
	private Block[] blocks;
	private int displayList;
	
	public Chunk(){
		blocks = new Block[16*16*16];
		
		loadChunk();
		
		displayList = GL11.glGenLists(1);

		createDisplayList();
	}
	
	public void render(){
		
		GL11.glCallList(displayList);
		
	}
	
	public void update(){
		
	}
	
	public void loadChunk(){
		for (int i = 0; i < 16; i++) {
			for (int j = 0; j < 16; j++) {
				for (int k = 0; k < 16; k++) {
					// if((i == 15 || i == 0) || (j == 15 || j== 0))
					blocks[16*16*i+16*j+k] = new GrassBlock(i, -k, j);
				}
			}
		}
	}
	
	private void createDisplayList(){
		
		GL11.glNewList(displayList, GL11.GL_COMPILE);
			for(Block b : blocks){
				if(b != null)
					b.create();
			}
		GL11.glEndList();
		
	}

}

And here the create() Method from the Block class:

public void create() {
		float x = coordinate.x;
		float y = coordinate.y;
		float z = coordinate.z;

		float topTextureX = top / MyTextureLoader.SPRITESHEETWIDTH;
		float topTextureY = top % MyTextureLoader.SPRITESHEETWIDTH;

		float bottomTextureX = bottom / MyTextureLoader.SPRITESHEETWIDTH;
		float bottomTextureY = bottom % MyTextureLoader.SPRITESHEETWIDTH;

		float sideTextureX = side / MyTextureLoader.SPRITESHEETWIDTH;
		float sideTextureY = side % MyTextureLoader.SPRITESHEETWIDTH;

		float delta = (float) (1.0 / MyTextureLoader.SPRITESHEETWIDTH);

		// TOP
		glNormal3f(0f, 1f, 0f);
		glTexCoord2f(topTextureX, topTextureY);
		glVertex3f(x, y, z);
		glTexCoord2f(topTextureX, topTextureY + delta);
		glVertex3f(x, y, z + 1);
		glTexCoord2f(topTextureX + delta, topTextureY + delta);
		glVertex3f(x + 1, y, z + 1);
		glTexCoord2f(topTextureX + delta, topTextureY);
		glVertex3f(x + 1, y, z);

		// BOTTOM
		glNormal3f(0f, -1f, 0f);
		glTexCoord2f(bottomTextureX, bottomTextureY);
		glVertex3f(x, y - 1, z);
		glTexCoord2f(bottomTextureX, bottomTextureY + delta);
		glVertex3f(x, y - 1, z + 1);
		glTexCoord2f(bottomTextureX + delta, bottomTextureY + delta);
		glVertex3f(x + 1, y - 1, z + 1);
		glTexCoord2f(bottomTextureX + delta, bottomTextureY);
		glVertex3f(x + 1, y - 1, z);

		// BACK
		glNormal3f(0, 0, 1);
		glTexCoord2f(sideTextureX, sideTextureY);
		glVertex3f(x, y, z);
		glTexCoord2f(sideTextureX + delta, sideTextureY);
		glVertex3f(x + 1, y, z);
		glTexCoord2f(sideTextureX + delta, sideTextureY + delta);
		glVertex3f(x + 1, y - 1, z);
		glTexCoord2f(sideTextureX, sideTextureY + delta);
		glVertex3f(x, y - 1, z);

		// FRONT
		glNormal3f(0, 0, 1);
		glTexCoord2f(sideTextureX, sideTextureY);
		glVertex3f(x, y, z + 1);
		glTexCoord2f(sideTextureX + delta, sideTextureY);
		glVertex3f(x + 1, y, z + 1);
		glTexCoord2f(sideTextureX + delta, sideTextureY + delta);
		glVertex3f(x + 1, y - 1, z + 1);
		glTexCoord2f(sideTextureX, sideTextureY + delta);
		glVertex3f(x, y - 1, z + 1);

		// LEFT
		glNormal3f(-1, 0, 0);
		glTexCoord2f(sideTextureX, sideTextureY);
		glVertex3f(x, y, z);
		glTexCoord2f(sideTextureX, sideTextureY + delta);
		glVertex3f(x, y - 1, z);
		glTexCoord2f(sideTextureX + delta, sideTextureY + delta);
		glVertex3f(x, y - 1, z + 1);
		glTexCoord2f(sideTextureX + delta, sideTextureY);
		glVertex3f(x, y, z + 1);

		// RIGHT
		glNormal3f(1, 0, 0);
		glTexCoord2f(sideTextureX, sideTextureY);
		glVertex3f(x + 1, y, z);
		glTexCoord2f(sideTextureX, sideTextureY + delta);
		glVertex3f(x + 1, y - 1, z);
		glTexCoord2f(sideTextureX + delta, sideTextureY + delta);
		glVertex3f(x + 1, y - 1, z + 1);
		glTexCoord2f(sideTextureX + delta, sideTextureY);
		glVertex3f(x + 1, y, z + 1);

	}

I basically moved the rendering code to the create method and removed the glBegin and glEnd. It doesn’t make any difference where I call the OpenGl commands, right?

RobinB · November 27, 2012, 6:15pm

Something is missing here:

   
      private void createDisplayList(){
      
      GL11.glNewList(displayList, GL11.GL_COMPILE);
         for(Block b : blocks){
            if(b != null)
               b.create();
         }
      GL11.glEndList();
      
   }

Try this instead:

   
      private void createDisplayList(){
      
      GL11.glNewList(displayList, GL11.GL_COMPILE);
         GL11.glBegin(GL11.GL_QUADS);
  
         for(Block b : blocks){
            if(b != null)
               b.create();
         }

         GL11.glEnd();
      GL11.glEndList();
      
   }

skappler · November 27, 2012, 6:24pm

Oh thank you. I didn’t know this is still required
It works but the textures are weird

http://s8.postimage.org/o0wigc8v5/weirdtexture.jpg

It should be a brown block of dirt with some grass on the top. It worked in before.

theagentd · November 27, 2012, 6:41pm

Bind the correct texture? =S

skappler · November 27, 2012, 6:52pm

I did so. Before I changed the code it worked perfectly. I didn’t change anything. The texture is bound only once in the beginning.

matheus23 · November 27, 2012, 6:53pm

Probably a good question: (I haven’t worked much with DisplayLists yet…)
Bind before calling glCallList(); or while compiling the DisplayList?

My guess: while compiling.

RobinB · November 27, 2012, 6:55pm

i think you also added the new texture coords?
I guess % does not do what you expect it to do?

Riven · November 27, 2012, 6:55pm

You’ve got to change your mentality a bit

Assuming the code is broken because you screwed up is usually correct and will help you finding the solution faster. Throwing your hands up in the air like ‘it wasn’t me’ is understandable when you’re new to all this, but eventually you’ll see that your own code is at fault. (and if not, there is still nothing else to do than change your own code to work around some bug)

skappler · November 27, 2012, 6:59pm

Yes I know I tend to do this
And matheus you were right, I had to bind the Texture while compiling. Can anyone explain why?
What’s also weird is that I haven’t any blue texture like the one that was bound

@RobinB The texture coordinates remain the same, because the spritesheet didn’t change.
Also I study Computer Science so I think I know what % does

matheus23 · November 27, 2012, 7:13pm

Phew… I’ll try… tho I don’t know much about DisplayLists…

So… probably it’s more about the concept:
A Display list is really just for reducing the ammount of calls to OpenGL and being able to really pre-compile them when you compile them, instead of everytime interpreting all the commads you give to opengl.
Compiling also gives the ability to optimize it.

So to be more precise: it’s about state-compiling.
Changing the texture would change your state, which means things are not like they used to be when compiling. Having a texture pre-compiled inside the list allows some optimization, for example terminate all glBindTexture calls, which are binding the same texture. Usually opengl couldn’t do this, because it does not know, if you bind the same textures every frame. With display lists this is not possible anymore, so opengl is able to terminate those calls.

I don’t get why display lists got deprecated… One cool thing about them is for example that you are able to let a glCallList compile inside a display list, so your display list calls another display list.
Now imagine your whole game world is one single display list and once a part of your world changes, you simply only recompile the specific display list and everything works…

Riven · November 27, 2012, 7:27pm

One of the problems is that compiling display lists is really slow. So yes, in theory you could make a hierarchy of display lists, but the moment you have to update content that is a bit large (a few thousand vertices) you get a noticable hickup in your framerate.

The only modern benefit of display lists today is compiling a bunch of state changes, not feeding geometry. It can massively reduce the number of calls to the driver, and if you’re in luck the driver just passes the handle to the GPU which will apply all state changes.

Streaming & storing geometry is so fast these days, using VBOs and mapped buffers, that there is no reason to use displaylists for them.

skappler · November 27, 2012, 7:32pm

matheus23:

skappler:

And matheus you were right, I had to bind the Texture while compiling. Can anyone explain why?
What’s also weird is that I haven’t any blue texture like the one that was bound

Phew… I’ll try… tho I don’t know much about DisplayLists…

So… probably it’s more about the concept:
A Display list is really just for reducing the ammount of calls to OpenGL and being able to really pre-compile them when you compile them, instead of everytime interpreting all the commads you give to opengl.
Compiling also gives the ability to optimize it.

So to be more precise: it’s about state-compiling.
Changing the texture would change your state, which means things are not like they used to be when compiling. Having a texture pre-compiled inside the list allows some optimization, for example terminate all glBindTexture calls, which are binding the same texture. Usually opengl couldn’t do this, because it does not know, if you bind the same textures every frame. With display lists this is not possible anymore, so opengl is able to terminate those calls.

I don’t get why display lists got deprecated… One cool thing about them is for example that you are able to let a glCallList compile inside a display list, so your display list calls another display list.
Now imagine your whole game world is one single display list and once a part of your world changes, you simply only recompile the specific display list and everything works…

Ok that makes sense. Thank you very much =)

Riven:

matheus23:

I don’t get why display lists got deprecated… One cool thing about them is for example that you are able to let a glCallList compile inside a display list, so your display list calls another display list.
Now imagine your whole game world is one single display list and once a part of your world changes, you simply only recompile the specific display list and everything works…

One of the problems is that compiling display lists is really slow. So yes, in theory you could make a hierarchy of display lists, but the moment you have to update content that is a bit large (a few thousand vertices) you get a noticable hickup in your framerate.

The only modern benefit of display lists today is compiling a bunch of state changes, not feeding geometry. It can massively reduce the number of calls to the driver, and if you’re in luck the driver just passes the handle to the GPU which will apply all state changes.

Streaming geometry is so fast these days, using VBOs and mapped buffers, that there is no reason to use displaylists for them.

So the way I do it will give me laggs when I rebuild a display list? I wanted to have one List per chunk and update them when a block within a chunk changes. Is this inefficient? And would VBOs be a huge improvement?