(~3x speedup purely by choosing an intelligent, cache aware memory layout within the array. needless to say i was disappointed by the lack of any feedback :emo:)
(~3x speedup purely by choosing an intelligent, cache aware memory layout within the array. needless to say i was disappointed by the lack of any feedback :emo:)
[/quote]
Wow, I didn’t even think about that! I’m in luck though, my data blocks are all 8x8x8. =D Very interesting though, I’ll try to keep that in mind since I have code that might suffer from that.
Not a mainstream PC. An i7 860 @ 3.52GHz and a GTX 295 (both GPUs in AFR). The performance is almost the same no matter the size of the world since it’s mostly GPU limited (around 850 000 vertices for my terrain). If I make the terrain hillier there’s a lot more geometry, so performance takes a hit even though the render block sizes are reduced automatically to improve culling precision.
(~3x speedup purely by choosing an intelligent, cache aware memory layout within the array. needless to say i was disappointed by the lack of any feedback :emo:)
[/quote]
Looks good. Even though I don’t understand it yet
But I know that you can do many things more efficient if you use bit operations. I’ll take a deeper look into it and try to understand it. Thank you!
He’s generating indices in a way that improves cache coherence when reading neighbors. A 16x16x16 chunk is too large to fit into the CPU cache, so the same data is read multiple times from RAM into the cache and then immediately discarded when the next distant (for the CPU) neighbor is needed, sometimes for each block. In my code, getting x-1 would get a neighbor that 256 slots to the left in the array. That isn’t very good since there’s a big chance that that part of the array has been removed from the cache, and the CPU has to wait for it to be loaded from RAM which is a lot slower than the CPU cache. Like Riven showed in the benchmark, you get around 3x performance by using a more cache friendly way of generating indices into the array. If I got his code right, he splits up each chunk into 8x8x8 “sub-chunks” and store these in the array instead. That’s much better since the only time we might get a cache miss is when sampling over “sub-chunk” borders instead of for every single block.
But Riven, I think Marcus Persson was focusing on other things at the time. Those chunks were loaded from disk or downloaded from a server, so improving the CPU performance of generating the geometry probably wasn’t his first priority since the gain would be so small. It also reduces the readability of the code. I mean, no one seems to get what you’re doing there. I don’t know exactly what it’s doing there either, I just know what you’re trying to do. Marcus probably just saw it as a premature optimization.
Omg that’s the original thread where notch posted minecraft? How awesome is that
I understand what he tries to do, but not how he does it But I have to care about other things at the moment anyway.
(Nevertheless I’m interested in how it works!)
(~3x speedup purely by choosing an intelligent, cache aware memory layout within the array. needless to say i was disappointed by the lack of any feedback :emo:)
[/quote]
That’s pretty neat. I ran into a similar problem at work recently, concerning sparse vectors. One implementation used Colt’s Map, the other was a sorted array of index/float pairs. The later outperformed the former by a factor of 30, without any big optimizations on the inserts/gets. This made a real big difference in a real-world scenario where we brought down the computation of some clustering solutions from 6 hours to 12 minutes.
So I tried to implement the display lists but nothing is getting displayed. Here’s the chunk class:
public class Chunk {
private Vector3f coordinates;
private Block[] blocks;
private int displayList;
public Chunk(){
blocks = new Block[16*16*16];
loadChunk();
displayList = GL11.glGenLists(1);
createDisplayList();
}
public void render(){
GL11.glCallList(displayList);
}
public void update(){
}
public void loadChunk(){
for (int i = 0; i < 16; i++) {
for (int j = 0; j < 16; j++) {
for (int k = 0; k < 16; k++) {
// if((i == 15 || i == 0) || (j == 15 || j== 0))
blocks[16*16*i+16*j+k] = new GrassBlock(i, -k, j);
}
}
}
}
private void createDisplayList(){
GL11.glNewList(displayList, GL11.GL_COMPILE);
for(Block b : blocks){
if(b != null)
b.create();
}
GL11.glEndList();
}
}
And here the create() Method from the Block class:
public void create() {
float x = coordinate.x;
float y = coordinate.y;
float z = coordinate.z;
float topTextureX = top / MyTextureLoader.SPRITESHEETWIDTH;
float topTextureY = top % MyTextureLoader.SPRITESHEETWIDTH;
float bottomTextureX = bottom / MyTextureLoader.SPRITESHEETWIDTH;
float bottomTextureY = bottom % MyTextureLoader.SPRITESHEETWIDTH;
float sideTextureX = side / MyTextureLoader.SPRITESHEETWIDTH;
float sideTextureY = side % MyTextureLoader.SPRITESHEETWIDTH;
float delta = (float) (1.0 / MyTextureLoader.SPRITESHEETWIDTH);
// TOP
glNormal3f(0f, 1f, 0f);
glTexCoord2f(topTextureX, topTextureY);
glVertex3f(x, y, z);
glTexCoord2f(topTextureX, topTextureY + delta);
glVertex3f(x, y, z + 1);
glTexCoord2f(topTextureX + delta, topTextureY + delta);
glVertex3f(x + 1, y, z + 1);
glTexCoord2f(topTextureX + delta, topTextureY);
glVertex3f(x + 1, y, z);
// BOTTOM
glNormal3f(0f, -1f, 0f);
glTexCoord2f(bottomTextureX, bottomTextureY);
glVertex3f(x, y - 1, z);
glTexCoord2f(bottomTextureX, bottomTextureY + delta);
glVertex3f(x, y - 1, z + 1);
glTexCoord2f(bottomTextureX + delta, bottomTextureY + delta);
glVertex3f(x + 1, y - 1, z + 1);
glTexCoord2f(bottomTextureX + delta, bottomTextureY);
glVertex3f(x + 1, y - 1, z);
// BACK
glNormal3f(0, 0, 1);
glTexCoord2f(sideTextureX, sideTextureY);
glVertex3f(x, y, z);
glTexCoord2f(sideTextureX + delta, sideTextureY);
glVertex3f(x + 1, y, z);
glTexCoord2f(sideTextureX + delta, sideTextureY + delta);
glVertex3f(x + 1, y - 1, z);
glTexCoord2f(sideTextureX, sideTextureY + delta);
glVertex3f(x, y - 1, z);
// FRONT
glNormal3f(0, 0, 1);
glTexCoord2f(sideTextureX, sideTextureY);
glVertex3f(x, y, z + 1);
glTexCoord2f(sideTextureX + delta, sideTextureY);
glVertex3f(x + 1, y, z + 1);
glTexCoord2f(sideTextureX + delta, sideTextureY + delta);
glVertex3f(x + 1, y - 1, z + 1);
glTexCoord2f(sideTextureX, sideTextureY + delta);
glVertex3f(x, y - 1, z + 1);
// LEFT
glNormal3f(-1, 0, 0);
glTexCoord2f(sideTextureX, sideTextureY);
glVertex3f(x, y, z);
glTexCoord2f(sideTextureX, sideTextureY + delta);
glVertex3f(x, y - 1, z);
glTexCoord2f(sideTextureX + delta, sideTextureY + delta);
glVertex3f(x, y - 1, z + 1);
glTexCoord2f(sideTextureX + delta, sideTextureY);
glVertex3f(x, y, z + 1);
// RIGHT
glNormal3f(1, 0, 0);
glTexCoord2f(sideTextureX, sideTextureY);
glVertex3f(x + 1, y, z);
glTexCoord2f(sideTextureX, sideTextureY + delta);
glVertex3f(x + 1, y - 1, z);
glTexCoord2f(sideTextureX + delta, sideTextureY + delta);
glVertex3f(x + 1, y - 1, z + 1);
glTexCoord2f(sideTextureX + delta, sideTextureY);
glVertex3f(x + 1, y, z + 1);
}
I basically moved the rendering code to the create method and removed the glBegin and glEnd. It doesn’t make any difference where I call the OpenGl commands, right?
Assuming the code is broken because you screwed up is usually correct and will help you finding the solution faster. Throwing your hands up in the air like ‘it wasn’t me’ is understandable when you’re new to all this, but eventually you’ll see that your own code is at fault. (and if not, there is still nothing else to do than change your own code to work around some bug)
Yes I know I tend to do this
And matheus you were right, I had to bind the Texture while compiling. Can anyone explain why?
What’s also weird is that I haven’t any blue texture like the one that was bound
@RobinB The texture coordinates remain the same, because the spritesheet didn’t change.
Also I study Computer Science so I think I know what % does
Phew… I’ll try… tho I don’t know much about DisplayLists…
So… probably it’s more about the concept:
A Display list is really just for reducing the ammount of calls to OpenGL and being able to really pre-compile them when you compile them, instead of everytime interpreting all the commads you give to opengl.
Compiling also gives the ability to optimize it.
So to be more precise: it’s about state-compiling.
Changing the texture would change your state, which means things are not like they used to be when compiling. Having a texture pre-compiled inside the list allows some optimization, for example terminate all glBindTexture calls, which are binding the same texture. Usually opengl couldn’t do this, because it does not know, if you bind the same textures every frame. With display lists this is not possible anymore, so opengl is able to terminate those calls.
I don’t get why display lists got deprecated… One cool thing about them is for example that you are able to let a glCallList compile inside a display list, so your display list calls another display list.
Now imagine your whole game world is one single display list and once a part of your world changes, you simply only recompile the specific display list and everything works…
One of the problems is that compiling display lists is really slow. So yes, in theory you could make a hierarchy of display lists, but the moment you have to update content that is a bit large (a few thousand vertices) you get a noticable hickup in your framerate.
The only modern benefit of display lists today is compiling a bunch of state changes, not feeding geometry. It can massively reduce the number of calls to the driver, and if you’re in luck the driver just passes the handle to the GPU which will apply all state changes.
Streaming & storing geometry is so fast these days, using VBOs and mapped buffers, that there is no reason to use displaylists for them.
So the way I do it will give me laggs when I rebuild a display list? I wanted to have one List per chunk and update them when a block within a chunk changes. Is this inefficient? And would VBOs be a huge improvement?