[LWJGL] [JOML] Some instancing queries

I am using LWJGL with JOML to make a voxel engine (mainly for infinity and destruction). I have experience with OpenGL programming in Java using LWJGL, but I am new to instancing.

  1. There is a SimplexNoise class in JOML. How can I use it to get transform values for the cubes?

  2. After getting the transform values, they must be stored somewhere. I think it should be an array of model matrices. But Matrix4f[] gives me a lot of errors, and MatrixStackf does not provide matrix array calculations.

  3. Now, how will I get my matrices to a buffer for shader use?

  4. I need a simplified VAO/VBO initialization/binding/attrib code for the instanced arrays.

Thanks in advance :)

You do not use noise to generate the transformation (i.e. “position”) of a cube, but you use the noise function as a density function describing at every lattice point of your world whether or not that position contains a cube (and probably what kind of cube).
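A minimal sketch of the density idea, in plain Java so it runs without any library. The `Noise3` interface, the `solidCells` name, and the `scale` and `threshold` parameters are all hypothetical; JOML's `SimplexNoise.noise(x, y, z)` should be pluggable as the noise source, but check its JavaDocs before relying on that.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: treat a noise function as a density field over the voxel lattice. */
final class DensityField {
    /** Hypothetical stand-in for a 3D noise source (values roughly in [-1, 1]). */
    interface Noise3 {
        float sample(float x, float y, float z);
    }

    /** Returns the lattice points {x, y, z} whose density exceeds the threshold. */
    static List<int[]> solidCells(Noise3 noise, int size, float scale, float threshold) {
        List<int[]> cells = new ArrayList<>();
        for (int x = 0; x < size; x++) {
            for (int y = 0; y < size; y++) {
                for (int z = 0; z < size; z++) {
                    // The noise value decides *whether* a cube exists here,
                    // not where it is placed: the position is the lattice point itself.
                    if (noise.sample(x * scale, y * scale, z * scale) > threshold) {
                        cells.add(new int[] {x, y, z});
                    }
                }
            }
        }
        return cells;
    }
}
```

The same value could also select the kind of cube, for example by comparing it against several thresholds.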

You do not need matrices at all. They’re overkill for what you are doing. Just assign each visible cube a 3-component vector denoting its position.

Like above, you do not need matrices. But if you wanted to, did you consider reading JOML’s README.md?

Google for “OpenGL instancing” and you’ll find very good explanations and examples.

Thanks, I made a Vector3f array and stored the transform values in it. I also made a UBO to transfer it to the shader. Now, how can I multiply the vec3 in such a way that it processes as if I made a model matrix for each instance ?

Normal vertex shader code

#version 330 core

layout(location = 0) in vec3 position;
layout(location = 1) in vec3 color;
layout(location = 2) in mat4 model;

uniform mat4 view;
uniform mat4 perspective;

out vec3 excolor;

void main() {
    gl_Position = perspective * view * model * vec4(position, 1.0);
    excolor = color;
}

How should I change the algorithm now that there is gl_InstanceID and a vec3 instead of a mat4?

Think about it for a second. You have your vertices whose ‘position’ values are (hopefully) all around (0, 0, 0), like with a unit cube.
And then you have your ‘model’ positions which define where in the world that particular cube is located at. That means, the ‘model’ position (x’, y’, z’) should translate a cube (which was previously around (0, 0, 0)) to have its center at (x’, y’, z’). Now, you should be able to figure out what to do so that this happens.

By the way, if you manually multiply your vertex with a model matrix that you’ve built with Matrix4f.translate(), then you’ll find that you end up with exactly the same calculation that you would do above with a single ‘model’ position instead of a matrix.
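The equivalence mentioned here can be checked with a few lines of plain Java. This is only a sketch: `TranslateDemo` and its methods are made-up names, and the matrix is a hand-written column-major array rather than a JOML `Matrix4f`.

```java
/** Sketch: a pure translation matrix applied to a point equals adding the offset. */
final class TranslateDemo {
    /** Column-major 4x4 translation matrix (the memory layout OpenGL expects). */
    static float[] translation(float tx, float ty, float tz) {
        return new float[] {
            1, 0, 0, 0,
            0, 1, 0, 0,
            0, 0, 1, 0,
            tx, ty, tz, 1
        };
    }

    /** Multiplies a column-major 4x4 matrix by the point (x, y, z, 1); returns xyz. */
    static float[] transformPoint(float[] m, float x, float y, float z) {
        return new float[] {
            m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14]
        };
    }

    /** The shortcut: add the per-instance offset to the vertex position. */
    static float[] addOffset(float x, float y, float z, float ox, float oy, float oz) {
        return new float[] {x + ox, y + oy, z + oz};
    }
}
```

Both paths give the same world-space position for a cube corner, which is why a single vec3 per instance is enough as long as the cubes are only translated.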

Other great resources which I strongly recommend you look at:

Ah, I should have known! Anyway, I got it working, and also found out that gl_InstanceID is bad practice. So I use the gl4.2 method.

Hm? Who says that using gl_InstanceID is bad practice? I’m using it and it works perfectly fine. What is the gl4.2 method supposed to be?

According to the learnopengl.com instancing tutorial https://learnopengl.com/#!Advanced-OpenGL/Instancing (and many other websites), indexing an array of uniform vecs with gl_InstanceID is fine for a few instances, but with a lot more than 100 instances it may become a performance issue. The new method is to create a VBO of instance offsets and call glVertexAttribDivisor(2, 1); for its attribute. We then let OpenGL handle the per-instance iteration.

Hmmm, another problem. While setting up the VBO code for the offsets, I found that FloatBuffer does not accept an array of Vector3fs. How do I get past this? Without passing the vec3s, the instanced arrays will not work.

1. v.get(floatBuffer)
2. vs[i].get(3 * i, floatBuffer)
3. floatBuffer.put(v.x).put(v.y).put(v.z)

where ‘v’ has type Vector3f, ‘vs’ has type Vector3f[], and floatBuffer has type FloatBuffer

Make sure you carefully read the JavaDocs of all methods involved, especially that of case 1.
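The difference between the options comes down to absolute versus relative buffer writes. The sketch below shows both with plain `java.nio`; `Vec3` is a hypothetical stand-in for JOML's `Vector3f`, and a heap buffer is used only so the example runs on its own (LWJGL needs a direct buffer, for example from `BufferUtils.createFloatBuffer`). As far as I know, `Vector3f.get(FloatBuffer)` writes at the buffer's current position without advancing it, which is the JavaDoc detail hinted at for case 1, but verify that yourself.

```java
import java.nio.FloatBuffer;

final class OffsetPacking {
    /** Hypothetical stand-in for a 3-component vector such as JOML's Vector3f. */
    static final class Vec3 {
        final float x, y, z;
        Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    /** Absolute writes (like option 2): the position never moves, so no flip is needed. */
    static FloatBuffer packAbsolute(Vec3[] vs) {
        FloatBuffer fb = FloatBuffer.allocate(3 * vs.length);
        for (int i = 0; i < vs.length; i++) {
            fb.put(3 * i, vs[i].x).put(3 * i + 1, vs[i].y).put(3 * i + 2, vs[i].z);
        }
        return fb; // position = 0, limit = capacity
    }

    /** Relative writes (like option 3): every put advances the position. */
    static FloatBuffer packRelative(Vec3[] vs) {
        FloatBuffer fb = FloatBuffer.allocate(3 * vs.length);
        for (Vec3 v : vs) {
            fb.put(v.x).put(v.y).put(v.z);
        }
        fb.flip(); // rewind so a consumer reads from the start
        return fb;
    }
}
```

Either way, the buffer handed to OpenGL should have its position at 0 and its limit at the end of the data; forgetting the flip after relative writes is a common cause of "nothing is drawn".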

Thanks. Now I use this method :-

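//BEFORE: init Vector3f[16] chunk_transforms etc.

// createFloatBuffer takes a count of floats: 3 per Vector3f
FloatBuffer fb = BufferUtils.createFloatBuffer(3 * CHUNK_SIZE);
for (int c = 0; c < CHUNK_SIZE; c++) {
    // absolute write at float index 3 * c (option 2 above); fb's position stays at 0
    chunk_transforms[c].get(3 * c, fb);
}

//After : VBO/VAO code etc.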

EDIT: Code typo

Sorry, I’m not quite sure I understand you. Okay, maybe if you only deal with a vec3 and nothing more, it might be better performance-wise to use instanced attributes… but “in general”, I would say that gl_InstanceID is not deprecated but even the “cleaner” way. First, you usually have more than a vec3 position per entity. The modern way of handling this is a large uniform buffer or SSBO. In most cases it would hold at least a 4x4 transformation matrix. In my case, I have 24 values per entity - having this as instanced attributes would mess up the structure pretty badly. And the more important point: what does my vertex structure have to do with how my entity is instanced? My rendering pipeline lets me adjust the instance count on the fly… all I have to do is change a single value in my command buffer and I have another instance. Okay, plus a change in my entity uniform buffer, where I add another entry and maybe have to rebuffer. But since it stores transformations, that data is probably updated dynamically anyway, while my vertex data is completely static. Another advantage: you can store the vertices once and simply push an instanced render call that indexes into your uniform array… I’m pushing 4 million vertices with 340,000 cubes (instances) through this pipeline on my GTX 1060 at 60 fps, so I doubt that there really is a performance penalty at all - maybe on older hardware…

I can use your setup for characters and NPCs. I don’t think I should use this for land as :-

  1. My voxels are VERY small compared to Minecraft/Cubeworld/Other projects here. It is about this size:-

  2. Since the voxels are very small, there would be a lot more instances than in a standard textured voxel engine (the chunk size is still being debated). And according to KaiHH: “You do not need matrices at all. They’re overkill for what you are doing. Just assign each visible cube a 3-component vector denoting its position.”
    After all, I only need positions for the land cubes.


I have now updated my code to support solidity flags in the blocks. So now I am storing 5 Vector3f components in 1 block (for the faces) and passing an array of blocks to the Renderer class. I guess it will be instances of quads now. So how do I ‘rotate’ the quads to form pseudo-cubes? I would rather not use JOML for this, as it would put too much load on the CPU. But the GPU is built for this kind of work, so I might use matrices “in-shader”. Algorithms, anyone?

My own little voxel project, though I’ve not touched it for a long time, renders blocks by constructing a single mesh for the entire chunk (32 blocks per axis), using a greedy-meshing algorithm, combining any similar blocks next to each other into larger quads with texture coordinates to match. For example, an even plane of stone blocks is rendered as 6 quads, and looks just fine.
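To give a rough idea of what the merging step looks like, here is a sketch of greedy meshing on a single 2D slice of faces. It is not the code from the project described above; `GreedySlice`, the `mask` layout (0 = no face, otherwise a block id), and the output format are assumptions made for illustration. A full mesher would build one such mask per layer and per face direction.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of greedy meshing on one 2D slice of faces. */
final class GreedySlice {
    /**
     * Merges equal, non-zero cells into rectangles.
     * Each result is {x, y, width, height, blockId}. The mask is consumed (zeroed).
     */
    static List<int[]> mesh(int[][] mask) {
        List<int[]> quads = new ArrayList<>();
        int h = mask.length, w = mask[0].length;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int id = mask[y][x];
                if (id == 0) continue;
                // Grow to the right while the block id matches.
                int qw = 1;
                while (x + qw < w && mask[y][x + qw] == id) qw++;
                // Grow downwards while the whole row segment matches.
                int qh = 1;
                grow:
                while (y + qh < h) {
                    for (int k = 0; k < qw; k++) {
                        if (mask[y + qh][x + k] != id) break grow;
                    }
                    qh++;
                }
                // Clear the covered cells so they are not emitted again.
                for (int dy = 0; dy < qh; dy++) {
                    for (int dx = 0; dx < qw; dx++) {
                        mask[y + dy][x + dx] = 0;
                    }
                }
                quads.add(new int[] {x, y, qw, qh, id});
            }
        }
        return quads;
    }
}
```

A 32 x 32 slice filled with one block type collapses into a single quad, which is how a flat plane of stone ends up as only 6 quads in total. Texture coordinates then have to repeat across the merged quad, which is the "texture coordinates to match" part.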

My voxels need several things. Texture, vertex color, and maybe normal mapping if I ever get around to that. There are pre-set values for baking the most rudimentary of lighting onto them. For example, if a block’s top face has the color of (1, 1, 1, 1), then the “north” face will be (0.7, 0.7, 0.7, 0.7) to help see the block shapes. That’s probably not going to stay forever, but it helps enormously for now. For now, animated textures are outside the realm of what I need, except for on special blocks, which are their own case. I also haven’t yet done any lighting or shadows except for the basic stuff described above.

I’ve seen talk of using 6 VBO’s, one for each set of faces, IE a “north-face” vbo, etc, and you’d only ever have to draw three of them at once (How are you going to see the top and bottom of a block all at once? That’d be weird). That might be worthwhile, but I’ve yet to implement such a thing, though the greedy meshing I use is a prime candidate since each set of faces is computed in a separate sweep anyway.

Though, since having to regenerate them each frame when checking in what direction the player is looking sounds bad, you’d probably need to generate all 6 VBO’s in advance, and select which ones to render. Whether or not the performance gained from not drawing the entire chunk geometry in one draw call, vs the overhead from drawing 3 VBO’s, is a question I’ve no answer to, though I’m sure there are very smart folks around who could tell you. I’ll probably try and profile each method myself to figure that out, it’s rather intriguing.
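One way to do the selection without regenerating anything is to compare the camera position with the chunk bounds on each axis. The sketch below is an assumption about how this could work, not a tested renderer; the flag names and `visibleFaceSets` are invented. Note that "only three at once" holds for a chunk the camera is outside of on every axis; when the camera sits within the chunk's range on an axis, both face sets of that axis can be visible.

```java
/** Sketch: pick which of six per-direction face meshes can face the camera. */
final class FaceSelection {
    // One bit per face direction (names are illustrative).
    static final int NEG_X = 1, POS_X = 2, NEG_Y = 4, POS_Y = 8, NEG_Z = 16, POS_Z = 32;

    /** One axis: min/max are the chunk bounds, cam is the camera coordinate. */
    static int axisMask(float cam, float min, float max, int negFlag, int posFlag) {
        int mask = 0;
        // A +axis face is only visible from its positive side, so the camera
        // must be beyond at least the chunk's minimum bound on this axis.
        if (cam > min) mask |= posFlag;
        // Symmetric argument for -axis faces.
        if (cam < max) mask |= negFlag;
        return mask;
    }

    /** Returns a bit mask of the face sets worth drawing for a cubic chunk. */
    static int visibleFaceSets(float camX, float camY, float camZ,
                               float minX, float minY, float minZ, float size) {
        return axisMask(camX, minX, minX + size, NEG_X, POS_X)
             | axisMask(camY, minY, minY + size, NEG_Y, POS_Y)
             | axisMask(camZ, minZ, minZ + size, NEG_Z, POS_Z);
    }
}
```

The check costs a handful of comparisons per chunk per frame, so the six meshes can be built once and kept. Whether it actually beats a single draw call would still need profiling.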

It’s even murkier if you’re doing transparency stuff too. I don’t know how it’s done the “recommended” way; I just generate another set of geometry for the transparent stuff, since I’ve not the foggiest idea how to do transparent and opaque blocks in the same set of geometry. It’s probably possible, but not by such a plebeian person as myself. My first guess is geometry shading plus some sort of binary value, but I really don’t know.

Using instancing to render the blocks themselves, from what I’m getting here, seems a little odd. Not wrong, never wrong, just not a use-case that I’ve ever considered.

For me, I store 1 short per normal terrain block in a big 1D array, which I access using a hashing function. That means there can be a little over 30 thousand different block types in the world, more if I used unsigned shorts. I think that’d be like 65536 different types? Plenty, even if people ever picked up modding for it. Whether or not that is actually more space efficient, I don’t know; I remember reading that Java uses 32 bits for each value anyway. Plus, the whole having to cast to shorts probably doesn’t help, though there’s probably a more effective method lurking on the internet.
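A sketch of that storage scheme, assuming a plain x/y/z index formula rather than whatever "hashing function" the project really uses (`ChunkStorage` and its methods are hypothetical). On the memory question: a `short[]` does store 2 bytes per element on common JVMs such as HotSpot; it is individual local variables and stack slots that are widened to 32 bits.

```java
/** Sketch: 32 x 32 x 32 block ids in one flat short[] array. */
final class ChunkStorage {
    static final int SIZE = 32;
    private final short[] ids = new short[SIZE * SIZE * SIZE];

    /** Maps in-chunk coordinates (each 0..31) to a unique array index. */
    static int index(int x, int y, int z) {
        return x + SIZE * (y + SIZE * z);
    }

    /** Reads the id as an unsigned value, so 0..65535 are all usable. */
    int get(int x, int y, int z) {
        return Short.toUnsignedInt(ids[index(x, y, z)]);
    }

    /** Stores the low 16 bits of id. */
    void set(int x, int y, int z, int id) {
        ids[index(x, y, z)] = (short) id;
    }
}
```

`Short.toUnsignedInt` gives the full 65536 ids without changing the storage, so the only casts left are the ones in `set`.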

32 x 32 x 32 shorts for tile IDs, 3 integers for the chunk index, and a pointer to the parent “grid” make up a Chunk. If / when I get around to it, each Chunk may contain a list of their own self-contained, so-called “tile entities” I.E. a furnace, and if that chunk ever gets unloaded while working, when it’s reloaded, you can calculate the work it was supposed to have done while away. So, at the very least, a Chunk takes up 66~ ish kilobytes.

That ain’t so bad! Though if I ever have rotate-able blocks, I’ll need to start storing rotation data for them, which I’ll endeavor to have no more than double the size of the chunk. Should be well within the realm of possibility, and 132~ kb / chunk is… well, it isn’t amazing, but it’ll work faster than if I were to pack everything as close together as possible. Hashing and bit manipulation probably costs more than it’s worth at that point.

A Grid contains a unique identifier name, and a HashMap<Vector3i, Chunk> map of chunks. It’s easy to figure out chunks, since I’d just divide and round down any in-world position by 32, and that would give me chunk indices. So if my player is at (20, 20, 20), he’s in chunk (0, 0, 0) and, in-chunk, he’s at (20, 20, 20). It even works for negative indices if you fiddle with the hashing function, something that I thought was taken for granted but apparently is not; if he were at (-1, -1, -1) then a lot of the time the calculation would put him in (0, 0, 0) when he should actually be at chunk (-1, -1, -1), in-chunk (31, 31, 31). Rounding has a lot to do with it, as well as integer division. I’ve not yet figured out the correct hashing function so it always works, but I’m getting there.
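The negative-coordinate problem described here is what floor division solves. Java's `/` truncates toward zero, so `-1 / 32` is `0`, while `Math.floorDiv` and `Math.floorMod` round toward negative infinity. A minimal sketch, with `ChunkCoords` as a made-up helper name:

```java
/** Sketch: world coordinate -> chunk index and in-chunk index, negatives included. */
final class ChunkCoords {
    static final int CHUNK_SIZE = 32;

    /** Chunk index along one axis; rounds toward negative infinity. */
    static int chunkIndex(int world) {
        return Math.floorDiv(world, CHUNK_SIZE);
    }

    /** In-chunk index along one axis; always in 0..CHUNK_SIZE-1. */
    static int localIndex(int world) {
        return Math.floorMod(world, CHUNK_SIZE);
    }

    /** What plain integer division gives instead (truncates toward zero). */
    static int truncatedIndex(int world) {
        return world / CHUNK_SIZE;
    }
}
```

With this, (-1, -1, -1) lands in chunk (-1, -1, -1) at in-chunk (31, 31, 31), and (20, 20, 20) stays in chunk (0, 0, 0). Because 32 is a power of two, `world >> 5` and `world & 31` give the same results.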

Loading and unloading chunks to-disk is easy as checking their distance from where the player is; if the chunk’s too far away, unload it. However, if there are any special blocks that are doing work, save their current progress so you can calculate what needs to happen the next time that chunk is loaded. It might also be an idea to unload chunks that haven’t had any change happen to them for a certain amount of time, but keep the mesh; upon the player re-entering, or upon the modification of such a chunk, record the changes, reload it, apply the changes, and continue as usual.

For example, a giant bomb going off far away would need to make changes, so you could design a slower-paced solver for that bomb blast, have it trudge along so it doesn’t slow down the actual action zone, and when it’s done, unload everything. Or, if the player gets close enough that the job needs to be finished, switch over to a faster solver. Perhaps modify the speed based on how far away the player is.

I digress. And it’ll happen again, probably. ANYWAYS!

The worst-case scenario for meshing means I’m generating 6 quads per block, I.E. a 3D checkerboard of glass and stone where there are the maximum number of visible faces possible, since there’d be two worst-case scenarios, one for opaque blocks, and another for transparent. That’s fine, I’m pretty sure something like that is literally unavoidable for such a situation, and is incredibly unlikely to occur in normal play.

Special blocks, like a furnace or something, would probably be handled with instancing, as they’re a higher-poly model, and I can supply the shader with a series of positions and rotations, probably condensed down into as little information as possible. Or a transformation matrix, that could work. If this is the sort of thing you’re trying to figure out, then:

My matrix math is rough at best, and I really need to work on that. However, I’m sure you can find that information on google somewhere, on how to create a transformation matrix. Keeping a list of “special” blocks and their positions and orientations, and sending that data to be instanced, should be very possible. Each matrix would need to be constructed with one position and one orientation, yeah?

For position, a couple floats would probably be fine, X Y and Z. For rotation, 2 bits per axis, 0 being 0 degrees, with 1 being 90, 2 being 180, and 3 being 270, would probably work, and you can supply that in the form of a byte with 2 dummy bits at the end you can ignore. YMMV on that one, I’ve not done a ton with memory packing like this, though it seems promising or at least worth investigating. Maybe using another float would be a good idea.
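The 2-bits-per-axis idea could look like the sketch below. The bit order (x in the lowest bits, two unused bits at the top) and the `RotationPacking` name are arbitrary choices for illustration.

```java
/** Sketch: three quarter-turn rotations (0..3 each) packed into one byte. */
final class RotationPacking {
    /** Layout: bits 0-1 = x, bits 2-3 = y, bits 4-5 = z, bits 6-7 unused. */
    static byte pack(int rx, int ry, int rz) {
        return (byte) ((rx & 3) | ((ry & 3) << 2) | ((rz & 3) << 4));
    }

    static int x(byte packed) { return packed & 3; }
    static int y(byte packed) { return (packed >> 2) & 3; }
    static int z(byte packed) { return (packed >> 4) & 3; }

    /** 0 -> 0 degrees, 1 -> 90, 2 -> 180, 3 -> 270. */
    static int degrees(int quarterTurns) {
        return quarterTurns * 90;
    }
}
```

One byte per rotatable block is far below the "no more than double the chunk size" budget mentioned earlier. In a shader the same value can be unpacked with the same shifts and masks.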

Constructing the matrix in the shader might be worthwhile. From what I’ve read about recent technology, the GPU has a ridiculous amount of horsepower at its disposal that the CPU does not, and that may be one of those calculation-heavy tasks that the CPU would struggle with that the GPU would be fine with. I’d recommend testing both methods; in-shader and out.

The big thing I always seem to consider is, “Is it worth the processing to figure out exactly which set of voxel faces needs rendering?” Each face is 2 triangles; it can’t hurt that much to save a little CPU time at the cost of a few extra triangles for the GPU.

Bear in mind, I’ve not touched the thing for a year and a half. The trends and methodology for rendering voxels might have evolved since then.

However, for some specs, I can render a space of 8x8x8 chunks, which is… 16,777,216 blocks in volume, 256 blocks per edge, and it does rather well. I’ve not even gotten into culling the chunks that are underground, that you wouldn’t even be able to see anyway, so the overhead from their draw calls and 6 quads are included in that too.

Further optimizations may include the whole 6 vbo deal, and when tile entities come around, raycasting or frustum culling to make sure that they’re visible will eliminate the ones that aren’t. You could even frustum cull chunks too.

Another way it just occurred to me, for the 6 vbo deal, is to only have 1 VBO and to put all the data you need into it, and just render a subset of that data for each face that’s visible. Might be worthwhile.

I get rather excited about these things and waffle on a great deal. I’m no authority on the subject, but there’s some stuff I’ve been reading that might help:

That’s what I’m using to help build from the ground up, since I wanted my own engine! Maybe a bit ambitious, but hey, it’s fun to learn. 8)

A very interesting blog that has a bit of a disjointed deal to its progress, but still fun and engaging to read!

Plus various articles I found by googling everything under the sun about voxels.

Hope this was at the very least informative, if not actually helpful! ;D