How to improve LibGDX 3D rendering performance?

Hello everyone, I’ve been running into some trouble recently trying to improve the framerate I get rendering my scene. If anyone could help out, that would be fantastic. I’ve also got this plea for help posted over on StackOverflow http://stackoverflow.com/questions/31361673/how-do-i-improve-libgdx-3d-rendering-performance if anyone would rather post there. Here’s the question:

I’m working on rendering a tiled sphere with LibGDX, aimed at producing a game for desktop. Here are some images of what I’ve got so far: http://imgur.com/GoYvEYZ,xf52D6I#0. I’m rendering 10,000 or so ModelInstances, all of which are generated from code using their own ModelBuilders. They each contain 3 or 4 triangular parts, and every ModelInstance corresponds to its own Model. Here’s the exact rendering code I’m using to do so:


    modelBatch.begin(cam);
    // Render all visible tiles
    visibleCount = 0;
    for (Tile t : tiles) {
        if (isVisible(cam, t)) {
            // t.rendered is a ModelInstance produced earlier by code.
            // the Model corresponding to the instance is unique to this tile.
            modelBatch.render(t.rendered, environment);
            visibleCount++;
        }
    }
    modelBatch.end();

The ModelInstances are not produced from code each frame, just drawn. I only update them when I need to. The “isVisible” check is just some very simple frustum culling, which I followed from this tutorial: https://xoppa.github.io/blog/3d-frustum-culling-with-libgdx/. As you can tell from my diagnostic information, my FPS is terrible. I’m aiming for at least 60 FPS rendering what I hope is a fairly simple scene of tons of polygons. I just know I’m doing this in a very inefficient way.
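For reference, that isVisible test boils down to a sphere-vs-frustum check: a bounding sphere is visible unless it lies entirely behind one of the frustum’s planes. Stripped of the LibGDX types, the underlying math is just a signed-distance test per plane (a plain-Java sketch with made-up names, not my actual code):

```java
// Sphere-vs-frustum test: a sphere is visible unless it lies entirely
// behind one of the frustum's planes (plane normals point inward).
public class FrustumCull {
    // A plane is stored as {nx, ny, nz, d}, so the signed distance of a
    // point p from the plane is n . p + d.
    static double signedDistance(double[] plane, double[] p) {
        return plane[0] * p[0] + plane[1] * p[1] + plane[2] * p[2] + plane[3];
    }

    // True if the bounding sphere touches or is inside the frustum.
    static boolean sphereInFrustum(double[][] planes, double[] center, double radius) {
        for (double[] plane : planes) {
            if (signedDistance(plane, center) < -radius) {
                return false; // sphere is entirely behind this plane: culled
            }
        }
        return true;
    }
}
```

If I understand the tutorial right, this is what `cam.frustum.sphereInFrustum(center, radius)` is doing for you under the hood.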

I’ve done some research on how people might typically solve this issue, but am stuck trying to apply the solutions to my project. For example, dividing the scene into chunks is recommended, but I don’t know how I could make use of that when the player is able to rotate the sphere and view all sides. I read about occlusion culling, so that I might only render ModelInstances on the side of the sphere facing the camera, but am at a loss as to how to implement that in LibGDX.

Additionally, how bad is it that every ModelInstance uses its own Model? Would speed be improved if only one shared Model object was used? If anyone could point me to more resources or give me any good recommendations on how I can improve the performance here, I’d be thankful.

I’m not sure how the LibGDX classes you’re using work exactly, but if rendering a ModelInstance breaks batching (and it seems like it may, if each instance has its own transform), then it seems you would have thousands of draw calls (each with a small amount of geometry) and individual frustum culling tests per frame. It does seem like that would likely lead to performance problems.

It might be helpful to know a little more about the context. Can you describe how the models are created? Is each model just a single hexagonal tile? Does each tile have a different transform, or is the model geometry created in world space with each model having an identity transform? You also mentioned that the models are only updated when needed. Under what circumstances do they need to be updated, and what changes are made to them?

That and any other relevant info you can think of might help clarify things a little.

The first thing with optimization is to make sure you are optimizing the right thing. Have you run the game through a profiler? If you haven’t, it’s very simple: most IDEs will have a nice big “profile” button that will do everything for you and give you the results in a nice, easy-to-read format. So if you haven’t, make sure you do, and check that it is your rendering methods that are taking the most time (if there is any sanity to LibGDX, it’ll be spending most of its time in the batcher’s render() and end() methods). Anyway…

What immediately jumps to mind is to use deferred rendering directly rather than batching every frame. This would mean creating one or more OpenGL buffers at initialization time containing all the data, and updating/recreating them only when there is a change. This would make the drawing much quicker (at the cost of a little video memory, of course). But I don’t use LibGDX and have no idea how to go about doing this in the LibGDX framework.

Occlusion culling would be very good in this case, and very efficient too for a simple sphere. However, in your picture the tiles on the far side of the sphere are being rendered and you can see them. With occlusion culling that would not be the case, so you’d have to decide whether or not you want this.

If you do want to cull them, then the test is, as I say, very simple: a single dot product between the tile’s normal (a vector pointing from the centre of the sphere to the centre of the rotated tile) and the camera’s forward vector (as long as you aren’t moving the camera, this is just [0, -1, 0], or potentially [0, 1, 0] depending on how your matrices are set up…). If the result is greater than or equal to zero, then the tile is on the far side of the sphere and can be culled.
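In code, that test is only a few lines. A plain sketch with made-up names (I don’t know your LibGDX types, so this is just the raw math):

```java
// Back-face culling for a tile on a sphere: the tile can be skipped when
// its outward normal points the same way as the camera's forward vector.
public class TileCull {
    static boolean isCulled(double[] tileNormal, double[] cameraForward) {
        double dot = tileNormal[0] * cameraForward[0]
                   + tileNormal[1] * cameraForward[1]
                   + tileNormal[2] * cameraForward[2];
        // dot >= 0 means the tile faces away from the camera: far side.
        return dot >= 0.0;
    }
}
```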

Yeah, that sounds exactly like what I’m doing right now. Definitely a bad idea.

Sure thing. To begin, I start with the coordinates of an icosahedron. Then I have a method which subdivides this, storing all of the Vector3 vertices in a list. I do this a few times, and then I draw the sphere approximation by connecting the centroids of the triangles surrounding the various vertices. I group these Vector3s into “Tile” objects. These are my representation of the hexagons and 12 pentagons that make up the sphere. Then, once all of these points are appropriately grouped, I call a “draw” method which uses a ModelBuilder to create a Model for every tile. This model is actually the 3 to 4 triangles that fill in the tile. Then a ModelInstance for every one of these Models (one instance per tile) is collected into a list. This list is then rendered every frame. Sounds like this is still 3-4 * 10,000 draw calls per frame, then.
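As a sanity check on the numbers involved: each subdivision step quadruples the triangle count, and the dual (one tile per vertex) always has exactly 12 pentagons, so the tile counts are easy to verify. A quick sketch of the arithmetic (class name made up):

```java
// Counts for a subdivided icosahedron and its dual tiled sphere.
// After n midpoint subdivisions: faces = 20 * 4^n, and by Euler's formula
// vertices = 10 * 4^n + 2. The dual has one tile per vertex: always
// exactly 12 pentagons, and the rest hexagons.
public class SphereCounts {
    static long pow4(int n) {
        return 1L << (2 * n);
    }
    static long tiles(int n) {
        return 10L * pow4(n) + 2;
    }
    static long hexagons(int n) {
        return tiles(n) - 12;
    }
}
```

Five subdivisions gives tiles(5) == 10242, which lines up with the “10,000 or so” ModelInstances I mentioned.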

Most models are just single hexagonal tiles. Twelve of them are pentagons. The model geometry is created in world space from the coordinates of the vertices of each polygon, and no translation is applied. I did this because I had an awfully difficult time trying to just move pre-made hexagon .objs into proper position and rotation. I did some testing of what would happen if every single ModelInstance was from the same Model. I got an extra 30 FPS almost immediately, so I think I’m definitely going to need to try going this route. There are still other optimizations I could make, I think.

Sure. They’re pretty much never updated. I render this sphere approximation at each step in subdividing the icosahedron, but that runs on key-press just for testing. In an actual game, all of that setup time will probably happen in my loading screen. One thing I’d like to be able to do is at least change the color of the tiles during runtime. Currently I do this by going to my list of ModelInstances, finding the ModelInstance corresponding to a tile, and changing that one particular instance.

I did try running a built-in profiler with Eclipse, but it didn’t really give me much helpful insight into the rendering process. Like I wrote above, I did some testing with sharing the Model among all instances, and that seems to be helpful. I definitely think the batcher’s rendering is what I should be trying to optimize.

I have no idea how to do that with LibGDX either, but I can try poking around. This is my first 3D project, so I don’t really know how to do much of anything yet. So this would allow me to store the results of one batch across frames? That would be amazing, because tiles don’t ever change significantly. Maybe just a few tiles per in-game turn, which would amount to a mere handful of tiles that need to change every hundred or so frames.

Oh, the far-side tiles visible in the net-style rendering won’t be there. I’m aiming for the other image with the filled-in tiles, so you won’t be able to see the other side of the sphere. I definitely want to give culling a shot.

Thanks for this, I’ll definitely give it a shot. Would it still work if the camera did move, though? The camera is always going to be fixed looking at the center of the sphere, but I want the player to be able to rotate around the sphere.

Using one or a few models and giving each model instance an appropriate (non-identity) transform may offer some improvements (as you note), but may still be problematic. For one thing, you would still be making thousands of draw calls, each with a small amount of geometry. Also, the tiles would be unlikely to connect seamlessly at the edges.

Say for the sake of argument that you dropped all the per-tile stuff and just made one big static mesh (which would allow you to render the entire sphere with one draw call). I’m guessing this by itself would improve your performance significantly.
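The merge itself is mechanical: concatenate the per-tile vertex data and offset each tile’s indices by the running vertex count. A plain-Java sketch of the idea (hypothetical array layout, no LibGDX types):

```java
import java.util.List;

public class MeshMerge {
    // Concatenate per-tile vertex arrays and index arrays into one big
    // pair, offsetting each tile's indices by the number of vertices
    // already emitted, so the result can be drawn with one draw call.
    static Object[] merge(List<float[]> tileVertices, List<short[]> tileIndices,
                          int floatsPerVertex) {
        int vTotal = 0, iTotal = 0;
        for (float[] v : tileVertices) vTotal += v.length;
        for (short[] i : tileIndices) iTotal += i.length;
        float[] vertices = new float[vTotal];
        short[] indices = new short[iTotal];
        int vPos = 0, iPos = 0, baseVertex = 0;
        for (int t = 0; t < tileVertices.size(); t++) {
            float[] v = tileVertices.get(t);
            System.arraycopy(v, 0, vertices, vPos, v.length);
            for (short i : tileIndices.get(t)) {
                indices[iPos++] = (short) (i + baseVertex);
            }
            vPos += v.length;
            baseVertex += v.length / floatsPerVertex;
        }
        return new Object[]{vertices, indices};
    }
}
```

(Note that 16-bit indices cap a single mesh at 65,536 vertices, which is one practical reason to split the sphere into a few chunks rather than exactly one.)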

If there were still performance issues at that point, or if having a single mesh proved to be problematic for other reasons, you could start to refine your approach. One possibility would be to use chunks, as you mentioned earlier. A possible approach would be for each chunk to correspond to a face of the original icosahedron.

There may be more to it of course. I don’t know if you’re going to be using textures, but if so you’ll likely need to use an atlas of some sort. And so on. But, I think moving towards one or several large meshes would be the first step regardless.
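To spell out the atlas idea a little: if the tile textures are packed side by side into one image, each tile’s texture is chosen purely by which cell its UVs point into, so every tile can share one material and one draw call. A toy sketch (made-up names, assuming a simple horizontal-strip layout):

```java
// Texture-atlas UV remapping: with numTypes tile textures packed side by
// side in one image, changing a tile's texture just means rewriting its
// UVs into a different horizontal cell; no extra draw call or bind.
public class AtlasUV {
    // Map a tile-local (u, v) in [0, 1] into the atlas cell for tileType.
    static float[] cellUV(int tileType, int numTypes, float u, float v) {
        float cellWidth = 1.0f / numTypes;
        return new float[]{tileType * cellWidth + u * cellWidth, v};
    }
}
```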

Just got back from implementing this one quick: it works great! It was easy to get the forward vector of the camera as it rotated, and the occlusion culling is seamless and gained me about 10 FPS. I’m really interested in hearing more about deferred rendering.

Right, the seamless connecting is why I went with this approach in the first place.

It does, just not as much as I’d hoped. I tested it earlier by creating just one ModelInstance of one fairly complicated piece of geometry. The problem came from using the LibGDX ModelBuilder: every tile is rendered into the one ModelInstance, but with each tile’s 3/4 triangles still as a separate part. I think that results in the same number of draw calls. I could try modeling the entire shape as a static mesh in Blender and then just using that. But then I’m left wondering how to go about texturing individual pieces of one large mesh, which is a problem I have no idea how to solve. I also lose the ability to procedurally generate this shape. Although honestly, I could just model the first 10 subdivisions and swap them out as need be…

I could look into grouping the tiles into larger meshes. I’ve already written some code to partition the sphere into twelve larger pieces centered around each of the pentagons. Although again, it doesn’t seem to help performance much, as I’m still generating a Model for every tile from code, which results in thousands of draw calls. Several large meshes would work if they were shared. I’ll have to play around with this more shortly; thanks for your input so far! Any ideas on solving that “texturing one giant mesh” issue would be welcome here.

I could be wrong, but I think quew8 may just be suggesting the same thing I’m suggesting here, that is, using one or a few large meshes. (The use of the term ‘deferred rendering’ may be a little confusing here, as that term refers to a specific rendering technique that’s unrelated to what’s being discussed here.)

Although I don’t know what LibGDX is doing under the hood, it seems plausible that using ModelBuilder is somehow negating the advantages of using one or a few large meshes, as you suggest. That said, there shouldn’t be anything to prevent you from creating the mesh(es) procedurally (at least not on the OpenGL side - not sure about LibGDX). You shouldn’t have to use an external modeling program or anything.

Regarding texturing, how many different tile types do you have in mind? And will they be able to change dynamically, or will they be static? (Regardless, solving the texturing problem should be relatively straightforward.)

I found this StackOverflow question http://stackoverflow.com/questions/21161456/building-a-box-with-texture-on-one-side-in-libgdx-performance which points out what I thought I’d discovered: every time I construct a triangle to make up these Models, that’s a draw call. Constructing the large meshes from world coordinates would require me to use the same number of draw calls, no?

For the game I’m making I can get away with just three or four textures, but tiles will need to dynamically change texture.

EDIT: I just followed some of the discussion in that SO question, and it clicked for me. By only creating one MeshPartBuilder, I could limit each tile to one draw call. Now I have about 5,000 draw calls per frame instead of four times that. I can keep 60 FPS now, and I’m going to look at the other ways to improve this. Thanks for all of the help so far.

[quote]Constructing the large meshes from world-coordinates would require me to use the same number of draw calls, no?
[/quote]
No, it would require far fewer draw calls. In fact, you could most likely render the entire sphere with a single draw call, or a handful at most.

It sounds like you’re making progress performance-wise, so maybe further changes won’t be needed. But, if you do end up wanting to optimize further, I think there are probably other options you could explore here.

Right, I assumed you’d be rotating the sphere and keeping the camera constant, but now that I think about it, your way round is better on so many levels. And I didn’t actually realize there was a second picture; turns out I don’t know how Imgur works.

As for deferred rendering, I had a quick look through the LibGDX API (it was very sane) and it seems to me that the Mesh class is the one you want to be using to render this without batching up every frame: http://libgdx.badlogicgames.com/nightlies/docs/api/com/badlogic/gdx/graphics/Mesh.html. I might be wrong, of course. Making the mesh is, I assume, fairly consistent with the batching API, so it shouldn’t pose too much of a problem. And for rendering, I think the render() method of Mesh is all you need.

Now, the problem with doing this: if you have a different Mesh for each tile, it will not be terribly efficient (I’d guess round about your current level). Certainly try it out, because it might be fine like that and would be a lot easier, but if it isn’t, then you’ll have to group tiles together into the same Mesh. You mentioned you have already split the sphere up into 12 (?) sections, so if you were to have a single Mesh for each section, that would be very good. But that brings another problem with the occlusion testing. (What follows might be very patronizing for you, considering how easily you implemented it last time. If so, I apologize.)

Essentially, you can’t just use the centre of the section, because even if that is facing away, some tiles might not be (since the section is not flat like the tiles). The naive way to solve this would be to test each tile around the edge of the section, and if any of them are facing toward the camera, render the whole section. But there is another way (the performance gain will be negligible, so I’d do whichever is simpler). I assume that each section will cover a certain sector (probably the wrong term) of the sphere; if so, then the “can you cull” test can be “is the dot product positive AND is the angle between the camera’s forward vector and the section’s average normal (the direction from the sphere’s centre to the centre of the section) less than 90 degrees minus half the angle covered by the section”. So, a bit of pseudocode, because writing that out confused me:


float angleCovered = ...;    // angular extent of the section on the sphere
Vector3 cameraForward = ...;
Vector3 sectionNormal = ...; // direction from sphere centre to section centre
float dot = dot(cameraForward, sectionNormal);
// Less than or equal to now, again since the section is no longer flat.
if (dot <= 0) {
    thenRenderSection();
} else {
    // This normalization is unnecessary if both vectors are unit length.
    float normalDot = dot / (cameraForward.magnitude() * sectionNormal.magnitude());
    float angle = acos(normalDot);
    if (angle > (PI / 2) - (angleCovered / 2)) {
        thenRenderSection();
    }
}

And there is an optimization you can do there: precompute the cos of “(PI / 2) - (angleCovered / 2)” and compare normalDot against it directly. Since acos is decreasing, “angle > threshold” is the same as “normalDot < cos(threshold)”, so you don’t need to use arccos at all.
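To make that concrete: cos((PI / 2) - (angleCovered / 2)) is just sin(angleCovered / 2), and since any dot <= 0 is already below that value (for a section covering a nonzero angle), both branches of the pseudocode collapse into a single comparison. A sketch assuming unit-length vectors (so dot is already the cosine of the angle):

```java
// Section-culling test with the arccos folded away. Assumes cameraForward
// and sectionNormal are unit length, so dot is the cosine of the angle
// between them. "angle > PI/2 - angleCovered/2" is equivalent to
// "dot < sin(angleCovered/2)", and dot <= 0 satisfies that whenever the
// section covers a nonzero angle, so one comparison handles both branches.
public class SectionCull {
    static boolean shouldRender(double dot, double angleCovered) {
        return dot < Math.sin(angleCovered / 2.0);
    }
}
```

Precompute Math.sin(angleCovered / 2.0) once per section and the per-frame cost is a single dot product and comparison.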

I hope that made some sense. Feel free to question / criticise.

Thank you both so much for helping me figure this one out. I split the sphere into different regularly-shaped regions based on the level of subdivision, and then render these combined chunks. The performance gains are incredible: I’m hitting 120 FPS at a level of subdivision one higher than in my original post, and I haven’t even implemented any sort of culling yet.

Hey, thanks for the math to get it figured out. :slight_smile:

Oh, don’t worry about sounding patronizing: you’re being extremely helpful, this is my first real 3D project, and it’s always better to have a clear explanation. I don’t have time to implement culling the chunks right now, but thanks so much for showing me the math to do so! I’ll be sure to check back once it’s implemented, too.