libgdx + textureatlas too many render calls

ff7squirrelman · May 19, 2018, 3:11am

I was under the impression that images packed together using texturepacker and referenced through a texture atlas would count as one image (assuming the texturepacker output fits into one file), but that doesn’t seem to be the case. Am I mixing up how a texturepacker works?

My previous textureatlas setup was running around 40-80 draw calls per loop, which seems way too high based on the numbers I saw people quoting as reasonable. In an attempt to reduce it further I decided to repack my textures so the texture packer could pack them better. So I spent the last 2 days painstakingly breaking up my sprite sheets into individual frames so that I could pack them more efficiently (before, I had a mix of sprite sheets and individual images that I put through the texturepacker). It’s 2 days later and I finally finished it; the texture atlas is actually the same size, but now I am getting insane draw calls, on average around 100-200. I put code in my loop to track the highest draw call count of any loop, and the worst one was around 3400 calls in 1 loop. Is it better to make spritesheets and then put the spritesheets through the packer? Again, I was under the impression that once they were packed it would count as 1 texture anyway (or however many pages it ended up packing into).
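
The per-loop counting described above can be sketched as a small tracker. In libGDX the number would typically be read from SpriteBatch's public `renderCalls` field after `batch.end()`; the `RenderCallStats` class and its method names are hypothetical, and the sketch takes plain ints so it runs without libGDX.

```java
/** Hypothetical helper: tracks average and worst-case draw calls per frame. */
final class RenderCallStats {
    private long frames;
    private long totalCalls;
    private int maxCalls;

    /** Record the draw-call count of one finished frame (e.g. batch.renderCalls after batch.end()). */
    void record(int callsThisFrame) {
        frames++;
        totalCalls += callsThisFrame;
        if (callsThisFrame > maxCalls) maxCalls = callsThisFrame;
    }

    /** Highest count seen in any single frame. */
    int max() { return maxCalls; }

    /** Mean count over all recorded frames, or 0 if nothing was recorded. */
    double average() { return frames == 0 ? 0.0 : (double) totalCalls / frames; }

    public static void main(String[] args) {
        RenderCallStats stats = new RenderCallStats();
        int[] sampleFrames = {40, 80, 3400, 100}; // made-up numbers in the range quoted above
        for (int calls : sampleFrames) stats.record(calls);
        System.out.println("avg=" + stats.average() + " max=" + stats.max());
    }
}
```

Logging both values every few seconds makes it easier to tell a one-off spike from a consistently high count.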

Unless I was wrong about how the packer works, my only other idea for such a huge change after just repacking is this: now that the images aren’t packed as spritesheets, the frames they contained could be divided between the first and last page of the atlas, which could be causing more texture binds with my animations. But then even with a 1 page textureatlas I am getting on average 40-100+ calls per loop and a max of about 590.

Also, am I better off making multiple texture atlases? One for frequently used images, and another for ones that are rarely used? (Assuming I can’t fit them all onto one page; currently they are 3 pages unless I make them 4096x4096, which texturepacker advises against.)
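
To illustrate the page-splitting worry, here is a toy model (not libGDX code) under the assumption that a batch issues one draw call per run of consecutive draws sharing a texture. `TextureSwitchModel` and the page layouts are made up for the example.

```java
/** Toy model (not libGDX code): a batch must flush whenever the bound texture changes. */
final class TextureSwitchModel {
    /**
     * @param pagePerDraw atlas page index used by each draw, in submission order
     * @return number of flushes (draw calls) the sequence would cause
     */
    static int countFlushes(int[] pagePerDraw) {
        if (pagePerDraw.length == 0) return 0;
        int flushes = 1; // the final flush when the batch ends
        for (int i = 1; i < pagePerDraw.length; i++) {
            if (pagePerDraw[i] != pagePerDraw[i - 1]) flushes++; // texture switch forces a flush
        }
        return flushes;
    }

    public static void main(String[] args) {
        int[] onePage = {0, 0, 0, 0, 0, 0};
        int[] grouped = {0, 0, 0, 2, 2, 2};     // draws sorted by page
        int[] interleaved = {0, 2, 0, 2, 0, 2}; // e.g. animation frames split across pages
        System.out.println(countFlushes(onePage));
        System.out.println(countFlushes(grouped));
        System.out.println(countFlushes(interleaved));
    }
}
```

In this model the number of pages matters less than how often consecutive draws alternate between them, which is why regions drawn in quick succession are best kept on the same page.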

VaTTeRGeR · May 19, 2018, 9:13am

[quote]40-100+ calls per loop
[/quote]
And what does that loop look like?

Your problem is a bit hard to analyze if we don’t know what you’re doing.

Show us as code snippets how you:

Retrieve the individual sprites from the atlas for drawing.
How you draw them.
What your rendering loop looks like.

And give some information about how many sprites are on screen for the given amount of draw calls.

Are you doing something different from the default way shown in the LibGDX docs?

65K · May 19, 2018, 10:26am

Do you have a performance problem?

ff7squirrelman · May 19, 2018, 5:51pm

-VaTTeRGeR
-I was actually just asking about whether or not I understood texturepacker and textureatlas right because of the huge draw call change just from repacking the textures. It didn’t make any sense if the textures are all considered one for texture binding. I thought maybe I was wrong and that splitting up my spritesheets and then repacking them made it worse than when I left them together and split them up using split in my constructor. But if you want to help analyze the other problems i’d be glad to post some info I just was hoping to narrow it down a bit first since i’m a bit embarrassed by my code and its a bit massive. I’ll try to post some snippets at the bottom of this post though.

-65K
On desktop no, but on my android device (moto g3) it runs at 20 fps and then drops to around 10-14 as more enemies spawn. Maybe my phone is just older than I think it is, but I use it as the standard for a lower end device I think my apps should be able to run on. It seemed to work ok (though I didn’t check the actual fps) on my brother’s moto m (no idea how strong of a phone it is, but it’s a few years newer than mine at least). Actually, while talking about performance, I also have an issue where my app takes 130ish MB of memory (checked using android profiler); this is obviously because I keep everything in memory all the time. But 90% of my pngs are used constantly anyway. For the ones I can remove, would they have to be in a separate texture atlas? I imagine you can’t dispose parts of a texture atlas, it’s either all or nothing? Last I checked anything over idk like 40MB is too much for android apps, so that’s another issue I’ll probably need to figure out later (unless I am reading outdated info).

#of enemies and other draws on the screen = (it’s one of the shorter videos I have, sometimes theres almost twice as many enemies if you aren’t killing them fast enough) https://www.youtube.com/watch?v=1ELx2tR_N54

-I should mention I have a normal spritebatch and an interface spritebatch. I render everything in the normal batch first, then the interface draws are all grouped at the end (as much as possible anyway)
-bitmap text is not in the textureatlas, it’s stored as an individual png (I might try to change this soon though since text is rendered pretty frequently)
-some objects such as the fire pits use particle emitters, these aren’t stored in the texture atlas either (i’m pretty sure they can’t be)
-the lightning is code generated, idk if the constantly changing alpha value triggers more renders? it’s from a tutorial on code generated lightning. I was going to leave it out for visibility but I ran some tests and it seems to have a huge impact on the render calls, so i’ll include it at the bottom (it’s still really bad without it)
-max sprites in batch is only ever like 700, so still far below the 1000 default cap

-the level itself is a combination of framebuffered tiles (the ones that don’t change) and the rest is rendered similarly to the skeleton i’ll post below

-retrieval

atlas = new TextureAtlas("texturepacked.atlas");
skeletonFallFrame = atlas.findRegion("skeletonfall");
skeletonWalkAnimation = new Animation<TextureRegion>(0.1f, atlas.findRegions("skeletonwalk"));
skeletonAttackAnimation = new Animation<TextureRegion>(0.2f, atlas.findRegions("skeletonattack"));
skeletonFallFrameFrozen = atlas.findRegion("skeletonfallfrozen");
skeletonWalkAnimationFrozen = new Animation<TextureRegion>(0.1f, atlas.findRegions("skeletonwalkfrozen"));
skeletonAttackAnimationFrozen = new Animation<TextureRegion>(0.2f, atlas.findRegions("skeletonattackfrozen"));

-rendering (it only renders whats in view, I read that libgdx doesn’t really do this automatically but even if it does, this saves other code from running anyway)


public void render(SpriteBatch batch, float minX, float maxX, float minY, float maxY)
{
    if(minX <= (x + width) && x <= maxX && minY <= (y + height) && y <= maxY)
    {
        if (frozen)
        {
            if(animationType == ActionResolver.AnimationType.moving)
            {
                currentFrame = myWorld.gameArt.skeletonWalkAnimationFrozen.getKeyFrame(frameTimer, true);
            }
            else if(animationType == ActionResolver.AnimationType.falling)
            {
                currentFrame = myWorld.gameArt.skeletonFallFrameFrozen;
            }
            else if(animationType == ActionResolver.AnimationType.attacking)
            {
                currentFrame = myWorld.gameArt.skeletonAttackAnimationFrozen.getKeyFrame(frameTimer, true);
            }
        }
        else
        {
            if(animationType == ActionResolver.AnimationType.moving)
            {
                currentFrame = myWorld.gameArt.skeletonWalkAnimation.getKeyFrame(frameTimer, true);
            }
            else if(animationType == ActionResolver.AnimationType.falling)
            {
                currentFrame = myWorld.gameArt.skeletonFallFrame;
            }
            else if(animationType == ActionResolver.AnimationType.attacking)
            {
                currentFrame = myWorld.gameArt.skeletonAttackAnimation.getKeyFrame(frameTimer, true);
            }
        }

        if(!destroyed)
        {
            batch.draw(currentFrame, x, y, width, height);
            lifeBar.render(batch);
        }
    }
}
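
The view check at the top of this render method is an axis-aligned rectangle overlap test. As a minimal sketch, it can be pulled into a pure function so every entity shares it; `ViewCulling` and `inView` are hypothetical names, not part of libGDX or of the code above.

```java
/** Sketch of the view test used in render(): do two axis-aligned rectangles overlap? */
final class ViewCulling {
    /**
     * @return true if the sprite rectangle (x, y, width, height) touches or overlaps
     *         the visible area [minX, maxX] x [minY, maxY]
     */
    static boolean inView(float x, float y, float width, float height,
                          float minX, float maxX, float minY, float maxY) {
        return minX <= x + width && x <= maxX
            && minY <= y + height && y <= maxY;
    }

    public static void main(String[] args) {
        // Visible area 0..100 on both axes (made-up values)
        System.out.println(inView(10f, 10f, 5f, 5f, 0f, 100f, 0f, 100f));  // inside the view
        System.out.println(inView(-20f, 10f, 5f, 5f, 0f, 100f, 0f, 100f)); // entirely left of the view
    }
}
```

Returning early when the test fails also skips the animation lookup for off-screen entities.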

-render loop (this is fairly huge so i’ll try to take as little as I can without screwing up the snippet)


public void render() 
{
    float minX= myWorld.landscapeCamera.frustum.planePoints[0].x;
    float minY= myWorld.landscapeCamera.frustum.planePoints[0].y;
    float maxX= myWorld.landscapeCamera.frustum.planePoints[2].x;
    float maxY= myWorld.landscapeCamera.frustum.planePoints[2].y;

    batch.begin();
    Gdx.gl.glClearColor(0f, 0f, 0f, 1);
    Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT);

    batch.setProjectionMatrix(myWorld.landscapeCamera.combined);

    myWorld.level.render(batch, minX, maxX, minY, maxY, 5);

    //draw units at different depths
    for(int d = 5; d >= 0; d--)
    {
        if(d == 0)
        {
            myWorld.lightning.render(batch, minX, maxX, minY, maxY);
        }

        for (int i = 0; i < myWorld.enemies.size(); i++)
        {
            if (myWorld.enemies.get(i).getDrawDepth() == d)
            {
                myWorld.enemies.get(i).render(batch, minX, maxX, minY, maxY);
            }
        }

        //draw allies, moneybags, etc cut for visibility
    }

    myWorld.level.render(batch, minX, maxX, minY, maxY, 0);

    for(int i = 0; i < myWorld.overheadTexts.size(); i++)
    {
        myWorld.overheadTexts.get(i).render(batch, minX, maxX, minY, maxY);
    }

    batch.end();
    interfaceBatch.setProjectionMatrix(myWorld.interfaceCamera.combined);
		
    interfaceBatch.begin();

    //render interface stuff cut for visibility (my interface batch has about 10-20 render calls, 
    //my batch loop is the one that has 100s but both are worse than they should be)

-code generated lightning (parts removed for visibility). I was thinking it might help if I move the lightning render to the end, but that might screw up its draw depth. It might also help reduce draw calls, though, because I think its blend change flushes the spritebatch?


    public static void initialize(GameWorld myWorld)
    {
        LightningSegment = myWorld.gameArt.atlas.findRegion("lightningsegment");
        HalfCircle =  myWorld.gameArt.atlas.findRegion("halfcircle");
        Pixel =  myWorld.gameArt.atlas.findRegion("pixel");
        HalfCircle2 =  myWorld.gameArt.atlas.findRegion("halfcircle");
        HalfCircle2.flip(true, false);
    }

    for(int i=0; i<bolts.size(); i++)
    {
        bolts.get(i).draw(batch);
    }

    //inside the bolts draw method
    if (alpha <= 0)
        return;
    for(int i=0; i<Segments.size(); i++) 
    {
        Line segment = Segments.get(i);
        segment.Draw(spriteBatch, new Color(tint).mul(alpha * alphaMultiplier));
    }

    //inside the segment draw function, I noticed it is making a new vector2 every call which is very bad 
    //for rendering so I am going to see if I can figure out a way to do that part in the update 
    //function or some other way
    Vector2 tangent = new Vector2(B).sub(new Vector2(A));
    float theta = (float)Math.toDegrees(Math.atan2(tangent.y, tangent.x));

    float scale = Thickness / Art.HalfCircle.getRegionHeight();
    Color prevColor = spriteBatch.getColor();
    spriteBatch.setColor(tint);
    int blfn = spriteBatch.getBlendDstFunc();
    // additive blending; prefer GL20.GL_ONE over Blending.SourceOver.ordinal(), which merely happens to equal 1
    spriteBatch.setBlendFunction(spriteBatch.getBlendSrcFunc(), GL20.GL_ONE);

    spriteBatch.draw(Art.LightningSegment, A.x, A.y, 0,Thickness/2, getLength(), Thickness, 1,1, theta);
    spriteBatch.draw(Art.HalfCircle, A.x, A.y, 0,Thickness/2, scale * Art.HalfCircle.getRegionWidth(), Thickness, 1,1, theta);
    spriteBatch.draw(Art.HalfCircle2, B.x, B.y, 0,Thickness/2, scale * Art.HalfCircle2.getRegionWidth(), Thickness, 1,1, theta);

    spriteBatch.setColor(prevColor);
    spriteBatch.setBlendFunction(spriteBatch.getBlendSrcFunc(), blfn);

Sorry if that’s too much code, I tried to cut away as much as possible while still keeping things that seemed like they might be relevant. An interesting thing came up though: before, my lightning generator was using individual pngs. Noticing this, I moved them to the texture packer. But instead of helping, this made the draw calls way worse, going from 100s on average to 200-300. Does it make any sense that moving a png to the texture packer would be worse than using an individual png? (Maybe because it’s changing the alpha value of the texture at runtime?)
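
On the per-call `new Vector2` noted in the segment draw comments: the angle and length only need the coordinate differences, so they can be computed from raw floats without creating any objects. This is a minimal sketch; `SegmentMath` and its method names are hypothetical and stand in for whatever the segment class actually does.

```java
/** Sketch: segment angle and length from raw coordinates, with no per-call allocations. */
final class SegmentMath {
    /** Angle of the vector A->B in degrees, equivalent to atan2 on (B - A). */
    static float angleDegrees(float ax, float ay, float bx, float by) {
        return (float) Math.toDegrees(Math.atan2(by - ay, bx - ax));
    }

    /** Euclidean distance between A and B. */
    static float length(float ax, float ay, float bx, float by) {
        float dx = bx - ax;
        float dy = by - ay;
        return (float) Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        System.out.println(angleDegrees(0f, 0f, 1f, 1f));
        System.out.println(length(0f, 0f, 3f, 4f));
    }
}
```

Since a segment's end points do not change after the bolt is generated, both values could also be computed once when the segment is created. The same reasoning applies to the `new Color(tint)` created for every segment in the bolt's draw method.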

65K · May 19, 2018, 7:29pm

First thing I would do is run the game through a profiler on desktop to see if there is anything obvious generally going on wrong (CPU hotspots, memory allocation, GC)

ff7squirrelman · May 19, 2018, 7:48pm

What would a desktop profiler do that the android one can’t? I’m asking as a legit question because i’m very new to using profilers. I’ll try to find a desktop profiler, I don’t think android studio has one built in so I have to research it a bit.

VaTTeRGeR · May 19, 2018, 7:49pm

[quote]I was actually just asking about whether or not I understood texturepacker and textureatlas right
[/quote]
You understood it right, that’s also how i see it.
You can always look through the LibGDX sources, they aren’t that convoluted and will give you answers.

This part here has the potential to create lots of draw calls:

spriteBatch.setBlendFunction(spriteBatch.getBlendSrcFunc(), GL20.GL_ONE);//set blendfunc
[...]
spriteBatch.setBlendFunction(spriteBatch.getBlendSrcFunc(), blfn); //reset blendfunc

You flush the batch (=drawcalls) every time the new blendfunc is different than what was previously set. You could make it run faster if you only reset it once after the drawing is finished.
Look at this: https://github.com/libgdx/libgdx/blob/master/gdx/src/com/badlogic/gdx/graphics/g2d/SpriteBatch.java#L997
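
The effect can be sketched with a toy model (not libGDX's implementation) of a batch that flushes pending sprites only when the blend function actually changes. `BlendFlushModel`, `simulate` and the scene layout are assumptions made for the example; the constants mirror the usual OpenGL values.

```java
/** Toy model (not libGDX code) of a batch that flushes when the blend function changes. */
final class BlendFlushModel {
    static final int GL_ONE = 1;
    static final int GL_SRC_ALPHA = 0x0302;
    static final int GL_ONE_MINUS_SRC_ALPHA = 0x0303;

    private int srcFunc = GL_SRC_ALPHA;
    private int dstFunc = GL_ONE_MINUS_SRC_ALPHA;
    private int pending; // sprites queued since the last flush
    private int flushes; // draw calls issued so far

    void setBlendFunction(int src, int dst) {
        if (src == srcFunc && dst == dstFunc) return; // unchanged: no flush
        flush();
        srcFunc = src;
        dstFunc = dst;
    }

    void draw() { pending++; }

    void flush() {
        if (pending == 0) return; // nothing queued, so no draw call
        flushes++;
        pending = 0;
    }

    /** Draws one ordinary sprite, then `segments` lightning segments of three sprites each. */
    static int simulate(int segments, boolean togglePerSegment) {
        BlendFlushModel batch = new BlendFlushModel();
        batch.draw(); // an ordinary sprite drawn before the lightning
        if (!togglePerSegment) batch.setBlendFunction(GL_SRC_ALPHA, GL_ONE);
        for (int i = 0; i < segments; i++) {
            if (togglePerSegment) batch.setBlendFunction(GL_SRC_ALPHA, GL_ONE);
            batch.draw(); batch.draw(); batch.draw(); // segment body plus two end caps
            if (togglePerSegment) batch.setBlendFunction(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
        }
        if (!togglePerSegment) batch.setBlendFunction(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
        batch.flush(); // what ending the batch would do
        return batch.flushes;
    }

    public static void main(String[] args) {
        System.out.println(simulate(4, true));
        System.out.println(simulate(4, false));
    }
}
```

In this model, toggling per segment costs roughly one draw call per segment, while setting the blend function once before the loop and restoring it once afterwards costs two in total, regardless of the number of segments.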

[quote]Does it make any sense that moving a png to the texture packer would be worse than using a individual png?
[/quote]
Depends, if it splits your atlas in a bad way yes, otherwise no, you want stuff that gets rendered in quick succession to be on the same atlas page.

ff7squirrelman · May 19, 2018, 10:00pm

Ok, so I tried making the blend function reset after my batch.end, but that of course caused problems: my fog of war would flicker because of it. So instead I moved it just outside of the lightning loop, so instead of resetting after each lightning bolt it resets after all of them.


    int blfn = batch.getBlendDstFunc();
    for(int i=0; i<bolts.size(); i++)
    {
        bolts.get(i).draw(batch);
    }
    batch.setBlendFunction(batch.getBlendSrcFunc(), blfn);

This is still an extra render call for each loop but I imagine it isn’t going to have a huge impact anymore (as opposed to an extra render for each iteration of the loop). I also moved the lightning render to right before my batch.end, which should reduce it to 1 extra render call I think (depending on whether all the lightning uses the same blending)? It renders on top of the overhead text but that doesn’t seem noticeable so far.

So far I haven’t noticed any difference in draw calls (probably just because its a large range to begin with). I’ll try reordering my textureatlas and see what that does since it sure seemed to make things worse when I did it the first time.

Oh, and I just remembered I never actually got the framebuffer working right, so the level is just rendered normally.

Edit:
actually I decided to just use a 4096x4096 atlas for the best performance, having researched more it seems like the majority of devices, mobile included support 4096 now (correct me if i’m wrong but thats what I read). I did some extra testing and noticed particle effects were causing a lot of draw calls so finally figured out how to get them to work with the textureatlas. I was also going to pack my bitmap font into my textureatlas while I was at it since I read it can be done this way:


whitefont1scale100 = new BitmapFont(files.internal("whitefont1scale100.fnt"), atlas.findRegion("whitefont1scale1001"), false);

However my bitmap fonts each use 2 pngs and the constructor only takes 1 region, so idk how that is done either. I’m still researching it but not finding much info so far. Oh, another problem I think I have with my particle emitters is blending (I believe I set them to additive), so I am setting this and then resetting the blend myself afterwards, just like with the lightning:


particles.setEmittersCleanUpBlendFunction(false);

I am still trying to make the bitmapfonts work with the atlas but haven’t been able to find anything explaining how to do it with a font that has 2 or more pngs. But as is, using 1 big atlas (4096x4096), which hopefully won’t end up being a problem, I was able to get it to an average of 34 batch render calls and something like 15-20 interfacebatch calls, with a worst case of 78 for batch. Still not great but much better. I had to increase my spritebatch size to 5000 sprites to be safe (I imagine it’s because I’m getting fewer flushes now); 3000 would probably have worked fine though. Sadly my android fps is still 20 with basically an empty screen (just rendering the level tiles and a few enemies), so it seems more likely to be a code issue (maybe I have too many loops or poorly optimized code) or my device is more outdated than I thought (it seems to work fine with most professional games I’ve tried though). I still need to fix my atlas use a bit, so maybe that will make a difference: I access the atlas directly in some constructors, and afterwards I noticed in the documentation that atlas.findRegion is slow (it finds regions by string comparison, so the result should be cached), so now I will store all of the textureregions, not just most of them like I was doing. I also still need to look into a desktop profiler like 65K suggested in case that shows something the android profiler didn’t. Thanks for all the help (though I’m still open to suggestions if you have more).

65K · May 20, 2018, 6:26am

Because it’s most probably still easier to handle and a quick way to make sure there are no general performance problems.
For simplifying rendering, ordering and avoiding too many calls I am using a little library:

CoDi_R · May 20, 2018, 7:25am

I believe you are on the right track, but I suggest you approach the issue more … “analytically”. Use the debugger. Run the game, put a breakpoint in SpriteBatch.flush(). You are interested in any calls which are not from SpriteBatch.end(). Have a look at each and figure out why they happen.

they can be caused by changes to:

shaders
textures (pages for an atlas, if atlas size too small to fit all regions in one page)
render states (blending)
uniforms (transforms)
batch buffer too small
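
For the last cause in the list, the arithmetic is simple: if nothing else forces a flush, the batch still has to flush each time its buffer fills. A minimal sketch, with `BatchCapacity` and `flushesFor` as hypothetical names:

```java
/** Sketch: draw calls caused purely by the batch buffer filling up. */
final class BatchCapacity {
    /**
     * @param spriteCount sprites drawn that share texture, shader and blend state
     * @param batchSize   maximum sprites the batch can hold before it must flush
     * @return number of flushes needed (integer ceiling of spriteCount / batchSize)
     */
    static int flushesFor(int spriteCount, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        if (spriteCount <= 0) return 0;
        return (spriteCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(flushesFor(700, 1000));  // fits in a default-sized batch
        System.out.println(flushesFor(2500, 1000)); // overflows the buffer twice
    }
}
```

This is also why removing other flush causes can make a previously adequate batch size too small: more sprites accumulate between flushes.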

edit: One note about your latest remarks: kind of hard to tell w/o knowing what it looks like, but if your font bitmaps aren’t part of an atlas, and if your UI uses a lot of text as well as other elements, you got your render calls right there. That’d be one flush for each UI actor which needs to display both some regions/patches and text.

Also, if atlas size is of concern, one possible solution is to put fonts and UI bitmaps into one atlas, and anything else into another. You flush/change batches anyway from world to UI. It’s only important to not reference one from another, e.g. trying something fancy like rendering world sprites inside the UI.

ff7squirrelman · May 20, 2018, 5:33pm

Ah, good idea about the breakpoint, I hadn’t thought of that. The majority of my flushes were bitmapfonts (11 in the loop I went through, but it varies widely of course). If I am able to get them into my textureatlas it would help a lot, but that’s where I’ve been stuck since last night. I used hiero to make my bitmapfont and I have plenty of room in my textureatlas. The problem is hiero made it two 512x512 pngs that the .fnt file references fine when they’re just in my asset folder, but when they are in my atlas it can’t find them. There’s a bitmapfont constructor that takes a textureregion, but since my font uses two I don’t know how to make it work; pointing to the first one in the atlas didn’t work. The only other idea I have is to manually combine the pngs and change the coordinates in the .fnt file, but that’s a bit much and I’m sure there must be a way to make it work, unless bitmapfonts are usually 1-page pngs and hiero makes them differently.

edit: oh, apparently I missed one of the bitmapfont constructors. This one takes an array of textureregions so it should work, i’ll give it a try now.


public BitmapFont(BitmapFont.BitmapFontData data, Array<TextureRegion> pageRegions, boolean integer)

edit:
I had to do something weird to get the bitmapfont to work because that constructor doesn’t take a filehandle for the first argument. So I had to initialize a font with the normal pngs, then create a new font using the old font’s data as the first argument and the textureregion array as the second. It’s working but seems kind of sloppy. The good news is that the bitmapfont finally being in the atlas, paired with me grouping my particles into a new render function so they all render together and only flush the batch once at the end because of their blending, has gotten my average batch render calls to 3-5 and my worst case to 8ish (in the short time I was testing it). The bitmap font is also used in my interface, so that went from 15-20ish usually to 4ish as well. So it seems pretty good now as far as draw calls go, I think. Another weird thing that happened is that the particles for my priest’s healing spell look different now. I was under the impression the particle emitters set the blend themselves, so it should have been fine resetting the blend at the end instead of letting them all reset it automatically, but it changed that 1 particle emitter a bit. The others seem to be the same, so I just made it render the healing particles twice for now and it looks like before. I probably just have to increase the alpha value in the emitter, then I think i’ll be able to remove the second render.

Unfortunately it only gained about 2-3fps on my android phone, so performance-wise it didn’t do as much as I’d have liked. I guess I have to just run it through a profiler again, check for any new objects made during render that I might have missed, and try to optimize any loops if I can. A lot of the lag seems to stem from the level render: even though it isn’t causing many draw calls, it goes through a fairly large loop rendering every tile in the view area (commenting out the render call brings the fps to 40ish, but that’s without a lot of other things on the screen). But that’s also unavoidable so idk how much I can optimize it; maybe i’ll try figuring out the framebuffer again. Last time it would render fine on my screen, but when I moved the camera it would be a blur, so I assume I have to regenerate it every time the camera is moved, which kind of counteracts its usefulness. That or possibly a spritecache; I haven’t read as much about them so idk how useful they would be. I could also try implementing poolable, but object creation doesn’t seem to be causing much lag.