LibGDX SpriteBatch performance degradation

It seems that my 2D platformer game is becoming slower over time. I decided to check out LibGDX since it is up to date with newer versions of OpenGL, runs smoothly, and acts as more of an optional abstraction around the pipeline. I haven’t run into any noticeable performance issues, but I have seen my CPU usage climb to 19% over time. To be specific, when I first launch the game, it runs quickly and efficiently, using about 2% of the CPU at most. After about 30 seconds, it’s 4%. Then, after a minute, it peaks at about 19%. It never goes beyond that number, and the FPS never drops. I can hear my fan going nuts, though, so it is still a problem.

I made a simple but effective performance logger that records the amount of time it takes to complete a task. After every 60 recordings (my game runs at 60 FPS, so roughly once per second), it logs the average time the task took to a PrintStream. I narrowed down where the performance degrades, and it’s in LibGDX’s SpriteBatch class. Invocations of end() start to become slow after a while, but I am not sure why. Here is the method utilizing SpriteBatch that sees the performance issues.


	public void render(Matrix4 m, float parentAlpha)
	{
		// Optimizes the grid so that only the tiles that are visible are displayed
		if(optimized)
			optimize(Gdx.graphics.getWidth(), Gdx.graphics.getHeight());
		
		// Grabs SpriteBatch from globally accessible Renderer
		SpriteBatch batch = Renderer.getSpriteBatch();
		
		// Sets alpha to the one accumulated through the scene graph
		batch.setColor(1, 1, 1, parentAlpha);
		
		// Begins drawing
		batch.begin();
		
		// Renders all tiles within optimized range.  Mins and maxes were determined when optimize() was invoked
		for(int y=minY; y<maxY; y++)
		{
			for(int x=minX; x<maxX; x++)
			{
				TextureRegion reg = get(x, y);
				if(reg != null)
				{
					batch.draw(reg, x*regWidth, y*regHeight);
				}
			}
		}
		
		// Ends drawing
		logger.begin();        // Begins recording the amount of time it takes to complete batch.end()
		batch.end();
		logger.end();          // Ends recording, and outputs the average amount of time it took to complete after 60 invocations.
	}

OK, not all variables are defined in this example, but I think what is happening here should be clear. I have a matrix of tiles that I want to render, and the tiles are an array of TextureRegions. When rendering, I iterate through the matrix from the top left to the bottom right, horizontally before vertically. When I am done rendering this batch, I invoke batch.end(). I have shown that the degradation occurs specifically at the invocation of batch.end(): after about a minute, that call takes 17-18 milliseconds, which accounts for the majority of the aforementioned slowdown.

Why would this method behave like an angel, and then suddenly slow down later? I can’t really focus my logger on a particular part of this method since it’s part of LibGDX. Is there some way that I could be misusing SpriteBatch here, or somewhere else? I’m still fairly new to LibGDX, so I would not be surprised.
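The logger used in render() is not shown in the post. A minimal sketch of such an averaging timer might look like the following; the class name, the record() helper and the output format are assumptions made for illustration, not the poster's actual code. Keep in mind that a timer like this measures wall-clock time spent inside the call, which includes any time the call spends waiting rather than working.

```java
import java.io.PrintStream;

/** Hypothetical averaging timer; not the logger from the original post. */
final class AverageTimer {
    private final String label;
    private final int samplesPerReport;
    private final PrintStream out;
    private long startNanos;
    private long totalNanos;
    private int samples;
    private double lastAverageMillis = Double.NaN;

    AverageTimer(String label, int samplesPerReport, PrintStream out) {
        this.label = label;
        this.samplesPerReport = samplesPerReport;
        this.out = out;
    }

    /** Marks the start of the task being timed. */
    void begin() {
        startNanos = System.nanoTime();
    }

    /** Marks the end of the task and records the elapsed time. */
    void end() {
        record(System.nanoTime() - startNanos);
    }

    /** Adds one sample; prints the average and resets once enough samples exist. */
    void record(long elapsedNanos) {
        totalNanos += elapsedNanos;
        samples++;
        if (samples == samplesPerReport) {
            lastAverageMillis = totalNanos / (double) samples / 1_000_000.0;
            out.printf("%s: %.3f ms average over %d samples%n",
                    label, lastAverageMillis, samples);
            totalNanos = 0;
            samples = 0;
        }
    }

    /** Average of the most recent full window, or NaN if none has completed yet. */
    double lastAverageMillis() {
        return lastAverageMillis;
    }
}
```

With samplesPerReport set to 60 and one begin()/end() pair per frame, this would report roughly once per second at 60 FPS.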

Edit: I am aware that SpriteBatch’s end() method uploads the vertices and texture IDs that were gathered earlier to OpenGL. I feel like somehow, my batch is accumulating vertices/IDs/SOMETHING, and perhaps I am forgetting to clear that something.

You’re looking in the wrong place… run -Xprof on the output and show it here. I think I know what your issue is and it’s actually benign.

Cas :slight_smile:

There’s not enough information about the -Xprof argument online. I can’t seem to figure out how to use it. It doesn’t seem to work when compiling my application to a jar, and I don’t really know how to compile without archiving it. Sorry if I’m a little shaky on command-line Java arguments.

Edit: Did a bit of testing. Does this suffice? Seems to work for me.


java -Xprof -jar game.jar

Not sure what the profiling information means, but here it is!



C:\Users\Owner\Desktop\test>java -Xprof -jar game.jar

Flat profile of 0.37 secs (33 total ticks): main

  Interpreted + native   Method
 36.4%     0  +    12    org.lwjgl.openal.ALC10.nalcCreateContext
 30.3%     0  +    10    java.net.NetworkInterface.getAll
  3.0%     0  +     1    java.lang.ClassLoader.findBootstrapClass
  3.0%     0  +     1    org.lwjgl.openal.ALC10.nalcOpenDevice
  3.0%     0  +     1    org.lwjgl.openal.AL.nCreate
  3.0%     0  +     1    java.util.zip.Inflater.inflateBytes
  3.0%     1  +     0    java.lang.ClassLoader.defineClass1
  3.0%     0  +     1    java.io.WinNTFileSystem.canonicalize0
  3.0%     0  +     1    java.lang.ClassLoader$NativeLibrary.find
  3.0%     1  +     0    java.security.Provider.addEngine
  3.0%     1  +     0    java.lang.String.replace
 93.9%     3  +    28    Total interpreted

  Thread-local ticks:
  6.1%     2             Class loader


Flat profile of 97.06 secs (8762 total ticks): LWJGL Timer

  Thread-local ticks:
100.0%  8762             Blocked (of total)


Flat profile of 97.39 secs (8790 total ticks): DestroyJavaVM

  Thread-local ticks:
100.0%  8790             Blocked (of total)


Flat profile of 97.40 secs (8791 total ticks): LWJGL Application

  Interpreted + native   Method
  3.9%     0  +   150    java.lang.Thread.sleep
  1.6%     0  +    60    sun.misc.Unsafe.copyMemory
  1.3%     0  +    49    org.lwjgl.opengl.WindowsDisplay.defWindowProc
  1.2%     0  +    47    org.lwjgl.opengl.WindowsDisplay.nUpdate
  0.9%     0  +    36    sun.misc.Unsafe.getInt
  0.9%     0  +    34    sun.misc.Unsafe.putInt
  0.7%     1  +    25    org.lwjgl.opengl.GL11.nglDrawElements
  0.2%     0  +     8    java.lang.System.nanoTime
  0.2%     6  +     0    sourceProperties.Gravity.update
  0.1%     0  +     4    org.lwjgl.opengl.WindowsContextImplementation.nSwapBuffers
  0.1%     0  +     4    org.lwjgl.opengl.WindowsPeerInfo.nChoosePixelFormat
  0.1%     0  +     4    org.lwjgl.opengl.GL11.nglClearColor
  0.1%     3  +     0    java.lang.ClassLoader.defineClass1
  0.1%     0  +     2    org.lwjgl.opengl.WindowsContextImplementation.nCreate
  0.1%     0  +     2    com.badlogic.gdx.graphics.g2d.Gdx2DPixmap.load
  0.1%     0  +     2    org.lwjgl.opengl.WindowsContextImplementation.nMakeCurrent
  0.0%     0  +     1    sun.misc.Unsafe.compareAndSwapObject
  0.0%     0  +     1    java.util.zip.Inflater.inflateBytes
  0.0%     0  +     1    org.lwjgl.opengl.GL20.nglUniformMatrix4fv
  0.0%     0  +     1    sun.misc.Unsafe.getLong
  0.0%     0  +     1    org.lwjgl.WindowsSysImplementation.nGetTime
  0.0%     0  +     1    sun.misc.Unsafe.getByte
  0.0%     0  +     1    org.lwjgl.opengl.WindowsDisplay.showWindow
  0.0%     0  +     1    org.lwjgl.opengl.GL11.nglGenTextures
  0.0%     0  +     1    org.lwjgl.openal.ALC10.nalcCloseDevice
 12.0%    20  +   440    Total interpreted (including elided)

         Stub + native   Method
 62.7%     0  +  2403    org.lwjgl.opengl.GL11.nglDrawElements
 11.2%     0  +   431    org.lwjgl.WindowsSysImplementation.nGetTime
  6.1%     0  +   232    java.lang.Thread.sleep
  4.5%     0  +   171    java.lang.Thread.yield
  2.9%     0  +   110    java.lang.System.arraycopy
  0.4%     0  +    15    sun.misc.Unsafe.putByte
  0.1%     0  +     3    sun.misc.Unsafe.compareAndSwapObject
  0.1%     0  +     3    com.badlogic.gdx.utils.BufferUtils.copyJni
  0.0%     0  +     1    java.lang.Thread.currentThread
  0.0%     0  +     1    org.lwjgl.opengl.GL11.nglDepthMask
  0.0%     0  +     1    com.badlogic.gdx.math.Matrix4.mul
  0.0%     0  +     1    org.lwjgl.opengl.GL20.nglVertexAttribPointer
 88.0%     0  +  3372    Total stub

  Thread-local ticks:
 56.4%  4959             Blocked (of total)


Global summary of 97.77 seconds:
100.0%  8824             Received ticks
  0.3%    23             Compilation
  0.0%     2             Class loader

C:\Users\Owner\Desktop\test>pause
Press any key to continue . . .

It seems that nglDrawElements, whatever that method is, is using up a considerable amount of time. By the way, I just tested it, and the CPU usage inexplicably goes back down after about 2 minutes. Weird… Still a problem.

You said that the problem was benign. Is this supposed to happen? I’d rather the CPU usage be low than high, even if the perceived performance is good regardless. I’m not here to melt CPUs. If the problem really isn’t important, I’d like to know what’s happening, at least. I have only a little experience with OpenGL, and I switched to LibGDX because I got sick of working with it directly. I am vaguely familiar with glDrawElements, since I believe I used either that or the array variation at some point when learning the pipeline. Is there any reason why this method would be slowing down over time?

Generally only because it’s being called too often. Sometimes it goes particularly slow with some drivers if it’s given duff parameters and/or data.

Are you using VBOs?

Cas :slight_smile:

19% CPU usage is not problematic at all (unless you have like a top of the line Xeon with this usage percentage ::)).

To understand what’s going on behind the scenes, you first have to understand how a SpriteBatch works. When you’re working with modern OpenGL you should be storing your data (vertices, texture coordinates, normals, colors, etc.) in buffers. Sending data to the GPU in buffers and then rendering them later is one of the fastest ways to render things today. However, having too many buffers is not healthy for performance either. That’s because draw calls are probably the most expensive calls that you can make on the GPU, and when you have N buffers that you want to render you’re going to have to issue N draw calls (there are many ways of optimizing this in OGL4.0+ but I’m not going to go into that now), which is obviously slow. Thus, a game developer always has to decide how many buffers to use: one VBO per object, grouping together multiple objects based on some property (e.g. put all the static meshes into a single buffer, store the dynamic and streamed objects separately), or storing everything in a single buffer.

Each and every one of these has its ups and downs, and you should pick the right approach for your game. For 3D games, storing everything in a single buffer is of course not viable, since every time you do a draw call the GPU would have to render EVERYTHING that is in your game. There would be no frustum or occlusion culling, unless you reconstruct the buffer every frame, which is obviously going to hurt CPU performance. However, for 2D games storing everything in a single buffer is perfect: we don’t have too many vertices, and frustum culling techniques are extremely cheap when working in 2D, so we can easily reconstruct the buffer every frame on the CPU without hurting performance too much.
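For a tile grid, the cheap 2D culling described here comes down to a little arithmetic on the camera rectangle. The sketch below assumes a grid anchored at the origin with uniformly sized tiles; the class and method names are invented for illustration, and this is not the optimize() method from the original post.

```java
/** Hypothetical helper: which tiles of a grid overlap the visible rectangle. */
final class TileCulling {
    /**
     * Returns {minX, minY, maxX, maxY} in tile indices.
     * The max values are exclusive, so they can be used directly as loop bounds.
     */
    static int[] visibleRange(float camX, float camY, float viewW, float viewH,
                              float tileW, float tileH, int cols, int rows) {
        // First tile touched by the left/bottom edge of the view.
        int minX = clamp((int) Math.floor(camX / tileW), 0, cols);
        int minY = clamp((int) Math.floor(camY / tileH), 0, rows);
        // One past the last tile touched by the right/top edge of the view.
        int maxX = clamp((int) Math.ceil((camX + viewW) / tileW), 0, cols);
        int maxY = clamp((int) Math.ceil((camY + viewH) / tileH), 0, rows);
        return new int[] {minX, minY, maxX, maxY};
    }

    private static int clamp(int value, int lo, int hi) {
        return Math.max(lo, Math.min(hi, value));
    }
}
```

Only the tiles inside the returned range need to be pushed into the buffer each frame, which keeps the per-frame rebuild small no matter how large the whole map is.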

The problem is that every time we need to render something with a different texture than the previous one, the batch has to stop and flush: the data gathered so far is sent to the GPU and drawn, the new texture is bound, and only then can new data be stored in the buffer. Done naively, this costs one draw call per texture switch. There are many ways to reduce this performance problem; here are two: texture arrays and texture atlases.

With texture arrays you can have multiple textures’ data packed into a single texture object, but this technique has its own set of problems, so I would not recommend it to you until you learn more about OpenGL.
Texture atlases, on the other hand, have very few negatives and are used in almost all (if not all) commercial 2D games. With a texture atlas you have multiple textures packed into a single image file, so whenever you want to render something that is on the same atlas as the previous thing you rendered, there’s no need to bind a new texture or flush the buffer. You can just keep storing data in the same buffer that you later send to the GPU. Notice, however, that since you have multiple textures in the same image, you can’t use simple texture coordinates like (0, 0); (1, 0); (1, 1); (0, 1) anymore. Luckily that is not too big of a problem: you can easily calculate the position of your textures in the atlas, or use some kind of texture packer utility and parse its output.
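The coordinate calculation mentioned above is just a division by the atlas size. A minimal sketch, assuming pixel rectangles measured from the top-left corner of the atlas image; the class name is made up for this example. LibGDX's own TextureRegion performs this conversion when it is constructed from a texture and a pixel rectangle, so in practice you rarely need to write it by hand.

```java
/** Hypothetical value class: normalized texture coordinates of one atlas entry. */
final class AtlasRegion {
    final float u;   // left edge, 0..1
    final float v;   // top edge, 0..1
    final float u2;  // right edge, 0..1
    final float v2;  // bottom edge, 0..1

    /**
     * @param atlasW width of the whole atlas image in pixels
     * @param atlasH height of the whole atlas image in pixels
     * @param x      left edge of the region in pixels
     * @param y      top edge of the region in pixels
     * @param w      region width in pixels
     * @param h      region height in pixels
     */
    AtlasRegion(int atlasW, int atlasH, int x, int y, int w, int h) {
        u = x / (float) atlasW;
        v = y / (float) atlasH;
        u2 = (x + w) / (float) atlasW;
        v2 = (y + h) / (float) atlasH;
    }
}
```

For a 256x128 atlas, a 32x32 tile at pixel (64, 32) maps to u = 0.25, v = 0.25, u2 = 0.375, v2 = 0.5 instead of the usual 0..1 corners.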

Hope this helps. :slight_smile:

Ah, well, I do plenty of texture batching, and I created my own sort of TextureAtlas class that loads Textures in batches from a png file and a txt file. I’m sure that’s not what is causing the issue here. I suppose this may not be a huge issue after all, but I still don’t understand why the CPU usage would slowly go up over time. Are you saying that I am using too many buffers in my instance of SpriteBatch, and that’s what is causing the eventual performance loss?

Hrm! I can prevent performance issues by reducing the number of buffers in SpriteBatch! And the reason for this is… I don’t know! Like great programmers before me have said, optimizing prematurely is the root of all evil. I don’t want to continue working on my game without understanding exactly why this code is working.

Let me see if I’ve got this straight… Every draw call adds a list of texture data and vertices for later, and a nice default shader program is written for me so I don’t have to make one. Whenever .end() is invoked, that buffered data will be uploaded to OpenGL for rendering. I am also assuming that the same will happen if the buffers become full. Both cases will cause the buffers to flush. In reality, it’s adding stuff to a hidden Mesh instance, and then rendering it, which really does the same thing, but in a more modular way. By reducing the sizes of the buffers, I am increasing the number of draw calls, and making my packets of data smaller. That sounds bad!

Or is it?

I noticed that if I force the buffer size to be, say, 200, the cpu peaks at about 10% rather than 19%! I know I probably sound like a guy that doesn’t know what he’s talking about, and I probably am. If I need to look further into this, feel free to tell me.
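The flush rules described in this post (flush on end(), flush when the buffer is full) plus the texture-switch flush mentioned earlier in the thread can be modelled with a toy batch that only counts flushes. This is a simplified stand-in written for illustration, not LibGDX's implementation, and it says nothing about why a smaller buffer lowered CPU usage on this machine; it only shows how the number of draw calls follows from buffer size and texture order.

```java
/** Toy model of a sprite batch that counts flushes; not LibGDX code. */
final class ToyBatch {
    private final int capacity;   // sprites the buffer holds before it must flush
    private int buffered;         // sprites currently waiting in the buffer
    private int lastTexture = -1; // -1 means no texture bound yet (ids assumed >= 0)
    private int flushes;          // each non-empty flush stands in for one draw call

    ToyBatch(int capacity) {
        this.capacity = capacity;
    }

    void draw(int textureId) {
        if (textureId != lastTexture) {
            flush();              // a texture switch forces out what was gathered
            lastTexture = textureId;
        } else if (buffered == capacity) {
            flush();              // a full buffer forces a flush as well
        }
        buffered++;
    }

    void end() {
        flush();
        lastTexture = -1;
    }

    int flushes() {
        return flushes;
    }

    private void flush() {
        if (buffered == 0) return; // nothing gathered, so no draw call is issued
        flushes++;
        buffered = 0;
    }

    /** Draws every texture id in order, ends the batch, and returns the flush count. */
    static int countFlushes(int capacity, int[] textureIds) {
        ToyBatch batch = new ToyBatch(capacity);
        for (int id : textureIds) batch.draw(id);
        batch.end();
        return batch.flushes();
    }
}
```

In this model, 2500 sprites sharing one texture take 3 flushes with a capacity of 1000 and 13 with a capacity of 200, while ten sprites alternating between two textures take 10 flushes at either capacity. To check the real numbers, SpriteBatch exposes a public renderCalls field that can be read after end().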

I don’t think I am, since I never specified it. Does SpriteBatch give you that option? I am aware that it internally uses a Mesh instance, and a Mesh can be toggled to use different rendering methods, including VBOs.

Edit: Oh, and thanks for showing me the -Xprof argument. That’s probably going to be helpful in the future for me.