Game Inefficiencies

Hello,

Recently I’ve been developing a game and I’ve noticed it’s using a ridiculous amount of system resources (30% CPU). What are the steps I can take to find these inefficiencies and resolve them?

Thanks.

… by coding better! :smiley:

There’s no single magic answer. You’ll have to give us code examples of your most resource-intensive code, and we can offer advice on refactoring/optimizing it. :wink:

A more helpful answer is probably “Use a profiler to find the bottlenecks in your code”, though.

The most CPU-intensive method called is org.lwjgl.opengl.WindowsContextImplementation.nSwapBuffers. Next is my rendering method, specifically the one I use to render all of the blocks in the level; however, nSwapBuffers() uses four times as much processing power as the rendering method.

The method I call to render the tiles is on GitHub: https://github.com/WillchillDev/Game/blob/master/Game/src/me/willchill/game/level/Level.java

A screenshot of the stuff JProfiler is showing:

I see in Block.render() that you are calling glBegin and glEnd for each block rendered. Not only is that inefficient (you could call them once for all blocks with the same texture, IIRC), it’s the deprecated, old OpenGL pipeline. That’s probably what’s making nSwapBuffers so expensive.

The real “solution” besides batching your block renders is to upgrade to VBOs and such.
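To make the batching idea concrete, here’s a rough sketch of what pushing the whole level into one VBO could look like. It’s only an illustration, not the poster’s code: it assumes static imports of LWJGL’s GL11/GL15 bindings plus org.lwjgl.BufferUtils and java.nio.FloatBuffer, the names levelWidth, levelHeight and TILE are made up, and texture coordinates are left out to keep it short:

// Assumes: import java.nio.FloatBuffer; import org.lwjgl.BufferUtils;
// and static imports of org.lwjgl.opengl.GL11.* and GL15.*
// Build one vertex buffer for the whole level instead of glBegin/glEnd per block.
FloatBuffer verts = BufferUtils.createFloatBuffer(levelWidth * levelHeight * 4 * 2);
for (int x = 0; x < levelWidth; x++) {
	for (int y = 0; y < levelHeight; y++) {
		float px = x * TILE, py = y * TILE;
		verts.put(px).put(py);                // quad corner 1
		verts.put(px + TILE).put(py);         // quad corner 2
		verts.put(px + TILE).put(py + TILE);  // quad corner 3
		verts.put(px).put(py + TILE);         // quad corner 4
	}
}
verts.flip();

int vbo = glGenBuffers();
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, verts, GL_STATIC_DRAW); // upload once, reuse every frame

// Per frame: a single draw call for every block in the level.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, 0L);
glDrawArrays(GL_QUADS, 0, levelWidth * levelHeight * 4);
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);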

Alright, thanks for the help. Are there any resources (books, videos, websites) you would recommend for learning modern OpenGL?

As long as your code doesn’t pause while waiting for a vertical sync or a Thread.sleep(), you’ll always end up with at least one CPU core used to ~100%. For that, it doesn’t matter how “efficient” the code is. If it’s more efficient, it might output higher frame rates, but that doesn’t change the CPU load.
If you are walking for one hour, you are walking for one hour. It doesn’t matter if you are walking pretty fast or crawling on your knees. The distance after one hour will differ, but the actual load (your body used to 100% for moving around) doesn’t differ.

A good resource for learning modern OpenGL is the Arcsynthesis tutorials.
The C++ code has been ported to LWJGL: https://www.github.com/ra4king/LWJGL-OpenGL-Tutorials/

Ah, I think I see what’s probably wrong. You’re using Slick, and you’re drawing your images with draw instead of drawEmbedded. Slick can be laggy when you use .draw to draw a large number of images from the same texture. Any time you draw a collection of images off the same sprite sheet, or the same image multiple times (basically, any time you’re using a single texture), you want to use drawEmbedded.

For example, if, say, you are rendering 4 things on your screen at once in a 2x2 grid, all from the same texture, you’re probably doing something like this:

for(int x = 0; x < 2; x++) {
	for(int y = 0; y < 2; y++) {
		myImage.draw(xCoord * x, yCoord * y); // xCoord/yCoord: spacing between tiles in pixels
	}
}

When you do that, you’re basically calling glBegin and glEnd over and over, like this:

glBegin()
render image 1
glEnd()
glBegin()
render image 2
glEnd()
glBegin()
render image 3
glEnd()
glBegin()
render image 4
glEnd()

… what you want to do is find any of your loops, or anywhere in your program where you’re drawing from a single texture (this can even be an entire sprite sheet), and use drawEmbedded, like so:


myImage.startUse();
for(int x = 0; x < 2; x++) {
	for(int y = 0; y < 2; y++) {
		myImage.drawEmbedded(xCoord*x,yCoord*y);
	}
}
myImage.endUse();

What this translates to:
glBegin()
render image 1
render image 2
render image 3
render image 4
glEnd()

It’ll give you a huge performance increase, assuming you have large collections of images on the same sprite sheet (like a sprite sheet for a tilemap, for example).

Also, if you aren’t using sprite sheets in areas that you can, I highly recommend converting to sprite sheets instead of individual files.
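If I remember the Slick API right, the sprite-sheet version looks roughly like this. It’s only a sketch: the file name, the 32-pixel tile size and the 2x2 grid are made-up example values, and the constructor would sit in an init method that declares throws SlickException:

// Assumes org.newdawn.slick.SpriteSheet.
// Load one texture that holds all the tiles.
SpriteSheet sheet = new SpriteSheet("res/tiles.png", 32, 32);

sheet.startUse();
for (int x = 0; x < 2; x++) {
	for (int y = 0; y < 2; y++) {
		// renderInUse(screenX, screenY, sheetColumn, sheetRow) draws one cell of the sheet
		sheet.renderInUse(x * 32, y * 32, x, y);
	}
}
sheet.endUse();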

To clarify on this…

Your GPU is currently the limit here. When you call Display.update() the driver makes sure that the GPU hasn’t fallen too far behind. If it has, then the driver forces the CPU to wait until the GPU has caught up. Most drivers seem to implement this with a busy loop that uses 100% on one CPU core.
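For context, a typical LWJGL main loop looks something like this (just a sketch, not the poster’s code; updateGame() and renderGame() are placeholder names). The wait described above happens inside Display.update(), and Display.sync(60) is the usual way to cap the frame rate:

// Assumes org.lwjgl.opengl.Display.
while (!Display.isCloseRequested()) {
	updateGame();
	renderGame();
	Display.update(); // swaps buffers; the driver may busy-wait here if the GPU is behind
	Display.sync(60); // optional frame cap: sleeps/spins until the next 1/60 s slot
}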

Mine (a GTX 680) even spawns an additional thread, so two cores are 100% busy even if I limit the FPS to 60. However, what I actually wanted to express is that looking at the CPU load while the game is running tells you nothing about the efficiency of the code. Or in other words: having one core fully loaded isn’t necessarily a sign of bad coding. Cores are there to be used.

That’s a feature of the Nvidia driver, not the GPU. Intel also has this feature, and I believe AMD does as well. They basically just append all OpenGL commands to a queue that the other driver thread reads from and runs. It essentially makes most OpenGL commands free for the game’s thread and gives you some extra CPU time to play with. The problem with this is mapping buffers. Every time you call glMapBuffer() or any of its variations (regardless of whether you use GL_MAP_UNSYNCHRONIZED_BIT), the game’s thread has to wait for the driver’s thread to finish, so most of the benefit of the extra thread is lost. This is why the new persistently mapped buffers are so awesome. You can map a buffer once and keep it mapped forever, so you never have to synchronize with the driver’s thread.
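For anyone curious, here’s a rough sketch of a persistently mapped buffer. It assumes LWJGL 3-style static bindings for GL15/GL30/GL44 plus java.nio.ByteBuffer, and SIZE is a made-up example value:

// Assumes static imports of GL15.*, GL30.*, GL44.* and import java.nio.ByteBuffer.
// Create immutable storage (GL 4.4 / ARB_buffer_storage), map it once, keep it mapped.
int flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

int vbo = glGenBuffers();
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferStorage(GL_ARRAY_BUFFER, SIZE, flags);
ByteBuffer mapped = glMapBufferRange(GL_ARRAY_BUFFER, 0, SIZE, flags);

// Every frame: write vertex data straight into 'mapped' and draw from the buffer.
// There is no glMapBuffer/glUnmapBuffer per frame, so the game thread never has to
// wait for the driver's worker thread. In practice you'd still use fences
// (glFenceSync/glClientWaitSync) so you don't overwrite data the GPU is reading.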

That is crazy! I’m sure there is a good reason for that, I’d like to hear it…

Precision? It doesn’t really matter.

I think (don’t quote me on it) that CPUs and GPUs last longer when they’re forced to always run at 100%. Something about transistor load. I don’t know the details or if I’m even right; I just remember reading this somewhere like a decade ago.

Windows does this as well. If you look at Task Manager on older versions of Windows, there’s the “System Idle Process” that’s always at whatever percentage of the processor is currently not being used. Windows 7 (and possibly Vista) don’t show it anymore, though.

I find it hard to believe that this is true. If it was, then you’d be wasting a shitload of money and/or battery life on that “idle process”. The System idle process is simply there to show you how much of the time the CPU idles (and it’s still there for 7).

The System Idle Process is there to keep the CPU idle when the scheduler finds no threads ready to execute. That’s why it’s always shown as the percentage not being used, as there must always be a thread running on a CPU at all times. More information on Wikipedia.

You’re right that it is indeed a real thread (which I didn’t know), but it’s not exactly a normal thread. My main point was that neither the CPU nor the GPU is unnecessarily burning energy because it’s supposed to be good for them. CPUs and GPUs have massive power-saving functions so they don’t have to run at 100% load all the time, which includes shutting down unused parts of the processor or even complete cores and lowering the clock speed to a fraction of what it can run at. My CPU idles at room temperature and my GPUs at 35 degrees. My CPU can drop down to 800 MHz instead of running at 3.9 GHz all the time. My GPUs’ cores drop down to 135 MHz instead of 1.2 GHz, and their memory to 162 MHz from 1.75 GHz. Hardware makers are doing everything they can to decrease power usage and heat generation to get better battery life and smaller devices.

Too lazy to find a good reference, but here: http://siyobik.info.gf/main/reference/instruction/PAUSE. Just because the CPU is in theory running a tight loop doesn’t mean all of the units are running.
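As a rough Java-side illustration of the same hint (not something from the thread’s code): Thread.onSpinWait() from Java 9 typically compiles down to PAUSE on x86. The ready flag here is a made-up example:

class SpinWaitExample {
	// Placeholder flag set by another thread.
	volatile boolean ready = false;

	void waitUntilReady() {
		// onSpinWait() tells the CPU this is a busy loop, so it can back off
		// (PAUSE on x86) instead of running the pipeline flat out.
		while (!ready) {
			Thread.onSpinWait();
		}
	}
}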