Game Inefficiencies

Hello,

Recently I’ve been developing a game and I’ve noticed it’s using a ridiculous amount of system resources (30% CPU). What are the steps I can take to find these inefficiencies and resolve them?

Thanks.

… by coding better! :smiley:

There’s no single magic answer. You’ll have to give us code examples of your most resource-intensive code, and we can offer advice on refactoring/optimizing it. :wink:

A more helpful answer is probably “Use a profiler to find the bottlenecks in your code”, though.

The most CPU-intensive method called is org.lwjgl.opengl.WindowsContextImplementation.nSwapBuffers. Next is my rendering method, specifically the one I use to render all of the blocks in the level; however, nSwapBuffers() uses four times as much processing power as the rendering method.

The method I call to render the tiles is on GitHub: https://github.com/WillchillDev/Game/blob/master/Game/src/me/willchill/game/level/Level.java

A screenshot of the stuff JProfiler is showing:

I see in Block.render() that you are calling glBegin and glEnd for each block rendered. Not only is that inefficient (you could call them once for all blocks with the same texture, IIRC), it’s the deprecated, old OpenGL pipeline. That’s probably what’s making nSwapBuffers so expensive.

The real “solution” besides batching your block renders is to upgrade to VBOs and such.
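To make the batching idea concrete, here’s a rough sketch of what pushing the whole level into one VBO could look like. It’s only an illustration, not the poster’s code: it assumes static imports of LWJGL’s GL11/GL15 bindings plus org.lwjgl.BufferUtils and java.nio.FloatBuffer, the names levelWidth, levelHeight and TILE are made up, and texture coordinates are left out to keep it short:

// Assumes: import java.nio.FloatBuffer; import org.lwjgl.BufferUtils;
// and static imports of org.lwjgl.opengl.GL11.* and GL15.*
// Build one vertex buffer for the whole level instead of glBegin/glEnd per block.
FloatBuffer verts = BufferUtils.createFloatBuffer(levelWidth * levelHeight * 4 * 2);
for (int x = 0; x < levelWidth; x++) {
	for (int y = 0; y < levelHeight; y++) {
		float px = x * TILE, py = y * TILE;
		verts.put(px).put(py);                // quad corner 1
		verts.put(px + TILE).put(py);         // quad corner 2
		verts.put(px + TILE).put(py + TILE);  // quad corner 3
		verts.put(px).put(py + TILE);         // quad corner 4
	}
}
verts.flip();

int vbo = glGenBuffers();
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, verts, GL_STATIC_DRAW); // upload once, reuse every frame

// Per frame: a single draw call for every block in the level.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, 0L);
glDrawArrays(GL_QUADS, 0, levelWidth * levelHeight * 4);
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);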

Alright, thanks for the help. Are there any resources (books, videos, websites) you would recommend for learning modern OpenGL?

As long as your code doesn’t pause while waiting for a vertical sync or a Thread.sleep(), you’ll always end up with at least one CPU core used to ~100%. For that, it doesn’t matter how “efficient” the code is. If it’s more efficient, it might output higher frame rates, but that doesn’t change the CPU load.
If you are walking for one hour, you are walking for one hour. It doesn’t matter if you are walking pretty fast or crawling on your knees. The distance after one hour will differ, but the actual load (your body used to 100% for moving around) doesn’t differ.

A good resource for learning modern OpenGL is the Arcsynthesis tutorials.
The C++ code has been ported to LWJGL: https://www.github.com/ra4king/LWJGL-OpenGL-Tutorials/

Ah, I think I see what’s probably wrong. You’re using Slick, and you’re drawing your images with draw instead of drawEmbedded. Slick can be laggy when you use .draw to draw a large number of images from the same texture. Any time you draw a collection of images off the same sprite sheet, or the same image multiple times (basically, any time you’re using a single texture), you want to use drawEmbedded.

For example, if, say, you are rendering 4 things on your screen at once in a 2x2 grid, all from the same texture, you’re probably doing something like this:

for(int x = 0; x < 2; x++) {
	for(int y = 0; y < 2; y++) {
		myImage.draw(xCoord * x, yCoord * y); // xCoord/yCoord: spacing between tiles in pixels
	}
}

When you do that, you’re basically calling glBegin and glEnd over and over, like this:

glBegin()
render image 1
glEnd()
glBegin()
render image 2
glEnd()
glBegin()
render image 3
glEnd()
glBegin()
render image 4
glEnd()

… what you want to do is find any of your loops, or anywhere in your program where you’re drawing from a single texture (this can even be an entire sprite sheet), and use drawEmbedded, like so:


myImage.startUse();
for(int x = 0; x < 2; x++) {
	for(int y = 0; y < 2; y++) {
		myImage.drawEmbedded(xCoord*x,yCoord*y);
	}
}
myImage.endUse();

What this translates to:
glBegin()
render image 1
render image 2
render image 3
render image 4
glEnd()

It’ll give you a huge performance increase, assuming you have large collections of images on the same sprite sheet (like a sprite sheet for a tilemap, for example).

Also, if you aren’t using sprite sheets in areas that you can, I highly recommend converting to sprite sheets instead of individual files.
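If I remember the Slick API right, the sprite-sheet version looks roughly like this. It’s only a sketch: the file name, the 32-pixel tile size and the 2x2 grid are made-up example values, and the constructor would sit in an init method that declares throws SlickException:

// Assumes org.newdawn.slick.SpriteSheet.
// Load one texture that holds all the tiles.
SpriteSheet sheet = new SpriteSheet("res/tiles.png", 32, 32);

sheet.startUse();
for (int x = 0; x < 2; x++) {
	for (int y = 0; y < 2; y++) {
		// renderInUse(screenX, screenY, sheetColumn, sheetRow) draws one cell of the sheet
		sheet.renderInUse(x * 32, y * 32, x, y);
	}
}
sheet.endUse();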

To clarify on this…

Your GPU is currently the limit here. When you call Display.update() the driver makes sure that the GPU hasn’t fallen too far behind. If it has, then the driver forces the CPU to wait until the GPU has caught up. Most drivers seem to implement this with a busy loop that uses 100% on one CPU core.
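For context, a typical LWJGL main loop looks something like this (just a sketch, not the poster’s code; updateGame() and renderGame() are placeholder names). The wait described above happens inside Display.update(), and Display.sync(60) is the usual way to cap the frame rate:

// Assumes org.lwjgl.opengl.Display.
while (!Display.isCloseRequested()) {
	updateGame();
	renderGame();
	Display.update(); // swaps buffers; the driver may busy-wait here if the GPU is behind
	Display.sync(60); // optional frame cap: sleeps/spins until the next 1/60 s slot
}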

Mine (a GTX 680) even spawns an additional thread, so two cores are 100% busy even if I limit the FPS to 60. However, what I actually wanted to express is that looking at the CPU load while the game is running tells you nothing about the efficiency of the code. Or in other words: having one core fully loaded isn’t necessarily a sign of bad coding. Cores are there to be used.

That’s a feature of the Nvidia driver, not the GPU. Intel also has this feature, and I believe AMD does as well. They basically just append all OpenGL commands to a queue that the other driver thread reads from and runs. It essentially makes most OpenGL commands free for the game’s thread and gives you some extra CPU time to play with. The problem with this is mapping buffers. Every time you call glMapBuffer() or any of its variations (regardless of whether you use GL_MAP_UNSYNCHRONIZED_BIT), the game’s thread has to wait for the driver’s thread to finish, so most of the benefit of the extra thread is lost. This is why the new persistently mapped buffers are so awesome. You can map a buffer once and keep it mapped forever, so you never have to synchronize with the driver’s thread.
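For anyone curious, here’s a rough sketch of a persistently mapped buffer. It assumes LWJGL 3-style static bindings for GL15/GL30/GL44 plus java.nio.ByteBuffer, and SIZE is a made-up example value:

// Assumes static imports of GL15.*, GL30.*, GL44.* and import java.nio.ByteBuffer.
// Create immutable storage (GL 4.4 / ARB_buffer_storage), map it once, keep it mapped.
int flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

int vbo = glGenBuffers();
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferStorage(GL_ARRAY_BUFFER, SIZE, flags);
ByteBuffer mapped = glMapBufferRange(GL_ARRAY_BUFFER, 0, SIZE, flags);

// Every frame: write vertex data straight into 'mapped' and draw from the buffer.
// There is no glMapBuffer/glUnmapBuffer per frame, so the game thread never has to
// wait for the driver's worker thread. In practice you'd still use fences
// (glFenceSync/glClientWaitSync) so you don't overwrite data the GPU is reading.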

That is crazy! I’m sure there is a good reason for that, I’d like to hear it…

Precision? It doesn’t really matter.

I think (don’t quote me on it) that CPUs and GPUs last longer when they’re forced to always run at 100%. Something about transistor load. I don’t know the details or if I’m even right; I just remember reading this somewhere like a decade ago.

Windows does this as well. If you look at Task Manager on older versions of Windows, there’s the “System Idle Process” that’s always at whatever percentage of the processor is currently not being used. Windows 7 (and possibly Vista) don’t show it anymore, though.

I find it hard to believe that this is true. If it was, then you’d be wasting a shitload of money and/or battery life on that “idle process”. The System idle process is simply there to show you how much of the time the CPU idles (and it’s still there for 7).

The System Idle Process is there to keep the CPU idle when the scheduler finds no threads ready to execute. That’s why it’s always shown as the percentage not being used, as there must always be a thread running on a CPU at all times. More information on Wikipedia.

You’re right that it is indeed a real thread (which I didn’t know), but it’s not exactly a normal thread. My main point was that neither the CPU nor the GPU is unnecessarily burning energy because it’s supposed to be good for them. CPUs and GPUs have massive power-saving functions so they don’t have to run at 100% load all the time, which includes shutting down unused parts of the processor or even complete cores and lowering the clock speed to a fraction of what it can run at. My CPU idles at room temperature and my GPUs at 35 degrees. My CPU can drop down to 800 MHz instead of running at 3.9 GHz all the time. My GPUs’ cores drop down to 135 MHz instead of 1.2 GHz, and their memory to 162 MHz from 1.75 GHz. Hardware makers are doing everything they can to decrease power usage and heat generation to get better battery life and smaller devices.

Too lazy to find a good reference, but here: http://siyobik.info.gf/main/reference/instruction/PAUSE. Just because the CPU is in theory running a tight loop doesn’t mean all of the units are running.
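As a rough Java-side illustration of the same hint (not something from the thread’s code): Thread.onSpinWait() from Java 9 typically compiles down to PAUSE on x86. The ready flag here is a made-up example:

class SpinWaitExample {
	// Placeholder flag set by another thread.
	volatile boolean ready = false;

	void waitUntilReady() {
		// onSpinWait() tells the CPU this is a busy loop, so it can back off
		// (PAUSE on x86) instead of running the pipeline flat out.
		while (!ready) {
			Thread.onSpinWait();
		}
	}
}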