Profiling is important when optimizing games. When using OpenGL through LWJGL, it’s important to know that the CPU and GPU work asynchronously on very different tasks, so depending on which one of them that is your bottleneck you may get very confusing results. You will not get correct values if you simply measure the time it takes to run an OpenGL command since the command is just added to the OpenGL queue until the queue is full, which is very fast. If the GPU is unable to keep up with the commands submitted, it’ll eventually block the application until the GPU has caught up and there’s room in the queue. That means that OpenGL may block on random OpenGL commands or even on Display.update(), which means that you could get a huge CPU time spike for something as simple as glBindTexture() or a similar command.
The solution is to go directly to the source and ask the GPU how long time stuff took to execute for the GPU. This is tricky to do without affecting performance negatively. One idea is to call an OpenGL command, then use glFinish() to force the CPU to block while the GPU executes all queued up commands, but this causes both a CPU stall and then a GPU stall since the command queue will be completely empty once glFinish() returns leading to massively reduced performance. A better solution is to use OpenGL asynchronous queries to query the time at which the GPU reaches certain points in the command queue. We then delay reading back the result until the GPU has finished the frame we’re profiling.
When profiling on the CPU we can do the following:
long startTime = System.nanoTime();
//run code to profile
long timeTaken = System.nanoTime() - startTime; //Time taken in nano seconds
We can do the same thing with GPU queries, but the asynchronous nature of queries complicates it all a bit:
int startQuery = glGenQueries();
glQueryCounter(startQuery , GL_TIMESTAMP);
//Run OpenGL commands to profile
int endQuery = glGenQueries();
glQueryCounter(endQuery, GL_TIMESTAMP);
//Make sure enough time passes so the above OpenGL commands have been processed by the GPU
if(glGetQueryObjectui(endQuery, GL_QUERY_RESULT_AVAILABLE) == GL_TRUE){ //The result is available
long startTime = glGetQueryObjectui64(startQuery, GL_QUERY_RESULT);
long endTime = glGetQueryObjectui64(endQuery, GL_QUERY_RESULT);
long timeTaken = endTime - startTime;
}
I have developed two small static classes that can take care of GPU profiling. The use of these two classes is as follows (extra brackets added for clarity):
//No special initialization required
public void render(){
GPUProfiler.startFrame();
GPUProfiler.start("Render game");
{
GPUProfiler.start("Render tiles");
renderTiles();
GPUProfiler.end();
GPUProfiler.start("Render objects");
renderObjects();
GPUProfiler.end();
}
GPUProfiler.end();
GPUProfiler.start("Render UI");
renderUI();
GPUProfiler.end();
GPUProfiler.endFrame();
/*
The GPUProfiler keeps track of previous frames until
they've been rendered by the GPU and their results are
ready to be read back. The following line dumps the
timings of all completed frames to System.out.
*/
GPUTaskProfile tp;
while((tp = GPUProfiler.getFrameResults()) != null){
tp.dump(); //Dumps the frame to System.out.
//or use the functions of GPUTaskProfile to extract information about the frame:
//getName(), getStartTime(), getEndTime(), getTimeTaken(), getChildren()
//Then you can draw the result as fancy graphs or something.
GPUProfiler.recycle(tp); //Recycles GPUTaskProfile instances and their OpenGL query objects.
}
}
Here’s some example output of dump() when using my GPU profiler with the engine I’m writing. Note that the CPU time taken to execute all my OpenGL functions is around 1.5ms, with Display.update() blocking for around 25 ms after each frame making it impossible to tell what’s actually keeping the GPU so busy.
Seems like I have too many shadow mapped lights. =P
Here are the two classes:
GPUProfiler: http://www.java-gaming.org/?action=pastebin&id=930
GPUTaskProfile: http://www.java-gaming.org/?action=pastebin&id=931