LibGDX Depth Testing is horrendously slow

In the render method we do two things:

  1. Render all entities to an FBO
  2. Scale FBO and draw it to the screen

In the FBO, depth testing is enabled and the resolution is very low
[icode]entityBuffer = new FrameBuffer(Format.RGBA8888, (int) camera.viewportWidth, (int) camera.viewportHeight, true);[/icode]
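
For reference, a minimal sketch of what that two-step render loop tends to look like in libGDX ([icode]renderEntities[/icode] and [icode]screenBatch[/icode] are illustrative names, not the actual code):

[code]
public void render() {
    // 1. Render all entities into the low-resolution FBO with depth testing enabled.
    entityBuffer.begin();
    Gdx.gl.glClearColor(0f, 0f, 0f, 0f);
    Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT | GL20.GL_DEPTH_BUFFER_BIT);
    Gdx.gl.glEnable(GL20.GL_DEPTH_TEST);
    renderEntities();                        // all the entity draw calls go here
    Gdx.gl.glDisable(GL20.GL_DEPTH_TEST);
    entityBuffer.end();

    // 2. Scale the FBO up and draw it to the screen, with no depth testing involved.
    Texture tex = entityBuffer.getColorBufferTexture();
    screenBatch.begin();
    screenBatch.draw(tex, 0, 0, Gdx.graphics.getWidth(), Gdx.graphics.getHeight(),
            0, 0, tex.getWidth(), tex.getHeight(), false, true);   // FBO textures are y-flipped
    screenBatch.end();
}
[/code]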

When drawing to the FBO, there seems to be a pretty big performance difference on what we depth test.

  • The game runs slower the more we depth-test
  • The depth buffer is cleared with 0.0 every frame
  • The framebuffer IS bound, and when we render it depth-testing is disabled
  • We’re using [icode]gl_FragDepth[/icode] in a shader that renders every entity to the framebuffer to set the depth; disabling it does nothing performance-wise (see the sketch after this list)
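
In code, the setup described in that list amounts to roughly the following (a sketch: GL_GREATER is an assumption that matches the 0.0 clear, and the shader is only a minimal illustration of the per-fragment depth write):

[code]
entityBuffer.begin();
Gdx.gl.glClearDepthf(0.0f);                      // depth buffer cleared with 0.0
Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT | GL20.GL_DEPTH_BUFFER_BIT);
Gdx.gl.glEnable(GL20.GL_DEPTH_TEST);
Gdx.gl.glDepthFunc(GL20.GL_GREATER);             // larger value wins when clearing to 0.0

// Fragment shader used for every entity, writing gl_FragDepth per pixel:
String fragmentShader =
      "varying vec2 v_texCoords;\n"
    + "varying float v_depth;\n"
    + "uniform sampler2D u_texture;\n"
    + "void main() {\n"
    + "    gl_FragColor = texture2D(u_texture, v_texCoords);\n"
    + "    gl_FragDepth = v_depth;   // runs for every single fragment\n"
    + "}\n";
[/code]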

It seems like when I enable the depth buffer while drawing a large group of objects, it becomes very slow on my laptop, but my PC with its dedicated video card does fine.

Is there any way to speed up depth testing? It seems like that’s what is slowing the game down.

“The game runs slower the more we depth-test” = “the more overdraw we add”? That would of course be expected, as more pixels = more work, especially for a shitty integrated laptop GPU.

That being said, depth testing with 3D geometry usually improves performance as the depth test can run before the fragment shader, allowing the GPU to avoid running the fragment shader for occluded pixels. This depends on two things to work:

  • If the shader writes to [icode]gl_FragDepth[/icode], the shader implicitly has to run before the depth test, as it determines the value used in the comparison. This has VERY significant performance implications.
  • If [icode]discard;[/icode] is used, the early depth test’s functionality is severely limited, as the depth buffer can’t be updated until the shader has executed, or it would write depth for discarded pixels. This again can have performance implications, but usually not as severe ones: an early depth test can still be run against previously drawn geometry, potentially avoiding shader execution, but it has to be much more conservative. (See the sketch after this list.)
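
To make the two cases above concrete, here are two illustrative fragment shaders (not anyone’s actual code). The first leaves depth entirely to the hardware, so early-Z can reject occluded pixels before the shader ever runs; the second decides the depth itself and limits what early-Z can do:

[code]
// Early-Z friendly: no gl_FragDepth, no discard.
String earlyZFriendly =
      "varying vec2 v_texCoords;\n"
    + "uniform sampler2D u_texture;\n"
    + "void main() {\n"
    + "    gl_FragColor = texture2D(u_texture, v_texCoords);\n"   // depth comes from gl_Position
    + "}\n";

// Early-Z hostile: the shader must run before the final depth value is known.
String earlyZHostile =
      "varying vec2 v_texCoords;\n"
    + "varying float v_depth;\n"
    + "uniform sampler2D u_texture;\n"
    + "void main() {\n"
    + "    vec4 color = texture2D(u_texture, v_texCoords);\n"
    + "    if (color.a < 0.01) discard;\n"    // depth buffer can't be updated until the shader finishes
    + "    gl_FragDepth = v_depth;\n"         // forces the shader to run before the depth test
    + "    gl_FragColor = color;\n"
    + "}\n";
[/code]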

You should never write to [icode]gl_FragDepth[/icode] if you can avoid it, since it disables so many important optimizations. If your geometry is flat, then simply outputting the depth from the vertex shader (as part of the vertex position) will give you the same result but allow all the optimizations to work as expected.

If you do need non-linear per-pixel depth for some reason, there are still things you can do to improve performance. If you are able to calculate the minimum depth (the depth value closest to the camera), you can output that as a conservative depth value from the vertex shader. You can then specify in the fragment shader how you will modify the depth value of [icode]gl_FragDepth[/icode], which allows the GPU to run a conservative depth test against the hardware-computed depth (the one you output from the vertex shader). You always want to modify the depth in the OPPOSITE direction to the one you’re testing in. Example:

  • You use GL_LESS for depth testing and the depth is cleared to 1.0.
  • You output the MINIMUM depth that the polygon can possibly have from the vertex shader.
  • In the fragment shader, you specify that the depth value will always be GREATER than the hardware computed value using [icode]layout (depth_greater) out float gl_FragDepth;[/icode].

This will allow your GPU to run a conservative depth test using the hardware-computed depth value, at least giving it a chance (similar to when [icode]discard;[/icode] is used) of culling fragments before running the fragment shader. The feature requires hardware support, but GL_ARB_conservative_depth is available as an extension on all OGL3 GPUs, even Intel, plus OGL2 Nvidia GPUs. Additionally, it can be queried in the GLSL shader and only enabled if available, and it won’t cause any damage if it isn’t available (at least if you also skip computing the minimum depth in the vertex shader in that case).
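
A sketch of what that looks like in GLSL (1.30-style, wrapped in a Java string the way libGDX shaders usually are). It assumes the vertex positions already carry the closest depth the polygon can have, so [icode]gl_FragCoord.z[/icode] is the conservative hardware depth and the shader only ever adds to it; the alpha-based offset is purely illustrative:

[code]
String fragmentShader =
      "#version 130\n"
    + "#extension GL_ARB_conservative_depth : enable\n"
    + "#ifdef GL_ARB_conservative_depth\n"
    + "layout (depth_greater) out float gl_FragDepth;\n"   // promise: we only push depth further away
    + "#endif\n"
    + "uniform sampler2D u_texture;\n"
    + "in vec2 v_texCoords;\n"
    + "out vec4 fragColor;\n"
    + "void main() {\n"
    + "    fragColor = texture(u_texture, v_texCoords);\n"
      // gl_FragCoord.z is the hardware-computed (minimum) depth set up by the vertex
      // shader; adding a non-negative offset keeps the depth_greater promise.
    + "    gl_FragDepth = gl_FragCoord.z + fragColor.a * 0.01;\n"
    + "}\n";
[/code]

If the extension isn’t present, the [icode]#ifdef[/icode] simply skips the layout line and the shader still works; it only loses the conservative early test, which matches what’s said above about it doing no damage.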

Clearing the depth buffer to 0.0 would cause nothing to ever pass the depth test if you use standard GL_LESS depth testing. I’d strongly suggest using GL_LESS and clearing to 1.0 instead, as that is the standard way of using a depth buffer and in some cases can be faster in hardware.
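
In libGDX terms, that standard setup is simply:

[code]
Gdx.gl.glClearDepthf(1.0f);                       // clear to the far plane
Gdx.gl.glClear(GL20.GL_DEPTH_BUFFER_BIT | GL20.GL_COLOR_BUFFER_BIT);
Gdx.gl.glEnable(GL20.GL_DEPTH_TEST);
Gdx.gl.glDepthFunc(GL20.GL_LESS);                 // closer (smaller) depth wins
[/code]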

If you could specify some more information about your use case, I could give you better advice and more information.

Thanks for the in-depth reply! Here’s some more information:

Based on the testing and timings that we’ve done, I figure it has to be something other than writing to the depth buffer itself that is causing the problem. Just to test it out (and I might end up keeping it this way), I had a free channel in my shadow frame buffer that I wasn’t using, so I manually wrote the depth of all the objects to the free channel in the shadow buffer. Then, in the main rendering shader, I checked the depth against the channel in the shadow buffer and drew/didn’t draw the pixel accordingly. On my laptop with integrated GPU, it ran at around 250 FPS this way, while with the standard depth buffer it ran at 30-40 FPS. On my PC with dedicated graphics it has no real impact compared to standard depth testing.
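
Purely to make that description concrete, here is a very rough sketch of the trick (shader names, the choice of the alpha channel, and the comparison direction are all guesses, and how the shadow pass ends up holding the nearest depth per pixel isn’t shown):

[code]
// Shadow pass: store each object's depth in the otherwise unused channel.
String shadowPassFragment =
      "varying float v_depth;\n"
    + "void main() {\n"
    + "    vec3 shadow = vec3(0.0);                // whatever the shadow pass normally outputs\n"
    + "    gl_FragColor = vec4(shadow, v_depth);   // spare alpha channel carries the depth\n"
    + "}\n";

// Main pass: compare against the stored depth instead of the hardware depth buffer.
String mainPassFragment =
      "varying vec2 v_texCoords;\n"
    + "varying vec2 v_screenCoords;\n"
    + "varying float v_depth;\n"
    + "uniform sampler2D u_texture;\n"
    + "uniform sampler2D u_shadowBuffer;\n"
    + "void main() {\n"
    + "    float storedDepth = texture2D(u_shadowBuffer, v_screenCoords).a;\n"
    + "    if (v_depth <= storedDepth) discard;    // something nearer already drawn (larger = closer here)\n"
    + "    gl_FragColor = texture2D(u_texture, v_texCoords);\n"
    + "}\n";
[/code]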

There is no way the code I scraped together is five times more efficient than the built-in depth testing, so I figure there has to be something else we were doing wrong???

hah

I’m drawing the same conclusion as you. You’re most likely hitting a slow path on Intel for some reason. Some possibilities to explore:

  • gl_FragDepth disables hierarchical depth buffers, compression, etc., making it slow.
  • gl_FragDepth may just be inherently slow in hardware on Intel cards.
  • Clearing the depth buffer to 0.0 and using GL_GREATER depth testing may be slow and/or disable hardware optimizations.

I’m a bit interested in a follow-up on this, if you have time sometime. =P

I tried clearing the depth buffer to 1.0 and using GL_LEQUAL, but it remained just as slow. I believe the problem was due to writing gl_FragDepth for every single fragment. I sped it up by going full 3D and having OpenGL calculate the depth for me, so I don’t have to touch gl_FragDepth; consequently, I basically have a 3D game now :slight_smile:
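
For anyone curious what “going full 3D” can look like in libGDX, one option (a sketch, not necessarily what was done here) is to draw the sprites as decals positioned in 3D space, so the hardware derives the depth from the vertex positions and no shader ever touches gl_FragDepth:

[code]
// Using com.badlogic.gdx.graphics.g3d.decals.* and a PerspectiveCamera.
PerspectiveCamera camera = new PerspectiveCamera(67, viewportWidth, viewportHeight);
DecalBatch decalBatch = new DecalBatch(new CameraGroupStrategy(camera));

// Per entity: position a decal in 3D space; its z is what feeds the depth buffer.
Decal decal = Decal.newDecal(entityRegion, true);   // entityRegion: the sprite's TextureRegion
decal.setPosition(x, y, depthZ);
decalBatch.add(decal);

decalBatch.flush();   // depth comes from the hardware, no gl_FragDepth anywhere
[/code]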

I found one potential issue. Make sure you’re not running into this:

[quote]Because the depth format is a normalized integer format, the driver will have to use the CPU to convert the normalized integer data into floating-point values. This is slow.

The preferred way to handle this is with this code:

  if(depth_buffer_precision == 16)
  {
    GLushort mypixels[width*height];
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_UNSIGNED_SHORT, mypixels);
  }
  else if(depth_buffer_precision == 24)
  {
    GLuint mypixels[width*height];    //There is no 24 bit variable, so we'll have to settle for 32 bit
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_UNSIGNED_INT_24_8, mypixels);  //No upconversion.
  }
  else if(depth_buffer_precision == 32)
  {
    GLuint mypixels[width*height];
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, mypixels);
  }

[/quote]
Dunno tho, just tryna help. Also, make sure you allocate all the buffers, and check whether there are multiple depth-testing passes going on, which would be more than necessary, ofc.

@Hydroque: Completely irrelevant to the thread. The problem here is depth TESTING.