SSAO in LibGDX sans Deferred Rendering?

Hi All!

I’m working on a game that requires SSAO in LibGDX. Is there any way to implement SSAO with just the depth buffer and still have favourable results? I understand straight edge-detecting produces artifacts, but how bad are they really? Can a blur filter or other things help with this?

I could go the route of developing a deferred rendering pipeline in LibGDX using opengl 3. But I’d much prefer to stick to the forward-rendering + post-processing I have now for the library’s sake. I did look over how to go about deferred rendering in LibGDX and I believe it would take some time to get everything going as some of the pipeline would be using 2.0’s glsl and others 3.0…

Thanks!

Traditional SSAO doesn’t require anything but a depth buffer. However, normals help quite a bit in improving quality/performance. You should be able to output normals from your forward pass into a second render target. It is also possible to reconstruct normals by analyzing the depth buffer, but this can be inaccurate if you got lots of depth discontinuities (like foliage).

EDIT: Technically, SSAO is occlusion, meaning it should only be applied to the ambient term of the lighting equation. The only way to get “correct” SSAO is therefore to do a depth prepass (preferably output normal too), compute SSAO, then render the scene again with GL_EQUAL depth testing while reading SSAO from the current pixel. If you already do a depth prepass, this should essentially be free. If not, maybe you should! It could improve your performance.

you can approach ssao with depth-only in a simple way where you apply simple unsharp-masking.

you can then cutoff values <50% or >50% to achieve shadows/darkening or glowing/halos.
adding a depth-range check to falloff the effect can deal with high discontinuities in the depth-buffer to avoid “leaking” or false-shadows.

this can look very good on static images. as soon as the camera turns - you get perspective incorrect darkening. depends on the scene. this is the point when adding a normal-buffer helps alot.

also, bilateral upsampling, say your ssao pass is 50% resolution … image quality will profit from normal-tests alot, tho’ testing depth only works ok too. again depends on the scene you draw.

This is interesting! As I understand it, depth prepass is rendering the scene to a depth buffer similar to a shadow map before rendering the scene, yes? I could put the normal values in the color pixels and depth in the z coordinate.

Is there any benefit to the z-prepass other than performance and ‘correct’ SSAO? There must be some cool things I can do with it also :slight_smile:

Okay, I’ll see what I can do with generating the scene normals+z in a prepass. Thanks for the info guys! I’ll post back later.

The traditional purpose of doing a depth pre-pass is to avoid shading pixels twice. By rendering the depth first, the actual shading can be done with GL_EQUAL depth testing, meaning each pixel is only shaded once. The depth pre-pass also rasterize at twice the speed as GPUs have optimized depth-only rendering for shadow maps, so by adding a cheap pre-pass you can eliminate overdraw in the shading.

To also output normals, you need to have a color buffer during the depth pre-pass, meaning you’ll lose the double speed rasterization, but that shouldn’t be a huge deal. You can store normal XYZ in the color, while depth can be read from the depth buffer itself and doesn’t need to be explicitly stored.

If you have a lot of vertices, rendering the scene twice can be very expensive. In that case, it’s possible for you to do semi-deferred rendering where you do lighting as you currently do but also output the data you need to do SSAO afterwards. This require using an FBO with multiple render targets, but it’s not that complicated. The optimal strategy depends on the scene you’re trying to render.

A few questions here, in the final shader how do you read depth from a uniform texture? Currently I have 2 textures going into the last pass. One with scene normals, the other with the original scene colors (textures and all) Here’s how the scene looks.

uniform PRECISION sampler2D u_texture0;	// scene
uniform PRECISION sampler2D u_texture1;	// normalmap
varying vec2 v_texCoords;
void main() {
	gl_FragColor = mix(texture2D(u_texture0, v_texCoords), texture2D(u_texture1, v_texCoords), 0.5);
}

The code for rendering objects:

gl_FragColor.xyz = ((normal + 1.0)/2.0).xyz;
gl_FragColor.w = ((gl_FragCoord.z));

Should I encode the frag z value in alpha? How do I read depth pixels from a colorbuffer texture?

As you can see a lot of the geometry is very simple, most scenes should keep it below 2,000 or so polygons. If that helps!

You do not need to store depth in a color texture. You can simply bind the depth texture you use as depth buffer and bind that as any other texture. The depth value between 0.0 and 1.0 is returned in the first color channel (red channel) when you sample the texture with texture() or texelFetch().

It looks like LibGDX uses Depth render objects. How does this change how I bind the texture how you say?

Gdx.gl.glActiveTexture(GL20.GL_TEXTURE2);
Gdx.gl.glBindTexture(GL20.GL_TEXTURE_2D, prepass.getDepthBufferHandle());

Prepass is the framebuffer in question. Binding the getDepthBufferHandle() just binds a random texture loaded in the game.

Is LibGDX using a renderbuffer?

Take a look


I’ve had trouble in other games with reading the depth in GLSL and I’ve always gotten away with using the Alpha channel. Do you have any info on how to maybe read depth from this?

Here’s the relevant code for binding the depth buffer

if (hasDepth) {
	gl.glFramebufferRenderbuffer(GL20.GL_FRAMEBUFFER, GL20.GL_DEPTH_ATTACHMENT, GL20.GL_RENDERBUFFER, depthbufferHandle);
}

Here’s how Framebuffer.java initialises the color texture per-framebuffer

@Override
protected Texture createColorTexture () {
	int glFormat = Pixmap.Format.toGlFormat(format);
	int glType = Pixmap.Format.toGlType(format);
	GLOnlyTextureData data = new GLOnlyTextureData(width, height, 0, glFormat, glFormat, glType);
	Texture result = new Texture(data);
	result.setFilter(TextureFilter.Linear, TextureFilter.Linear);
	result.setWrap(TextureWrap.ClampToEdge, TextureWrap.ClampToEdge);
	return result;
}

Here’s how I initialise a framebuffer + use it if that helps in reading the depth

// Arguments are: Format, Width, Height, useDepth (If it will attach a depth component or not, this is used in GLFramebuffer.java)
prepass = new FrameBuffer(Pixmap.Format.RGB888, width, height, false);
// This binds and sets the framebuffer's viewport 
prepass.begin();
// This resets the viewport and sets everything back to "defaultFramebufferHandle" (GLFramebuffer.java)
prepass.end();

EDIT: According to this article you can’t read from renderbuffers… Looks like I’m gonna have to make my own framebuffer that disables the depth renderbuffer and attaches a texture for the depth component… Oh boy

Renderbuffers are a bit of a legacy feature. They are meant for exposing formats the GPU can render to but can’t be read in a shader (read: multisampled stuff). The thing is that multisampled textures are supported by all OGL3 GPUs so they no longer fill any real purpose anymore. If you do the FBO setup yourself, you can attach a GL_DEPTH_COMPONENT24 texture as depth attachment and read it in a shader.

Fuck yeah! All done!

One issue:

Any idea what this is and how to get rid of it? Thnx!

To fix the SSAO going too far up along the cube’s edges, you need to reduce the depth threshold.

I can also see some banding in your SSAO. If you randomly rotate the sample locations per pixel, you can trade that banding for noise instead, which is much less jarring to the human eye.

I reduced the threshold, and i tried to add the random texture. It looks like it’s still getting some banding issues. Here’s my code and some screenshots, take a look.

Here’s the random texture:

Pixmap pixmap = new Pixmap(4, 4, Pixmap.Format.RGB888);
for(int x = 0; x < 4; x++)
    for(int y = 0; y < 4; y++) 
        pixmap.drawPixel(x, y, Color.rgb888(random.nextFloat(), random.nextFloat(), random.nextFloat()));
noiseTexture = new Texture(pixmap);
noiseTexture.setFilter(Texture.TextureFilter.Nearest, Texture.TextureFilter.Nearest);
noiseTexture.setWrap(Texture.TextureWrap.Repeat, Texture.TextureWrap.Repeat);

Kernel Calculation (The problem is probably here)

for(int i = 0; i < kernels32.length/3; i++){
    temp.set(random.nextFloat(), random.nextFloat(), (random.nextFloat() + 1.0f) / 2.0f);
    temp.nor();
    float scale = (float)i/32f;
    temp.scl(Math.max(0f, Math.min(1, scale*scale)));
    kernels32[i * 3 + 0] = temp.x;
    kernels32[i * 3 + 1] = temp.y;
    kernels32[i * 3 + 2] = temp.z;
}

I’m not sure what your “kernel” is. Are those the sample locations for your SSAO? I’d recommend precomputing some good sample positions instead of randomly generating them, as you’re gonna get clusters and inefficiencies from a purely random distribution. JOML has some sample generation classes in the org.joml.sampling package that may or may not be of use to you.

It doesn’t look like you’re using your noise texture correctly. A simple way of randomly rotating the samples is to place random normalized 3D vectors in your noise texture, then reflect() each sample against that vector. I’m not sure how you’re using your random texture right now, but it doesn’t look right at all. If you let me take a look at your GLSL code for that, I can help you fix it.

I’ve got to go to school, but take a look!

#define PRECISION
precision mediump float;

//uniform PRECISION sampler2D u_texture0;// scene
uniform PRECISION sampler2D u_texture1;	//  normalmap
uniform PRECISION sampler2D u_texture2;	//  depthmap
uniform PRECISION sampler2D u_texture3;	//  randommap

#define KERNEL_SIZE 32
#define CAP_MIN_DISTANCE 0.0001
#define CAP_MAX_DISTANCE 0.0005

uniform float u_radius;
uniform vec2 u_rotationNoiseScale;
uniform vec3 u_kernel[KERNEL_SIZE];
uniform mat4 u_inverseProjectionMatrix;
uniform mat4 u_projectionMatrix;

varying vec2 v_texCoords;

vec4 getViewPos(vec2 texCoord)
{
	float x = texCoord.s * 2.0 - 1.0;
	float y = texCoord.t * 2.0 - 1.0;
	float z = texture(u_texture2, texCoord).r * 2.0 - 1.0;
	vec4 posProj = vec4(x, y, z, 1.0);
	vec4 posView = u_inverseProjectionMatrix * posProj;
	posView /= posView.w;
	return posView;
}

void main()
{
    float occlusion = 0.0;
    if(texture(u_texture2, v_texCoords).r != 1.0){
        vec4 posView = getViewPos(v_texCoords);
        vec3 normalView = normalize(texture(u_texture1, v_texCoords).xyz * 2.0 - 1.0);
        vec3 randomVector = normalize(texture(u_texture3, v_texCoords * u_rotationNoiseScale).xyz * 2.0 - 1.0);
        vec3 tangentView = normalize(randomVector - dot(randomVector, normalView) * normalView);
        vec3 bitangentView = cross(normalView, tangentView);
        mat3 kernelMatrix = mat3(tangentView, bitangentView, normalView);
        for (int i = 0; i < KERNEL_SIZE; i++)
        {
            vec3 sampleVectorView = kernelMatrix * u_kernel[i];
            vec4 samplePointView = posView + u_radius * vec4(sampleVectorView, 0.0);
            vec4 samplePointNDC = u_projectionMatrix * samplePointView;
            samplePointNDC /= samplePointNDC.w;
            vec2 samplePointTexCoord = samplePointNDC.xy * 0.5 + 0.5;
            float zSceneNDC = (texture(u_texture2, samplePointTexCoord).r) * 2.0 - 1.0;
            float delta = samplePointNDC.z - zSceneNDC;
            if (delta > CAP_MIN_DISTANCE && delta < CAP_MAX_DISTANCE)
            {
                occlusion += 1.0;
            }
        }
        occlusion = 1.0 - occlusion / (float(KERNEL_SIZE) - 1.0);
    } else occlusion = 1.0;

    gl_FragColor = vec4(occlusion, occlusion, occlusion, 1.0);
}

Your sample generation is indeed very odd. Why do you generate only vectors within the range ([0…1], [0…1], [0.5…1.5])?
And why do you weight/scale the vectors by i/1024?

As @theagentd suggested, you could use some sample pattern generators from JOML, such as “Best-Candidate” sampling, like so:


long seed = 12345L; // <- to seed the PRNG
int numSamples = 32; // <- number of samples to generate
int numCandidates = numSamples * 4; // <- increase this number to improve sample distribution quality
FloatBuffer fb = ByteBuffer.allocateDirect(numSamples * 3 * 4).order(ByteOrder.nativeOrder()).asFloatBuffer();
new BestCandidateSampling.Sphere(seed, numSamples, numCandidates, (x, y, z) -> fb.put(x).put(y).put(z));
fb.rewind();

Here is an image of what that typically produces:

I followed this example to a tee, I’ll be sure to fix my samples when I get home! Thank you so much for the help.

A couple of tips:

  • The code you have is using samples distributed over a half sphere. Your best bet is a modified version of best candidate sampling over a half sphere, which would require some modification of the JOML code to get.

  • I’d ditch the rotation texture if I were you. Just generate a random angle using this snippet that everyone is using, then use that angle to create a rotation matrix around the normal (You can check the JOML source code on how to generate such a rotation matrix that rotates around a vector). You can then premultiply the matrix you already have with this rotation matrix, keeping the code in the sample loop the exact same.

  • To avoid processing the background, enable the depth test, set depth func to GL_LESS and draw your fullscreen SSAO quad at depth = 1.0. It is MUCH more efficient to cull pixels with the depth test than an if-statement in the shader. With an if-statement, the fragment shader has to be run for every single pixel, and if just one pixel in a workgroup enters the if-statement the entire workgroup has to run it. By using the depth test, the GPU can avoid running the fragment shader completely for pixels that the test fails for, and patch together full workgroups from the pixels that do pass the depth test. This massively improves the culling performance.

  • You can use smoothstep() to get a smoother depth range test of each sample at a rather small cost.

  • It seems like you’re storing your normals in a GL_RGB8 texture, which means that you have to transform it from (0.0 - 1.0) to (-1.0 - +1.0). I recommend using a GL_RGB8_SNORM which can stores each value as a normalized signed byte, allowing you to write out the normal in the -1.0 to +1.0 range and sample it like that too. Not a huge deal of course, but gives you better precision and a little bit better performance.

How would this look? Do you have any info on producing half-sphere samples? Sorry I’m asking so many questions

Here’s what I’ve come up with:

mat3 rotationMatrix(vec3 axis, float angle)
{
    axis = normalize(axis);
    float s = sin(angle);
    float c = cos(angle);
    float oc = 1.0 - c;

    return mat3(oc * axis.x * axis.x + c,           oc * axis.x * axis.y - axis.z * s,  oc * axis.z * axis.x + axis.y * s,
                oc * axis.x * axis.y + axis.z * s,  oc * axis.y * axis.y + c,           oc * axis.y * axis.z - axis.x * s,
                oc * axis.z * axis.x - axis.y * s,  oc * axis.y * axis.z + axis.x * s,  oc * axis.z * axis.z + c);
}
float randomAngle = rand(v_texCoords);
mat3 rotationMat3 = rotationMatrix(normalView, randomAngle);

vec3 randomVector = vec3(0, 1, 0);
vec3 tangentView = normalize(randomVector - dot(randomVector, normalView) * normalView);
vec3 bitangentView = cross(normalView, tangentView);
mat3 kernelMatrix = mat3(tangentView, bitangentView, normalView);
kernelMatrix *= rotationMat3;

This produces… Unfavourable results ::slight_smile:
What am I supposed to do with the random vector…?

Other than that your instructions were clear enough for me to implement! I got the smoothstep, GL_RGB8_SNORM, and GL_LESS to work! Thanks!