Optimizing FBOs

Hi,

After using two FBOs, my frame rate has dropped from the usual vsynced 60fps to 30fps. This is the code:


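		// draw the scene into blurTargetA with the default shader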
		blurTargetA.bind();
		Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT);
		batch.setShader(null);

		batch.begin();

		worldMap.drawMap(debug, batch, regions, camX, camY, endX , endY, WORLDWIDTH, WORLDHEIGHT);
				
		batch.flush();
	
		batch.setShader(blurShader);
		// pass 1 - horizontal blur
		blurShader.setUniformf("dir", 1f, 0f);
		blurShader.setUniformf("radius", 1 * MAX_BLUR);

		blurTargetB.bind();

		Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT);

		fboRegion.setTexture(blurTargetA.getColorBufferTexture());
		
		batch.setColor(1,1,1,1);  // VIP! 

		batch.draw(fboRegion, 0, 0, SCREENWIDTH, SCREENHEIGHT);
		batch.flush();
		FrameBuffer.unbind();

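		// pass 2 - vertical blur, drawn to the screen now the FBO is unbound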
		blurShader.setUniformf("dir", 0f, 1f);
		blurShader.setUniformf("radius", 1 * MAX_BLUR);
		fboRegion.setTexture(blurTargetB.getColorBufferTexture());

		batch.setProjectionMatrix(camera.combined);  // bind to main camera projection - VIP!!!
		
		batch.draw(fboRegion, camX * 16, camY * 16);  // draw the final result at the camera position
			
		batch.setShader(null);


The FBOs are created in my create() method:


			blurTargetA = new FrameBuffer(Format.RGBA8888, SCREENWIDTH, SCREENHEIGHT, false);
			blurTargetB = new FrameBuffer(Format.RGBA8888, SCREENWIDTH, SCREENHEIGHT,  false);
			fboRegion = new TextureRegion(blurTargetA.getColorBufferTexture());
			
			fboRegion.flip(false, true);


Are there any optimizations that could be made to the above? I think it looks OK, but if I set the shader to null instead of blurShader I get the 60fps back, so it's looking likely that the shader is the cause:


varying vec4 vColor;
varying vec2 vTexCoord;

//declare uniforms
uniform sampler2D u_texture;
uniform float resolution;
uniform float radius;
uniform vec2 dir;

void main() {
    //this will be our RGBA sum
    vec4 sum = vec4(0.0);

    //our original texcoord for this fragment
    vec2 tc = vTexCoord;

    //the amount to blur, i.e. how far off center to sample from 
    //1.0 -> blur by one pixel
    //2.0 -> blur by two pixels, etc.
    float blur = radius/resolution; 

    //the direction of our blur
    //(1.0, 0.0) -> x-axis blur
    //(0.0, 1.0) -> y-axis blur
    float hstep = dir.x;
    float vstep = dir.y;

    //apply blurring, using a 9-tap filter with predefined gaussian weights

    sum += texture2D(u_texture, vec2(tc.x - 4.0*blur*hstep, tc.y - 4.0*blur*vstep)) * 0.0162162162;
    sum += texture2D(u_texture, vec2(tc.x - 3.0*blur*hstep, tc.y - 3.0*blur*vstep)) * 0.0540540541;
    sum += texture2D(u_texture, vec2(tc.x - 2.0*blur*hstep, tc.y - 2.0*blur*vstep)) * 0.1216216216;
    sum += texture2D(u_texture, vec2(tc.x - 1.0*blur*hstep, tc.y - 1.0*blur*vstep)) * 0.1945945946;

    sum += texture2D(u_texture, vec2(tc.x, tc.y)) * 0.2270270270;

    sum += texture2D(u_texture, vec2(tc.x + 1.0*blur*hstep, tc.y + 1.0*blur*vstep)) * 0.1945945946;
    sum += texture2D(u_texture, vec2(tc.x + 2.0*blur*hstep, tc.y + 2.0*blur*vstep)) * 0.1216216216;
    sum += texture2D(u_texture, vec2(tc.x + 3.0*blur*hstep, tc.y + 3.0*blur*vstep)) * 0.0540540541;
    sum += texture2D(u_texture, vec2(tc.x + 4.0*blur*hstep, tc.y + 4.0*blur*vstep)) * 0.0162162162;

    //multiply by vertex color; keep the centre texel's alpha instead of the blurred alpha
    //gl_FragColor = vColor * vec4(sum.rgb, 1);
    vec4 texColor = texture2D(u_texture, vTexCoord);
    
    gl_FragColor = vColor * vec4(sum.rgb, texColor.a);
}

Appreciate any advice. Maybe sampling at half the screen width and height? Possibly pass the texture coords into the vertex shader so there's no real computation in the frag shader…

Thanks

PS - the 30fps is on my laptop, which has a crap Intel HD 4000 GPU.

Most of the important calls are hidden in the first quote.

A couple of tiny things you can do (don't expect performance to rise by more than a couple of percent; see the sketch after this list):

  • avoid querying uniform locations by name every frame
  • if you have a vec2 dir and a float radius to upload, you can use a single vec3
  • avoid resource unbinding in general
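
A minimal sketch of the first two points, assuming the two uniforms are merged into a single vec3 called "dirRadius" (a hypothetical name - the fragment shader would read dirRadius.xy as the direction and dirRadius.z as the radius):


		// once, after the shader compiles: cache the location instead of
		// looking it up by name every frame
		int dirRadiusLoc = blurShader.fetchUniformLocation("dirRadius", false);

		// per frame: one vec3 upload instead of a vec2 plus a float
		blurShader.setUniformf(dirRadiusLoc, 1f, 0f, MAX_BLUR);  // horizontal pass
		blurShader.setUniformf(dirRadiusLoc, 0f, 1f, MAX_BLUR);  // vertical pass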

PS: “varying” is deprecated in newer GLSL (desktop GLSL 1.30+ / GLSL ES 3.0). Use “in”/“out” instead; smooth interpolation is the default option.

Thanks for this, will give it a try.

Going to also try a faster blur based on passing more vertex info to the vertex shader, which means I won't have to do the offset calculations in the frag shader.
Also maybe try using a smaller FBO; at the moment the FBO is 1920x1080, which is a lot!

On my gaming PC I get 700 fps :)

I found that removing a Box2D cone light I had in the game made the frame rate go back up to 60fps?! Wtf?!

The best way to optimize is to know exactly what is taking up the most time. There are a number of profilers out there; VisualVM is the one I use. I recommend using one to narrow down the problem instead of just guessing.

Also, using fps as a benchmarking “tool” is poor at best. I suggest finding out how many milliseconds each frame takes.
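
FPS is also non-linear: with vsync and double buffering, missing the 16.7ms budget by even a little drops you straight from 60fps to 30fps. A minimal sketch of logging frame time instead (only the logger is libGDX-specific):


		long start = System.nanoTime();
		// ... all of your render() work ...
		float ms = (System.nanoTime() - start) / 1_000_000f;
		Gdx.app.log("perf", String.format("frame: %.2f ms", ms));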

steveyg90: There’s some room for improvement in your shader code (e.g. it does two texture lookups at vTexCoord), but you should try to reduce the workload first. Using a smaller FBO is a great idea. If this does not work well with your blur filter, try a different filter.

I found this guy’s videos on YT pretty well made - in this one he’s talking about Gaussian blur and downsampling: https://www.youtube.com/watch?v=uZlwbWqQKpc
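
A sketch of the half-resolution idea, reusing the field names from the first post (halving the size is an assumption - tune to taste):


			// create(): allocate the blur targets at half resolution
			blurTargetA = new FrameBuffer(Format.RGBA8888, SCREENWIDTH / 2, SCREENHEIGHT / 2, false);
			blurTargetB = new FrameBuffer(Format.RGBA8888, SCREENWIDTH / 2, SCREENHEIGHT / 2, false);
			// linear filtering smooths the upscale back to full screen
			blurTargetA.getColorBufferTexture().setFilter(TextureFilter.Linear, TextureFilter.Linear);
			blurTargetB.getColorBufferTexture().setFilter(TextureFilter.Linear, TextureFilter.Linear);

			// render(): draw the scene and both blur passes at half resolution,
			// then stretch the result over the full screen in the final draw
			batch.draw(fboRegion, 0, 0, SCREENWIDTH, SCREENHEIGHT);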

Another idea: try using a separate sprite batch for your screen-space effects. The point is to stop the GPU driver guessing how/when to reallocate buffer memory, or even waiting/blocking until the buffer is flushed and available again. How much this helps depends on the driver quality, though, and it may be difficult to measure. I had a nice little perf boost myself when I switched to a small, static vertex buffer with only four vertices for that purpose (it doesn't use SpriteBatch and doesn't need to be rebuilt/re-uploaded to the GPU every frame).
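
In libGDX terms, the simplest version of that is just a second, tiny batch (a sketch - the SpriteBatch constructor argument is its sprite capacity):


		// a dedicated batch whose vertex buffer holds exactly one quad, so the
		// main batch's large buffer is never re-uploaded just for the FBO draw
		SpriteBatch fxBatch = new SpriteBatch(1);

		fxBatch.setShader(blurShader);
		fxBatch.begin();
		fxBatch.draw(fboRegion, 0, 0, SCREENWIDTH, SCREENHEIGHT);
		fxBatch.end();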

And most blur filters will blur the alpha channel too anyway, so why not return vColor * sum?

Thanks all,

I’m looking at doing some of the calculations in the vertex shader, thus reducing the work in the frag shader.

Did reduce the FBO size; it worked fine on my gaming PC (NVidia GTX 960), but on my laptop, which only has the Intel HD 4000, it didn't: the screen just showed smaller in the bottom-left corner.

This is usually caused by a GL state you forgot to set. In my experience the NVidia drivers are a lot more forgiving, or rather try to default to reasonable values, while the Intel driver just gives you the finger and shows a black/wrong/corrupt screen.

Thanks for the information, will do some reading up.

My first guess would be to look at your viewport and/or projection matrix settings. For example, your code uses FrameBuffer.bind(), which doesn't call glViewport(); that's fine only as long as the FBO is the same size as the backbuffer. Try calling FrameBuffer.begin() instead.
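
A sketch of that change, applied to the code from the first post:


		blurTargetA.begin();   // binds the FBO *and* sets glViewport to its size
		Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT);
		// ... draw into the FBO ...
		blurTargetA.end();     // restores the backbuffer binding and viewport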