not really java related, but i think people around here are smarter than me. maybe you guys can help me get my head around something. it’s just a naive approach to the topic, to get a better understanding of it.
i cannot find any useful information about “efficient deferred msaa resolve” on the googleinterwebs, but then i guess i’m just asking the wrong questions - or i’m just doing it all wrong.
i’m successfully rendering triangles into an FBO with multisampled textures attached to it - depth and color.
now, going into postprocessing … say, hdr-tone-mapping (or depth-darkening, or depth-of-field-blurring), one would render a fullscreen quad into a non-msaa fbo (like the default framebuffer) using a shader which accesses the multisample textures generated in the first step. since we know how many samples those textures contain, we can do a simple process-and-resolve like this :
(for the sake of completeness - this is a trivial fragment shader which resolves an msaa depth buffer into a linear non-msaa buffer)
#version 150
out float frag0_r; // writing to a single-chan float-texture
in vec2 st0;
uniform sampler2DMS aa_tex; // aa-depth-buffer
uniform vec2 dim; // buffer dimensions, could be replaced with textureSize(aa_tex) (no lod argument for sampler2DMS)
uniform int samples = 4;
uniform float znear;
uniform float zfar;
uniform float zrange; // zfar - znear
float linear0(in float depth) { return (znear * zfar) / (zfar - depth * zrange); } // depth-buffer value -> linear eye-space depth
void main()
{
ivec2 coord = ivec2(dim * st0);
float d = 0.0;
for(int i = 0; i < samples; i++) d += linear0(texelFetch(aa_tex, coord, i).r);
frag0_r = d / samples;
}
but what i am talking about is actually just
for(int i = 0; i < samples; i++) sum += texelFetch(aa_tex, coord, i);
this could also be a color texture - the point is that the loop is executed for every sample, even if all samples are identical. clearly a bad idea. it works fine for simple tasks like the shader above (linear0()), but once we do heavy per-sample computation, performance drops. not to mention the amount of precious gpu power and memory wasted at this point (… and generating another texture which tests all samples and emits a single value … which would then be used to drive another sample loop :-X).
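just to make that parenthetical idea concrete - a minimal sketch of such a pre-pass (output format and names are my own assumptions, e.g. an R32F target): it counts how many samples of a texel differ from sample 0 and writes that count into a single-channel non-msaa texture, so the heavy resolve shader only has to run its full per-sample loop where the count is greater than 1 :

#version 150
// "complexity" pre-pass sketch: counts how many samples of a texel differ
// from sample 0 and writes the count to a single-channel non-msaa target.
// a later heavy resolve pass can branch on count > 1 (or turn this into a
// stencil mask) and only shade the expensive path where it is needed.
out float frag0_r; // e.g. an R32F texture
in vec2 st0;
uniform sampler2DMS aa_tex;
uniform int samples = 4;
void main()
{
    ivec2 coord = ivec2(vec2(textureSize(aa_tex)) * st0);
    vec4 s0 = texelFetch(aa_tex, coord, 0);
    int distinct = 1;
    for(int i = 1; i < samples; i++)
        if(any(notEqual(texelFetch(aa_tex, coord, i), s0))) distinct++;
    frag0_r = float(distinct);
}

this doesn’t save the texel fetches of the pre-pass itself, but it does keep the heavy per-sample math out of all the flat areas.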
if i understand the opengl pipeline correctly, then during the triangle-rendering pass the fragment shader is, by default, not run once per sample. it runs once per fragment and the single result is copied to all covered samples - the extra samples only matter where the fragment partially covers geometry (i know about GL_SAMPLE_SHADING and the pitfalls which force GL to shade every sample).
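for reference, as far as i know these are the triggers that flip a shader into per-sample execution (just a note-to-self sketch, not code i actually use) :

#version 400
// any of the following forces the fragment shader to run once per covered
// sample instead of once per fragment:
//  - glEnable(GL_SAMPLE_SHADING) + glMinSampleShading(1.0) on the GL side
//  - reading gl_SampleID or gl_SamplePosition in the shader
//  - a `sample`-qualified input, like below
sample in vec2 st0; // interpolated at each sample location -> per-sample shading
out vec4 frag0;
void main() { frag0 = vec4(st0, 0.0, 1.0); }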
this default “shade once, copy to the covered samples” behaviour led me to https://www.opengl.org/sdk/docs/man/html/gl_SampleMaskIn.xhtml and the idea of using that information during the post-processing resolve pass. if one knew how many distinct samples a texel contains, summing them up would be much more efficient.
again, maybe i’m just stuck on this path. of course, you could simply render all the triangles again with a different shader, grabbing the gl_SampleMaskIn data + gl_SampleID, fetching the msaa texel via gl_FragCoord and performing the resolve efficiently thanks to the default behaviour of “not executing the shader for every sample of the fragment”. but let’s assume we don’t have the triangles accessible during this pass anymore (don’t ask :persecutioncomplex:) … or maybe just because it’s nice to see how far one can get with “just using textures to assemble a high-quality image in reasonable time”.
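just to make that re-render idea concrete anyway, a purely illustrative sketch (names are made up, and it only sees the coverage of the triangle currently being rasterised, not of the whole texel) :

#version 400
// sketch of the "render the triangles again" idea: the shader looks at which
// samples the current triangle covers (gl_SampleMaskIn), fetches the already
// rendered msaa color at gl_FragCoord and averages only those samples.
// note: this is only the coverage of *this* triangle, not of the whole texel.
out vec4 frag0;
uniform sampler2DMS aa_color; // msaa color buffer from the first pass
uniform int samples = 4;
void main()
{
    ivec2 coord   = ivec2(gl_FragCoord.xy);
    int   covered = bitCount(gl_SampleMaskIn[0]); // samples covered by this triangle
    vec4  sum     = vec4(0.0);
    for(int i = 0; i < samples; i++)
        if((gl_SampleMaskIn[0] & (1 << i)) != 0)
            sum += texelFetch(aa_color, coord, i);
    frag0 = sum / float(max(covered, 1));
}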
personally, i’m coming from offline rendering, like Cinema 4D and such… so i like things to be visible … in textures/buffers … and layers. step by step, not too many different things at once. coding realtime graphics is just a hobby. here’s the idea, generated in my favourite offline renderer :
a simple plane with no AA (scaled 200%) :
same thing with 4x AA :
and the samples per fragment visualised (blue = 1 sample, green = 2, red = 4) :
is it possible to create something like the third image with basic OpenGL (GL4.5 :P) using msaa textures and the sample mask? (a rough sketch of what i mean follows after these questions)
is it possible to store the sample mask in a non-msaa texture at all and use it as a lookup later?
i know it is not allowed to mix multisample and regular render targets within one FBO - all attachments have to use the same sample configuration, otherwise the framebuffer is incomplete.
why doesn’t OpenGL just store the mask with the texture itself and make my life easy? :emo:
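and regarding the third image: with a count texture like the pre-pass sketch further up (my assumption - it counts “samples that differ from sample 0”, not the real hardware coverage mask, but for a visualisation that is usually close enough), the picture becomes a simple lookup :

#version 150
// maps the per-texel sample count (assumed to live in an R32F texture written
// by the pre-pass sketch above) to colours roughly like the offline render:
// blue = 1 sample, green = 2, red = 3 or more.
out vec4 frag0;
in vec2 st0;
uniform sampler2D count_tex; // non-msaa, one float count per texel
void main()
{
    int distinct = int(texture(count_tex, st0).r + 0.5);
    frag0 = (distinct <= 1) ? vec4(0.0, 0.0, 1.0, 1.0)
          : (distinct == 2) ? vec4(0.0, 1.0, 0.0, 1.0)
          :                   vec4(1.0, 0.0, 0.0, 1.0);
}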
i know using the “usual” blitting is fast :
// resolve the msaa source into a single-sample target (or the default framebuffer) in one go
GL30.glBindFramebuffer(GL30.GL_READ_FRAMEBUFFER, source_id); // msaa source fbo
GL30.glBindFramebuffer(GL30.GL_DRAW_FRAMEBUFFER, target_id); // non-msaa target fbo
GL30.glBlitFramebuffer(0, 0, width, height, 0, 0, width, height, GL11.GL_COLOR_BUFFER_BIT, GL11.GL_NEAREST);
but it does not allow per-sample operations. (http://mynameismjp.wordpress.com/2012/10/24/msaa-overview/ - the “working with HDR and tone mapping” section shows nicely the bad effect of doing the postprocessing after the resolve)
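which is exactly where a custom resolve pays off: tone-map every sample first, then average. a small sketch, with a simple Reinhard operator just as a stand-in for whatever is actually used :

#version 150
// custom resolve that tone-maps every sample *before* averaging - the thing a
// plain glBlitFramebuffer resolve cannot do. Reinhard is only a placeholder.
out vec4 frag0;
in vec2 st0;
uniform sampler2DMS aa_hdr; // msaa hdr color buffer
uniform int samples = 4;
vec3 tonemap(in vec3 c) { return c / (c + vec3(1.0)); } // simple Reinhard
void main()
{
    ivec2 coord = ivec2(vec2(textureSize(aa_hdr)) * st0);
    vec3 sum = vec3(0.0);
    for(int i = 0; i < samples; i++)
        sum += tonemap(texelFetch(aa_hdr, coord, i).rgb);
    frag0 = vec4(sum / float(samples), 1.0);
}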
is MSAA the wrong approach? is it worth going into MLAA/FXAA? to me the pixel quality of msaa is still the best, isn’t it?
should i just rework my rendering path?
what am i missing? :clue:
i was so happy when they gave me msaa render targets - now i hate them.
i’m quite sure i’m not the first person to run into this issue. mind sharing your experience with it? (sorry for my bad english, i’m not a native speaker, scheiße)