TL;DR: http://screenshotcomparison.com/comparison/95587/picture:0
I’ve implemented a new anti-aliasing technique in We Shall Wake. It’s based on SRAA (Subpixel Reconstruction Anti-Aliasing), but I’ve extended it with a temporal component. As far as I know, no one has done this before, and the result was surprisingly good. For the people who don’t care about the implementation, there are pretty pictures at the end. =3
Some relevant background
FXAA (Fast approXimate Anti-Aliasing) is a really cheap anti-aliasing filter that has become quite popular lately. For its super-low performance cost and easy integration into existing game engines, it does a decent job, but it has some very heavy limitations. FXAA works by analyzing the colors on the screen to find sharp jagged edges and then blends pixels together to smooth them out. Its implementation is actually extremely clever and requires very few instructions and texture samples to do its job, hence it’s really fast. Although it can eliminate some of the most glaring artifacts of rasterized graphics (staircase jaggies), it is still limited by the information in the colors of the frame. A triangle that is too thin to be rasterized will not be rendered in the first place, and FXAA cannot do anything to reconstruct that information.
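To make that limitation concrete, here is a minimal Python sketch of a contrast-based edge test in the spirit of FXAA. It is not FXAA itself: the `luma` weights, the 4-neighbour window and the `threshold` value are assumptions picked for illustration. The point is only that a filter like this sees colors, so a triangle that never produced a pixel leaves no contrast to detect.

```python
def luma(rgb):
    """Approximate perceived brightness of an (r, g, b) color in [0, 1]."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def is_edge(image, x, y, threshold=0.125):
    """Flag a pixel when the luma range over it and its 4 neighbours exceeds threshold.

    image is a list of rows, each row a list of (r, g, b) tuples.
    """
    height, width = len(image), len(image[0])
    lumas = [luma(image[y][x])]
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            lumas.append(luma(image[ny][nx]))
    return max(lumas) - min(lumas) > threshold


WHITE, BLACK = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
# A white column on the left, black everywhere else.
image = [[WHITE, BLACK, BLACK] for _ in range(3)]
print(is_edge(image, 1, 1))  # pixel right next to the white column
print(is_edge(image, 2, 1))  # pixel with only black around it
```

The pixel next to the white column is flagged and the one surrounded by black is not. If a thin triangle falls between the sample points, the image looks exactly like the second case, and there is nothing for the filter to find.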
Temporal supersampling (TSSAA) is a technique which exploits the temporal coherence of the screen. By assuming that each rasterized frame won’t change very much from the previous frame, it makes sense to combine the previous and current frame to achieve a better rasterized approximation of the geometry at hand. By rendering every other frame with a sub-pixel offset, we can often double the coverage information we have to work with. In the best case, the quality equals 2x spatial supersampling. Obviously, this technique has some glaring drawbacks. The scene is not static between frames, so motion between frames introduces ghosting. There are techniques that can reduce this, but none that can eliminate it in all cases. The technique is therefore quite unpopular, despite its potentially massive gains for almost no performance cost at all.
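As a rough illustration of the idea, here is a 1D Python sketch. Everything in it is an assumption made for the example: one sample per pixel, jitter offsets of 0.25 and 0.75, a static scene, and the names `rasterize` and `temporal_resolve`.

```python
def rasterize(edge, width, offset):
    """Take one sample per pixel at x = i + offset.

    The geometry covers everything left of `edge`, so a sample is covered
    (1.0) when it lies left of the edge and uncovered (0.0) otherwise.
    """
    return [1.0 if (i + offset) < edge else 0.0 for i in range(width)]


def temporal_resolve(previous, current):
    """Average two frames that were rendered with different sub-pixel offsets."""
    return [(p + c) / 2 for p, c in zip(previous, current)]


edge = 2.6                            # the edge cuts through pixel 2
frame_a = rasterize(edge, 5, 0.25)    # even frame
frame_b = rasterize(edge, 5, 0.75)    # odd frame, shifted by half a pixel
print(frame_a)
print(frame_b)
print(temporal_resolve(frame_a, frame_b))
```

Each frame alone only knows "covered" or "not covered" for pixel 2, but the average of the two lands at 0.5, which is what 2x supersampling of a static scene would give. If the edge moved between the two frames, the same average would blend two different positions, which is where the ghosting comes from.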
MultiSample Anti-Aliasing (MSAA) is the de facto standard of anti-aliasing. MSAA involves computing multiple coverage samples for each pixel in hardware, but only running the fragment shader once per pixel for each triangle that covers it. These samples can then be written to memory very efficiently, using compression to save bandwidth. The great thing about MSAA is that it actually provides a large amount of extra coverage information. 4x MSAA provides four times as much information about the triangles we rasterized, so we can approximate their shape much better. Sadly, MSAA has a number of large drawbacks. It requires a large amount of extra video memory and often carries a large performance hit. It’s not easy to use with deferred lighting, where the performance hit is even worse, and for indie developers it’s also extremely difficult and time-consuming to implement, often requiring a completely different rendering pipeline to work.
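The split between coverage and shading can be sketched in a few lines of Python. This is a simplified model, not how the hardware does it: the sample positions in `SAMPLE_OFFSETS` are made up (real patterns differ per GPU), and the function resolves a single pixel directly instead of storing per-sample colors.

```python
# Hypothetical 4x sample positions inside a pixel; real hardware patterns differ.
SAMPLE_OFFSETS = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def coverage_mask(px, py, inside):
    """Test every sample position of pixel (px, py) against the triangle."""
    return [inside(px + ox, py + oy) for ox, oy in SAMPLE_OFFSETS]


def msaa_pixel(px, py, inside, shade, background):
    """Resolve one pixel: many coverage tests, but only one call to shade()."""
    mask = coverage_mask(px, py, inside)
    if not any(mask):
        return background
    color = shade(px + 0.5, py + 0.5)   # the fragment shader runs once
    covered = sum(mask)
    total = len(mask)
    return tuple(
        (covered * c + (total - covered) * b) / total
        for c, b in zip(color, background)
    )


# A "triangle" that covers everything left of x = 0.5, shaded solid red.
half_plane = lambda x, y: x < 0.5
red = lambda x, y: (1.0, 0.0, 0.0)
print(msaa_pixel(0, 0, half_plane, red, (0.0, 0.0, 0.0)))
```

Two of the four sample positions fall inside the half-plane, so the pixel ends up half red even though the shading function only ran once.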
SRAA (Subpixel Reconstruction Anti-Aliasing) builds on the idea of FXAA and attempts to improve on it. SRAA still relies on blending pixels together to reduce aliasing, but the way it decides how to blend is completely different. SRAA involves rendering the scene twice. First the scene is rendered and shaded as usual (just like for FXAA). In the second pass, the scene is rendered again to an MSAA render target to produce extra coverage information. In the final resolve pass, for each coverage sample we look through the available color samples and pick the best possible candidate, effectively upsampling our non-MSAA color buffer to an MSAA color buffer. Although it is still limited to the same color information as FXAA, the way it detects edges is identical to MSAA. This means that it can’t handle sub-pixel geometry in the scene, but it CAN handle sub-pixel motion. This is a massive improvement over FXAA for scenes with any movement, but one of the most glaring drawbacks of FXAA, the missing sub-pixel detail, is still there.
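The resolve step can be sketched in 1D Python like this. It is only a guess at the general shape, under loud assumptions: each coverage sample and each color sample carries a surface ID, "best candidate" simply means "first color sample with the same ID among this pixel and its two neighbours", and `sraa_resolve` is a made-up name. A real resolve pass would run in a shader and could use other matching criteria.

```python
def sraa_resolve(colors, color_ids, coverage_ids):
    """Upsample one color per pixel to one color per coverage sample, then average.

    colors[i]       -- shaded color (a float here) of pixel i
    color_ids[i]    -- ID of the surface that the shaded sample of pixel i hit
    coverage_ids[i] -- list of surface IDs, one per coverage sample of pixel i
    """
    width = len(colors)
    resolved = []
    for i in range(width):
        # Candidate color samples: own pixel first, then left and right neighbour.
        candidates = [j for j in (i, i - 1, i + 1) if 0 <= j < width]
        total = 0.0
        for surface_id in coverage_ids[i]:
            # Fall back to the pixel's own color if no candidate matches.
            match = next((j for j in candidates if color_ids[j] == surface_id), i)
            total += colors[match]
        resolved.append(total / len(coverage_ids[i]))
    return resolved


colors = [1.0, 1.0, 0.0]            # surface 7 is white, surface 9 is black
color_ids = [7, 7, 9]
coverage_ids = [[7, 7, 7, 7], [7, 7, 7, 9], [9, 9, 9, 9]]
print(sraa_resolve(colors, color_ids, coverage_ids))
```

In the middle pixel, three coverage samples see surface 7 and one sees surface 9. The one that sees surface 9 borrows its color from the right-hand neighbour, so the pixel resolves to 0.75 instead of a hard 1.0.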
TSRAA
Temporal SRAA extends SRAA with a temporal component. The problem with SRAA is that it is limited by the amount of color information available in the non-MSAA color buffer. Temporal supersampling can double the amount of information we have access to, essentially for free. The advantage of combining SRAA with temporal supersampling is that with SRAA we already have a way of identifying which color sample to fetch for a given coverage sample, so we can avoid ghosting by only sampling the previous frame if we are sure that we’re sampling the exact same triangle as the one we have in the current frame. There are still cases where minor ghosting can occur, but together with the standard ghosting reduction techniques, this can be reduced to unnoticeable levels.
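Here is a 1D Python sketch of that rule. It is not the actual shader: the per-sample surface IDs, the three-pixel search window, the "current frame first, previous frame second" order and the name `tsraa_resolve` are all assumptions for the sake of the example.

```python
def tsraa_resolve(cur_colors, cur_ids, prev_colors, prev_ids, coverage_ids):
    """Resolve coverage samples using color samples from two frames.

    A color sample from either frame is only used when its surface ID matches
    the ID of the coverage sample, so stale colors from the previous frame
    that belong to other surfaces are never blended in.
    """
    width = len(cur_colors)
    resolved = []
    for i in range(width):
        neighbours = [j for j in (i, i - 1, i + 1) if 0 <= j < width]
        total = 0.0
        for surface_id in coverage_ids[i]:
            color = cur_colors[i]  # fallback when nothing matches
            # Prefer the current frame, then try the previous frame.
            for colors, ids in ((cur_colors, cur_ids), (prev_colors, prev_ids)):
                match = next((j for j in neighbours if ids[j] == surface_id), None)
                if match is not None:
                    color = colors[match]
                    break
            total += color
        resolved.append(total / len(coverage_ids[i]))
    return resolved


# Surface 5 is a thin black triangle in front of a white surface 7. The current
# frame's color samples all missed it, but the jittered previous frame hit it.
cur_colors, cur_ids = [1.0, 1.0, 1.0], [7, 7, 7]
prev_colors, prev_ids = [1.0, 0.0, 1.0], [7, 5, 7]
coverage_ids = [[7, 7, 7, 7], [7, 7, 5, 7], [7, 7, 7, 7]]
print(tsraa_resolve(cur_colors, cur_ids, prev_colors, prev_ids, coverage_ids))
```

The coverage sample that sees surface 5 finds no match in the current frame, picks up the black color from the previous frame, and the middle pixel resolves to 0.75. If the previous frame instead contained some surface that is no longer visible, none of its IDs would match and its colors would simply be ignored.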
Screenshots
To ease comparison of the “subtle” effects of anti-aliasing, I have uploaded them to ScreenshotComparison. These scenes have been rendered at 1/4 resolution to better show the effect of the anti-aliasing. Note that this makes TSRAA work less well, since all triangles then cover far fewer pixels, which makes it more difficult to reconstruct the coverage of the scene.
NOTE: There are 3 different screenshot comparisons in there! http://screenshotcomparison.com/comparison/95587/picture:0
Performance
Performance of this technique is excellent, as it does not require additional shading compared to no anti-aliasing. The only additional work required is the second pass (which of course can be prohibitive if the cost of processing each vertex twice is high) and the resolving pass, which I’ve managed to optimize quite a bit compared to the reference implementation.
[tr][td]Technique[/td][td]Frame rate[/td][td]Frame time[/td][/tr]
[tr][td]No anti-aliasing[/td][td]148 FPS[/td][td]6.76 ms[/td][/tr]
[tr][td]FXAA[/td][td]139 FPS[/td][td]7.19 ms[/td][/tr]
[tr][td]4x SRAA[/td][td]124 FPS[/td][td]8.06 ms[/td][/tr]
[tr][td]4x TSRAA[/td][td]120 FPS[/td][td]8.33 ms[/td][/tr]
[tr][td]8x TSRAA[/td][td]105 FPS[/td][td]9.52 ms[/td][/tr]
We get a cost of around 1.57 ms for 4x TSRAA at 1920x1080. A very unprofessional comparison reveals that Battlefield 4, which implements deferred MSAA, runs at 73 FPS (13.70 ms) without MSAA and 59 FPS (16.95 ms) with 4x MSAA enabled, meaning that 4x MSAA has a cost of 3.25 ms in that game, more than twice what my technique uses.
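The arithmetic behind those numbers is easy to redo. A small Python sketch (the function names are mine) that turns the FPS figures above into frame times and costs:

```python
def frame_time_ms(fps):
    """Convert a frame rate into the time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def added_cost_ms(fps_without, fps_with):
    """Extra frame time caused by enabling a technique."""
    return frame_time_ms(fps_with) - frame_time_ms(fps_without)


tsraa_4x = added_cost_ms(148, 120)   # no anti-aliasing vs. 4x TSRAA
bf4_msaa_4x = added_cost_ms(73, 59)  # Battlefield 4 without vs. with 4x MSAA

print(f"4x TSRAA: {tsraa_4x:.2f} ms")
print(f"BF4 4x MSAA: {bf4_msaa_4x:.2f} ms")
print(f"ratio: {bf4_msaa_4x / tsraa_4x:.2f}")
```

Working from the FPS figures directly gives about 1.58 ms for 4x TSRAA (the 1.57 ms figure comes from subtracting the already rounded frame times) and about 3.25 ms for 4x MSAA in Battlefield 4. Comparing frame times rather than FPS matters here, because a fixed cost in milliseconds removes a different number of FPS depending on how fast the game was running to begin with.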
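Memory usage is also very low since the G-buffer does not require MSAA. Here’s a comparison table of how much memory different techniques use, or would use if I implemented them. Sizes are given as bytes per pixel multiplied by the resolution (the number of pixels on screen).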
[tr][td]Technique[/td][td]G-buffer memory usage[/td][td]Additional memory[/td][td]Total at 1920x1080[/td][/tr]
[tr][td]No anti-aliasing/FXAA[/td][td]26 * resolution[/td][td]None[/td][td]51.4 MB[/td][/tr]
[tr][td]4x MSAA[/td][td]104 * resolution[/td][td]4 * resolution (for resolving)[/td][td]213.6 MB[/td][/tr]
[tr][td]8x MSAA[/td][td]208 * resolution[/td][td]4 * resolution (for resolving)[/td][td]419.2 MB[/td][/tr]
[tr][td]4x SRAA[/td][td]26 * resolution[/td][td]24 * resolution[/td][td]98.9 MB[/td][/tr]
[tr][td]4x TSRAA[/td][td]26 * resolution[/td][td]32 * resolution[/td][td]114.7 MB[/td][/tr]
[tr][td]8x TSRAA[/td][td]26 * resolution[/td][td]56 * resolution[/td][td]162.2 MB[/td][/tr]
4x TSRAA uses around half the memory of 4x MSAA. The difference is even larger between 8x TSRAA and 8x MSAA.
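The totals in the table can be reproduced with a few lines of Python. The per-pixel byte counts are taken straight from the table; the dictionary layout and the function name are just for this example, and "MB" is taken to mean 2^20 bytes, which is what matches the listed totals.

```python
def total_megabytes(gbuffer_bytes, extra_bytes, width=1920, height=1080):
    """Total memory in MB (2**20 bytes) for the given bytes per pixel."""
    return (gbuffer_bytes + extra_bytes) * width * height / 2**20


# (G-buffer bytes per pixel, additional bytes per pixel), as listed in the table.
TECHNIQUES = {
    "No anti-aliasing/FXAA": (26, 0),
    "4x MSAA": (104, 4),
    "8x MSAA": (208, 4),
    "4x SRAA": (26, 24),
    "4x TSRAA": (26, 32),
    "8x TSRAA": (26, 56),
}

for name, (gbuffer, extra) in TECHNIQUES.items():
    print(f"{name}: {total_megabytes(gbuffer, extra):.1f} MB")

ratio_4x = total_megabytes(26, 32) / total_megabytes(104, 4)
ratio_8x = total_megabytes(26, 56) / total_megabytes(208, 4)
print(f"4x TSRAA / 4x MSAA: {ratio_4x:.2f}")
print(f"8x TSRAA / 8x MSAA: {ratio_8x:.2f}")
```

The two ratios come out at roughly 0.54 and 0.39, which is where "around half" and "even larger" come from.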
Quality analysis
The important part here is how the techniques handle sub-pixel details and sub-pixel motion. SRAA itself cannot handle sub-pixel details, but temporal supersampling can. As long as triangles are thicker than 0.5 pixels, TSRAA will accurately reconstruct them. As long as this criterion is met, TSRAA can then represent coverage at the given sample rate. When it comes to sub-pixel motion of triangle edges, 4x TSRAA has a precision equal to 4x MSAA.
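The 0.5 pixel limit can be illustrated with a 1D Python sketch. The jitter offsets of 0.25 and 0.75 are an assumption for the example (any two offsets half a pixel apart behave the same way), and `hit_by_any_sample` is a made-up helper that only checks whether some color sample lands inside a feature.

```python
import math


def hit_by_any_sample(start, thickness, offsets=(0.25, 0.75)):
    """True if the span [start, start + thickness) contains a sample.

    Samples sit at i + offset for every integer pixel index i and every
    offset in `offsets` (one offset per jittered frame).
    """
    end = start + thickness
    for offset in offsets:
        # First sample of this frame located at or after `start`.
        first = math.ceil(start - offset) + offset
        if first < end:
            return True
    return False


positions = [k / 100 for k in range(100)]
print(all(hit_by_any_sample(p, 0.51) for p in positions))                  # two frames
print(all(hit_by_any_sample(p, 0.51, offsets=(0.5,)) for p in positions))  # one frame
print(hit_by_any_sample(0.3, 0.4))                                         # too thin
```

With two jittered frames the combined samples are 0.5 pixels apart, so a feature slightly thicker than 0.5 pixels is hit wherever it sits. With a single frame the same feature can slip between samples, and a feature 0.4 pixels thick can be missed even with both frames.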
Further work
- More anti-ghosting measures.
- Tone mapping of each coverage sample before resolving instead of afterwards.