What I did today

theagentd · May 28, 2015, 7:50pm

Took another shot at anti-aliasing improvements today.

When using MSAA with deferred shading, it’s pretty much mandatory to use some kind of system that only computes lighting and complex shaders on pixels that require it. At 8x MSAA, we’re literally talking 8x the amount of lighting work, which would grind anything to a halt. Traditionally, this has been done by analyzing the depth and normals of the 8 samples of each pixel and determining if the pixel can be shaded once or if all 8 samples need to be shaded for correctness. This test produces a stencil mask, and then lighting is done twice with two different shaders. The first pass only processes pixels that can be shaded once and the second one only processes pixels that need all 8 samples shaded. Branching in the shader to select per-pixel or per-sample shading generally leads to extremely bad performance and scheduling (one per-sample shaded pixel forces all neighboring pixels to run at per-sample resolution as well), while the stencil test allows for much, much better scheduling by the GPU. Newer compute shader based techniques like tile-based deferred shading work on a similar principle, postponing all pixels that need per-sample shading to a second “pass” in the same compute shader.
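The depth/normal test described above could look roughly like the following on the CPU. This is only a sketch of the idea: the function names and both thresholds are made up for illustration, and a real implementation would run this in a shader and write the result to the stencil buffer.

```python
def needs_per_sample_shading(depths, normals, depth_eps=0.01, min_normal_dot=0.99):
    """Return True if one pixel's MSAA samples differ enough to need per-sample shading.

    depths  -- one linear depth value per sample
    normals -- one unit-length (x, y, z) normal per sample
    Both thresholds are illustrative assumptions, not values from the post.
    """
    ref_depth, ref_normal = depths[0], normals[0]
    for depth, normal in zip(depths[1:], normals[1:]):
        # A depth discontinuity means the samples hit different surfaces.
        if abs(depth - ref_depth) > depth_eps:
            return True
        # A low dot product means the surface orientation changes within the pixel.
        dot = sum(a * b for a, b in zip(normal, ref_normal))
        if dot < min_normal_dot:
            return True
    return False


def build_stencil_mask(pixels):
    """Map each pixel's (depths, normals) pair to 1 (per-sample) or 0 (per-pixel)."""
    return [1 if needs_per_sample_shading(d, n) else 0 for d, n in pixels]
```

The two lighting passes would then each be restricted to one of the two mask values.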

My temporal SRAA implementation does not require per-sample shading; the scene is rendered as usual and the current and previous frames are used as input to the resolve shader. The resolve shader, however, is quite expensive. TSRAA works by matching an MSAA-resolution ID buffer to the shaded color samples of the previous and current frame to reconstruct an MSAA-resolution shaded image. 8x TSRAA essentially gives us 2x temporally supersampled shading with 0 risk of ghosting and 8x MSAA edge quality, while still only requiring a per-pixel shaded scene. The thing is that the upsampling process involves checking the neighboring shaded pixels as well in an attempt to find color data for each MSAA ID sample. This means checking the center pixel and the 4 closest pixels for both the current and previous frame, i.e. 8 ID samples are matched against 10 color samples. In addition, 10 transparency samples need to be taken and overlaid on top of each color sample, and finally there’s a motion vector being sampled for temporal reprojection to the previous frame. All in all, the shader does a total of 29 texture samples (8 + 10 + 10 + 1) and a large amount of ALU operations. At 2560x1440 and 8x TSRAA, this resolve pass took ~4.2ms, over a quarter of the entire frame budget.
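To make the ID matching more concrete, here is a rough CPU-side sketch of resolving a single pixel. It is not the actual shader: the function names are hypothetical, and the averaging of all matching candidates and the fallback to the current center color when nothing matches are my assumptions. Transparency and reprojection are left out.

```python
def average(colors):
    """Component-wise mean of a non-empty list of (r, g, b) tuples."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))


def match(sample_id, candidates):
    """Average the colors of all candidates whose ID equals sample_id, or None."""
    hits = [color for cand_id, color in candidates if cand_id == sample_id]
    return average(hits) if hits else None


def resolve_pixel(sample_ids, center, neighbors):
    """Reconstruct one pixel from its MSAA ID samples.

    sample_ids -- the pixel's ID samples (8 at 8x)
    center     -- [(id, color), ...] for the current and previous center pixel
    neighbors  -- [(id, color), ...] for the 4 closest pixels of both frames
    """
    fallback = center[0][1]  # assumption: use the current center color if no ID matches
    resolved = []
    for sample_id in sample_ids:
        color = match(sample_id, center)  # center pixels take priority
        if color is None:
            color = match(sample_id, neighbors)
        if color is None:
            color = fallback
        resolved.append(color)
    return average(resolved)
```

For a pixel where 6 ID samples belong to a red object found in the center pixels and 2 belong to a blue object found only in a neighbor, this yields a 75/25 mix of the two colors.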

Since the resolve pass is so expensive, I looked for ways of reducing its cost. I observed that probably 95% of all pixels are trivial to resolve. If all 8 ID samples are identical, we are pretty much guaranteed to have a color sample for all IDs in the center pixel of the current frame, and possibly in the previous frame as well, although the latter is not guaranteed. The center samples are prioritized, so if a center pixel match is found the neighboring pixels aren’t used. I decided to try to generate a stencil mask to find the pixels where all samples had identical IDs and process them with an optimized, much faster shader. The optimized shader only samples the current and previous center pixels, the transparency data of those two pixels and the motion vector for reprojection, i.e. only 5 texture samples instead of 29. Ghosting is prevented by checking if the previous frame has the same ID as the current frame. A few minutes of playing with Shader Analyzer gives the following stats:

Full resolve shader: 705 instructions, 29 texture samples, 37 registers, 1.19 pixels/clock, slightly ALU limited
Fast resolve shader: 47 instructions, 5 texture samples, 5 registers, 9.6 pixels/clock, heavily texture limited
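The fast path boils down to two small checks, sketched here in Python rather than shader code. The function names are hypothetical and the equal-weight blend of the current and previous frame is an assumption; the ID comparison is the ghosting check described above.

```python
def all_ids_identical(sample_ids):
    """Stencil mark test: True if the pixel can take the fast resolve path."""
    return all(s == sample_ids[0] for s in sample_ids)


def fast_resolve(current, previous):
    """Resolve a trivial pixel from the two center samples only.

    current, previous -- (id, (r, g, b)) of the center pixel in the current
    frame and in the reprojected previous frame.
    """
    cur_id, cur_color = current
    prev_id, prev_color = previous
    if prev_id != cur_id:
        # The history belongs to a different object: drop it to avoid ghosting.
        return cur_color
    # Assumption: equal-weight blend of the two frames.
    return tuple((c + p) / 2 for c, p in zip(cur_color, prev_color))
```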

Red pixels are pixels that had the expensive resolve shader run. Imgur murdered the quality though… q.q

Now, for this scheme to be faster, the combined cost of running the stencil mark pass, the fast resolve pass and the full resolve pass needs to be lower than the cost of simply running the full resolve shader on the whole screen. Performance results woooh!

Before:

            engine.post.ResolveProcessor2 : 5.031ms
                Generate DMV buffer : 0.11ms
                Pack depth and motion vector lengths : 0.228ms
                Pack transparency : 0.69ms
                Full resolve pass : 3.998ms

After:

            engine.post.ResolveProcessor2 : 3.73ms
                Generate DMV buffer : 0.094ms
                Pack depth and motion vector lengths : 0.224ms
                Pack transparency : 0.695ms
                Stencil mark : 0.544ms
                Fast resolve pass : 0.689ms
                Full resolve pass : 1.477ms

The time of the resolve pass went from ~4.0 ms to a combined cost of ~2.7 ms. The performance gain depends on the scene of course, but this was a fairly “noisy” scene with lots of pixels that required the full resolve pass. In many scenes, the sky will take up a large part of the screen, and those pixels become extremely cheap. More often than not, the full resolve pass drops to under 1ms. I have yet to find a scene that actually became more expensive due to the overhead of the stencil marking etc, so I will go with it permanently.
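For reference, the ~2.7 ms figure is just the sum of the three new passes from the profiler output above. A quick check using only those numbers:

```python
# Timings in milliseconds, copied from the profiler output above.
full_only = 3.998
split = {
    "Stencil mark": 0.544,
    "Fast resolve pass": 0.689,
    "Full resolve pass": 1.477,
}

combined = sum(split.values())
saved = full_only - combined
print(f"combined: {combined:.3f} ms, saved: {saved:.3f} ms ({saved / full_only:.0%})")
```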

Zeldar · May 28, 2015, 8:08pm

Finished a spell in my game, it took like 10 mins of coding and 5 hours of art work including sound but it was worth it ;D
uvQJRJ8f52Q

ags1 · May 28, 2015, 8:45pm

I realized that I love messing about with graphics code, but the progress is very slow and it stops me getting anywhere. So I have gone into antigraphics mode, and am using good old Java 2D now. I am trying to be as ungraphical as possible for the next 6 months.

Opiop · May 28, 2015, 9:08pm

At work this morning I took an hour and taught myself the basics of ASP.NET MVC with C# and Bootstrap, and then I spent the remainder of my day throwing up the UI for my web application. C# is such a fun language to work with, I actually like it as much as, or maybe even more than, Java. And ASP.NET MVC is an amazing tool to quickly design and implement a front end application. Unfortunately, my boss is requiring me to code the backend in Java still, which complicates things. IMO, it would be far easier just to do the entire backend in C# and not mix and match languages. But whatever boss wants, he gets, so long as I get my paycheck and comfy chair.

thedanisaur · May 29, 2015, 3:53pm

I got a job a couple weeks ago and decided to go back to school so progress has been slow, but this morning I “finished” animation blending. It started with blending the current frame with the next frame in the animation so when it plays at a slower speed I still get a smooth animation and when faster it doesn’t skip frames. Then I used the same process to blend two animations together. In the future I will be able to blend multiple animations together without any problems, but for now I think two is fine.

I’m now running into an issue that when the animation loops it slows down for a few frames and then speeds back up. Not sure if it’s because I’m bad at animating or because I have a problem in my code.
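A generic version of this kind of frame blending, for anyone following along. This is a minimal sketch and not the code from the post: poses are plain lists of floats, `time` is assumed to be non-negative, all names are made up, and rotations would normally need something better than a straight lerp.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def sample_animation(frames, time, frame_duration, loop=True):
    """Blend the current keyframe with the next one at an arbitrary time."""
    position = time / frame_duration
    index = int(position)   # current keyframe (time assumed >= 0)
    t = position - index    # how far we are into the next keyframe
    if loop:
        i0 = index % len(frames)
        i1 = (index + 1) % len(frames)  # the last frame blends back into the first
    else:
        i0 = min(index, len(frames) - 1)
        i1 = min(index + 1, len(frames) - 1)
    return [lerp(a, b, t) for a, b in zip(frames[i0], frames[i1])]


def blend_animations(pose_a, pose_b, weight):
    """Blend two sampled poses; weight 0 gives pose_a, weight 1 gives pose_b."""
    return [lerp(a, b, weight) for a, b in zip(pose_a, pose_b)]
```

One property of a wrap like this: if the last keyframe is a copy of the first, the segment between them interpolates between two identical poses, so the motion pauses for one frame interval at the loop point.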

Herjan · May 29, 2015, 9:16pm

Played around with Perlin Noise for the first time to generate ‘sky’. (Yes the landscape is a bunch of Math.randoms)
The following images may look better when you click on them.

Two types of space:

http://i.imgur.com/Ap57Tqb.png

No doubts, this one looks craz… very abstract

http://i.imgur.com/mFPN0u1.png

Yes this one has sharp contrast with the landscape

Earthplay:

http://i.imgur.com/RwzUweB.png

http://i.imgur.com/eKGeyXI.png

nsigma · May 30, 2015, 3:35pm

Well, today a boring and hot journey back from London …

Yesterday OTOH, got to talk about Praxis LIVE (website / forum thread) at NetBeans Day UK. ;D

Pleased that people were impressed by things like live code editing / injection of Java & GLSL code.
Bemused that people were impressed by 60fps OpenGL - it’s 2015! :emo: - do we all need to get out and shout about what we do more?
Grateful how much this community has helped me become someone who can talk to a bunch of top-notch Java developers! 8)

teletubo · May 31, 2015, 4:33am

I probably missed a part of the conversation, but what exactly is the problem with those ads? They look legit. The only not legit thing about them maybe is the girl’s butt. And maybe some people here do want to get a girlfriend … so it’s a well targeted ad.

Gibbo3771 · May 31, 2015, 7:08am

Mostly to do with this being a family friendly website. Face it, click that ad and you’ll see it’s just a website for swingers. You don’t see these ads on other forums.

Riven · May 31, 2015, 9:25am

I only allow ‘safe for children’ ads on JGO, and these were a bit inappropriate for that category.

Anyway, the AdSense ads have been pulled, so all is fine for now. There are ‘misrated’ ads on ProjectWonderful too, but these I can report/pull individually - so keep me informed.

SHC · May 31, 2015, 9:39am

Tried rendering the floors of Hell.

400 blocks rendering at ~640 FPS without any sort of culling. Should I worry about performance now?

ra4king · May 31, 2015, 10:32am

Only 400? Try a few hundred thousand ;D

SHC · May 31, 2015, 10:38am

Tried that. 8000 cubes are rendering at ~100 FPS.

death_angel · May 31, 2015, 10:45am

So this is what my school project looks like:

https://pbs.twimg.com/media/CGReA7RWQAALXkW.png

https://pbs.twimg.com/media/CGRd_vCXEAAztqu.png

Today it’s time to start porting it to Android.

chrislo27 · May 31, 2015, 5:05pm

I skimmed through Microsoft’s documentation on C# for Java developers. It didn’t look too bad, some good things were that it had signed and unsigned numbers (as primitives rather than using a method to do addition for “unsigned” numeric values), operator overloading, indexers, blah blah blah… Some things that will take way too long to get used to are the stupid UpperCamelCaseEverythingIncludingMethodsAndVariables.

That might nearly deter me from actually venturing into C#. Oh well…

BurntPizza · May 31, 2015, 5:12pm

I’d say a language is doing pretty well if the biggest problem you have with it is syntax convention.

SauronWatchesYou · May 31, 2015, 5:21pm

This will come in handy ;D

Opiop · May 31, 2015, 5:51pm

On Friday at work I managed to connect the front end and the backend of my web application together, and I switched from using a SOAP based web service to REST with Spring Boot. Saved my life. I had my web service back up and running in under half an hour, whereas the SOAP version took me hours to even get a functioning version. Also switched over to a Gradle build system which makes dependencies and building a WAR file for my Tomcat server ridiculously easy. The great thing about Spring Boot is I have the option to create a runnable jar with a Tomcat server embedded in it, so I can just run the jar from the command line and boom, web service up and running OR I can build a WAR file and throw it into my external Tomcat instance. Having the runnable jar makes debugging very easy though, because I don’t have to remember to restart my Tomcat server every time I need to test my new build.

I have practically finished the layout of the front end, although I want to simplify the interface a little more. Right before I left work on Friday I also managed to connect to our SQL database and display a record on my website! Really cool stuff, I was happy to finally be able to connect to the database. I’m excited to go into work on Monday!

MrPizzaCake · May 31, 2015, 6:53pm

Welp, after a few months’ break, I’m back in the programming… thing. Woohoo!

theagentd · June 1, 2015, 2:07am

Did even more work on my anti-aliasing resolve system.

I started out with a simple brute-force algorithm. Doing everything in a single shader pass led to a pretty slow shader, as it ended up being over 1000 instructions long with over 50 texture samples read from 10 different textures. When profiling, the whole thing simply looked like this:

[quote] Resolve pass : 3.403ms
[/quote]
3.4 ms, and that’s only at 1920x1080. Quite expensive. To improve the quality of transparent geometry being overlaid on edges, I precomputed a transparency buffer for both the current and previous frame to be overlaid on top of them. At the same time, I could move out motion blur, the second biggest part of the shader, to the precomputation pass. Not only did this improve quality a lot, it also improved performance as the shader got split up into two parts, and the precomputation pass also saved some work in the resolve shader.

[quote] Pack transparency : 0.417ms
Full resolve pass : 2.406ms
[/quote]
That shaved off 0.6 ms and we end up at around 2.8 ms. Still a bit high. As I wrote in my last post, I realized that most pixels were trivial to resolve, and by computing a stencil mask I could improve my code to only compute the expensive resolving for the pixels that needed it. The rest used a simpler, much faster shader that gave identical results for those trivial cases.

[quote] Pack transparency : 0.412ms
Fast resolve pass : 0.362ms
Stencil mark : 0.321ms
Full resolve pass : 0.783ms
[/quote]
That got rid of a lot of work! We’re down another 0.9 ms, to 1.9 ms.

My new system precomputed the transparent geometry to overlay on each current and previous frame pixel, but this had the problem that since the resolve shader read a total of 10 pixels, I ended up with 10 color samples and 10 transparency samples, and also extra instructions to blend the transparent geometry with the color samples. I realized that it was actually possible to do the reprojection and blending completely during the precomputation, which meant that I could get rid of the 10 transparency reads from the full resolve shader as well as a handful of ALU instructions. Although this increased performance a bit, the shader actually just became extremely ALU limited instead, so the performance increase wasn’t as big as I had hoped. In addition, I managed to do some minor optimizations in the full resolve pass which saved a few ALU instructions. The precomputation also helped the fast resolve pass a bit, as it now only has to do 3 texture samples and a small handful of instructions, so it’s actually ROP (fillrate) limited now due to how cheap it is.

[quote] Pack transparency : 0.515ms
Fast resolve pass : 0.254ms
Stencil mark : 0.322ms
Full resolve pass : 0.492ms
[/quote]
The wrongly named transparency packing (i.e. the complete precomputation pass, to be renamed) went up by 0.1 ms, but the fast resolve pass shaved off 0.1 ms too. The full resolve pass also got quite a bit faster, losing almost 0.3 ms. All in all, we’re down to just under 1.6 ms, a pretty amazing result considering the same process took 3.4 ms a few days ago.
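Summing the profiler output from each step above shows the whole progression. The stage labels are just shorthand for this summary, not names from the engine.

```python
# Per-pass timings in milliseconds, copied from the four profiler outputs above.
stages = [
    ("single resolve pass", [3.403]),
    ("separate precomputation pass", [0.417, 2.406]),
    ("stencil mask + fast path", [0.412, 0.362, 0.321, 0.783]),
    ("blending moved to precomputation", [0.515, 0.254, 0.322, 0.492]),
]

totals = [(name, sum(times)) for name, times in stages]
for name, total in totals:
    print(f"{name}: {total:.3f} ms")
```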