I spent the majority of the day redesigning Fragile Soul’s menu:
[spoiler]
aOgWwiMn9XU
[/spoiler]
Meh. I think it looks great!
I couldn’t, that’s why I did it. I think it’ll look great with the additional menu to quickly go to a section. The good thing about the current layout is that the graphics get a lot of attention on the page, which is the only thing the game has going for it at the moment.
I don’t think we should put any more into this thread as it isn’t about web design. Once I change it down the line I’ll put in another “What I did today”.
Mike
EDIT: I added the navigation as suggested. As it was only a few minutes of work, I thought it was silly not to do it.
I found some nice optimizations for bounds testing in shaders. Basically my motion blur was blurring over the edge of the screen, resulting in either black color if I used texelFetch() which gave a dark aura around the screen when in motion, or being clamped to the edge pixel if I used texture() which inflated the weights of the edge pixels. Neither of these looked very good, so I decided to simply remove the samples that fell outside the screen. However, detecting these scenarios was expensive.
Simply using an if-statement to test whether the coordinates are inside the screen was dead slow as this was compiled to 1 branch per sample.
if(texCoords.x >= 0 && texCoords.y >= 0 && texCoords.x < resolution.x && texCoords.y < resolution.y){
samples += 1.0;
}
However, there’s a trick you can use. By casting the resulting boolean to a float, we can convert the boolean to 1.0 if it’s true and 0.0 if it’s false! Exactly what we want!
samples += float(texCoords.x >= 0 && texCoords.y >= 0 && texCoords.x < resolution.x && texCoords.y < resolution.y);
Sadly, this still compiles to the same thing as the if-statement. That’s weird since I know that GPUs have specific instructions to set the value of a register based on a simple comparison. As an example, this line:
float x = float(someValue < 1);
compiles to this instruction:
x: SETGT R0.x, 1.0f, PV0.x
It would seem that the boolean &&-operators are messing this up, causing it to revert to branches. Let’s try casting the result of each comparison to a float and then using a simple float multiply between them to effectively AND them together.
samples += float(texCoords.x >= 0)*float(texCoords.y >= 0)*float(texCoords.x < resolution.x)*float(texCoords.y < resolution.y);
Bam! The comparison compiles to 2 SETGE (greater-equals) instructions, 2 SETGT (greater-than) instructions and 3 multiplies. I need to do this 16 times per pixel, once for each sample, so this saves a load of work! There is one final optimization we can make to improve this code on AMD’s vector GPUs. AMD’s older GPUs are a bit funny in that they run each shader on 4 or 5 different cores at the same time, trying to do as much as possible at the same time. This code:
float x = a + b + c + d;
would fit this extremely badly. GLSL requires the GPUs to enforce the order of the operations, so none of these instructions can be run in parallel. First we do a+b, then (a+b)+c, then finally ((a+b)+c)+d, which requires 3 cycles. If we add some “unnecessary” parentheses, we can encourage vector GPUs to do these additions in parallel without affecting the performance of scalar GPUs that don’t have this problem:
float x = (a + b) + (c + d);
This only takes 2 cycles, as a+b and c+d can both be calculated in the first cycle, and then (a+b)+(c+d) can be calculated in the second cycle, making this chain of addition 50% faster. Doing this for the bounds testing gives this code:
samples += (float(texCoords.x >= 0)*float(texCoords.y >= 0))*(float(texCoords.x < resolution.x)*float(texCoords.y < resolution.y));
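Regrouping is harmless for this product because every factor is exactly 0.0 or 1.0. For arbitrary values that is not guaranteed: floating-point addition is not associative, so a regrouped sum can round differently, which is one reason a compiler may not be free to regroup on its own. A minimal CPU-side sketch in Python (64-bit doubles rather than the GPU’s 32-bit floats; the function names and test values are made up for illustration):

```python
def sum_sequential(a, b, c, d):
    """((a + b) + c) + d: each addition depends on the previous one."""
    return ((a + b) + c) + d


def sum_paired(a, b, c, d):
    """(a + b) + (c + d): the two inner additions are independent."""
    return (a + b) + (c + d)


# Values chosen so that adding 1.0 to a number of magnitude 1e16
# is lost to rounding in a 64-bit double.
a, b, c, d = 1e16, 1.0, -1e16, 1.0
print(sum_sequential(a, b, c, d), sum_paired(a, b, c, d))

# Small integers are represented exactly, so both groupings agree here.
print(sum_sequential(1.0, 2.0, 3.0, 4.0), sum_paired(1.0, 2.0, 3.0, 4.0))
```

So the parenthesised form is a safe rewrite for 0/1 weights, but for general sums it is worth checking that a slightly different rounding is acceptable.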
Theoretical performance of the 4 versions of bounds checking done for 16 samples on a Radeon HD 6870:
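- Version 1 (if-statement): 2.04 pixels per second
- Version 2 (boolean cast to float): 2.02 pixels per second
- Version 3 (float multiply): 11.20 pixels per second
- Version 4 (float multiply with parentheses): 11.79 pixels per second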
All in all, that’s a 5.78x improvement compared to a naive if-statement.
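The four versions only differ in how they are compiled; they should all produce the same 0/1 weight. A rough way to convince yourself of that on the CPU is the Python sketch below. It is not shader code, and the function names and the integer test grid are assumptions made for the example:

```python
def inside_if(tx, ty, rx, ry):
    """Version 1: plain if-statement."""
    if tx >= 0 and ty >= 0 and tx < rx and ty < ry:
        return 1.0
    return 0.0


def inside_cast(tx, ty, rx, ry):
    """Version 2: cast the combined boolean to a float."""
    return float(tx >= 0 and ty >= 0 and tx < rx and ty < ry)


def inside_mul(tx, ty, rx, ry):
    """Version 3: one float per comparison, multiplied left to right."""
    return float(tx >= 0) * float(ty >= 0) * float(tx < rx) * float(ty < ry)


def inside_mul_paired(tx, ty, rx, ry):
    """Version 4: the same product, grouped into two independent pairs."""
    return (float(tx >= 0) * float(ty >= 0)) * (float(tx < rx) * float(ty < ry))


VARIANTS = (inside_if, inside_cast, inside_mul, inside_mul_paired)


def variants_agree(rx, ry, margin=2):
    """Compare all versions on every integer coordinate around the screen."""
    for ty in range(-margin, ry + margin + 1):
        for tx in range(-margin, rx + margin + 1):
            results = {f(tx, ty, rx, ry) for f in VARIANTS}
            if len(results) != 1:
                return False
    return True


def count_valid_samples(coords, rx, ry):
    """Sum the 0/1 weights, like `samples +=` in the shader loop."""
    return sum(inside_mul_paired(tx, ty, rx, ry) for tx, ty in coords)
```

For example, `count_valid_samples([(-1, 0), (0, 0), (7, 5), (8, 5)], 8, 6)` counts only the two coordinates that lie inside an 8x6 screen.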
Nice.
There needs to be something like Souper but for shader code, although it would probably be more effective on pre-compiled shaders.
Manipulation of sign bits is handy on the CPU as well (both float and integers). Have I ever mentioned that GL compiling GLSL from source in the driver was a really terrible idea??
It’s not that weird, because this statement:
b = texCoords.x >= 0 && texCoords.y >= 0 && texCoords.x < resolution.x && texCoords.y < resolution.y
is roughly equal to:
init: if(texCoords.x >= 0)
          if(texCoords.y >= 0)
              if(texCoords.x < resolution.x) {
                  b = (texCoords.y < resolution.y);
                  goto end;
              }
              else
                  goto lab;
          else
              goto lab;
      else
          goto lab;
lab:  b = false;
end:
The early-out (what’s the name…) of ‘&&’ causes a lot of branches.
Short-circuiting
And yeah, I’m curious how non-short-circuiting comparisons perform: [icode]samples += float(texCoords.x >= 0 & texCoords.y >= 0 & texCoords.x < resolution.x & texCoords.y < resolution.y);[/icode]
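The difference between short-circuiting and eager evaluation is easy to observe on the CPU by recording which comparisons actually run. The sketch below uses Python’s `and` and `&` as stand-ins; the `check` helper and its labels are invented for the example, and it says nothing about what a GLSL compiler accepts or emits for `&`:

```python
calls = []


def check(name, value):
    """Record that a comparison was evaluated, then return its result."""
    calls.append(name)
    return value


def short_circuit(tx, ty, rx, ry):
    """`and` stops at the first false operand, like `&&`."""
    calls.clear()
    result = (check("x>=0", tx >= 0) and check("y>=0", ty >= 0)
              and check("x<w", tx < rx) and check("y<h", ty < ry))
    return result, list(calls)


def eager(tx, ty, rx, ry):
    """`&` on bools evaluates every operand, with no early-out."""
    calls.clear()
    result = (check("x>=0", tx >= 0) & check("y>=0", ty >= 0)
              & check("x<w", tx < rx) & check("y<h", ty < ry))
    return result, list(calls)
```

With `tx = -1`, `short_circuit` evaluates only the first comparison, while `eager` evaluates all four and still returns the same result. Each early-out is a potential branch, which matches the goto expansion above.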
Super optimizers are an interesting toy thing to do, but they’re insanely slow. Also this page’s results aren’t too encouraging. 31*i no result? Constant multipliers and divisors over integer -> addition chains.
I have seen this kind of pattern multiple times. It’s HLSL, but it maps well to hardware.
if (dot(1.0, saturate(texCoords) - texCoords) == 0.0)
samples += 1.0;
I tried a similar version too:
samples += float(texCoords == clamp(texCoords, vec2(0), resolution));
8.30 pixels per second.
EDIT: If your coordinates are normalized and you use clamp(, 0.0, 1.0), the clamp becomes free, in which case this one is ALMOST as fast as the float multiplied version (version 4).
EDIT2: Turns out that clamp DIDN’T become free. Since the texture coordinate varying input is never modified, there is no previous instruction that clamp can piggyback on, so it needs to add a MOV instruction with CLAMP to get the same result. It’s faster than a MIN and a MAX, but still slower than the optimized float multiplied thingy.
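One thing to watch with the clamp-based test: `clamp` is inclusive at both ends, so a coordinate exactly equal to the resolution counts as inside, while the `<` comparisons reject it. That probably only matters for integer texel coordinates. A small Python sketch of the difference (the helper names and the tiny 2x2 grid are assumptions for illustration):

```python
def clamp(v, lo, hi):
    """Scalar clamp, inclusive at both ends."""
    return max(lo, min(v, hi))


def inside_clamp(tx, ty, rx, ry):
    """Clamp-and-compare test: inside if clamping changes nothing."""
    return float(tx == clamp(tx, 0, rx) and ty == clamp(ty, 0, ry))


def inside_mul(tx, ty, rx, ry):
    """Float-multiplied test with strict upper bounds."""
    return float(tx >= 0) * float(ty >= 0) * float(tx < rx) * float(ty < ry)


def disagreements(rx, ry, margin=1):
    """Integer coordinates where the two tests give different weights."""
    return [(tx, ty)
            for ty in range(-margin, ry + margin + 1)
            for tx in range(-margin, rx + margin + 1)
            if inside_clamp(tx, ty, rx, ry) != inside_mul(tx, ty, rx, ry)]
```

On a 2x2 screen, `disagreements(2, 2)` lists exactly the coordinates on the upper edges (x equal to 2 or y equal to 2); clamping to `resolution - 1` for integer coordinates would be one way to make the two tests agree.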
I made a shader mod for Luftrausers. It is supposed to imitate old CRT arcade monitors, complete with warping and scan-lines.
I have been trying to make a shader like this for a whole year, and I finally made it when messing with modding. Now to find a game of my own to put it in :clue:…
2ZZl2GHT02A
Just spent the last 3 days working non-stop on Xbox controllers in Java for the robotics team. Works flawlessly.
I’m leaving to go to the competition in about 5 days.
Started on my 3D platformer using SilenceEngine.
Got gravity, lighting, and also controller input using Generic/XBOX controllers done. Still need to load levels from file, and also add a nice story around it.
Added automapping for the dungeons. I actually didn’t want to do that, but… I did it anyway… ;D

This is starting to look like Skyrim mobile.
I coded a simple camera frustum to ground plane clipping algorithm. It does help a ton for calculating minimum shadow frusta.
The ground plane is calculated dynamically, per cascade, from the minimum height of the part of the heightmap that intersects the current camera split.
This saves up to 2 million triangles and improves the effective resolution.
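The post doesn’t show the algorithm itself, so the following is only one plausible sketch of the idea: clip each edge of the camera frustum against a horizontal ground plane and take the bounds of what remains above it. It assumes a Y-up world and a plane at the minimum terrain height; the function names and the edge-list representation are hypothetical.

```python
def clip_segment_above_plane(p0, p1, ground_y):
    """Clip segment p0-p1 to the half-space y >= ground_y.

    Returns the clipped (q0, q1) pair, or None if it is entirely below.
    """
    y0, y1 = p0[1], p1[1]
    if y0 < ground_y and y1 < ground_y:
        return None
    if y0 >= ground_y and y1 >= ground_y:
        return p0, p1
    # Exactly one endpoint is below the plane, so y1 != y0 here.
    t = (ground_y - y0) / (y1 - y0)
    hit = tuple(a + t * (b - a) for a, b in zip(p0, p1))
    return (p0, hit) if y0 >= ground_y else (hit, p1)


def clipped_bounds(edges, ground_y):
    """Axis-aligned bounds of all frustum edges after clipping."""
    points = []
    for p0, p1 in edges:
        clipped = clip_segment_above_plane(p0, p1, ground_y)
        if clipped is not None:
            points.extend(clipped)
    if not points:
        return None
    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    return mins, maxs
```

Passing the 12 edges of one cascade’s frustum slice to `clipped_bounds` gives a box that ignores everything below the terrain, which is where a tighter shadow frustum would come from.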
Dynamic, penumbra’d (yes that’s now a word), multisource shadows with blending. Oh, and some fake normal mapping. I also implemented map creation from Lua files, like you see in the top left.
Based off of davedes’ article!
Edit: I also have the solving of a Rubik’s Cube completely memorized and can do it all without looking at a reference.
This is more of: ‘what I’m doing today’ or ‘what I’m doing at the moment’
But, I’m setting up Ubuntu on a virtual machine… I was extremely bored today so I thought… What the heck… I’ll see what all the fuss is over…