On behalf of my exams I would like to thank you, Khronos, and on behalf of myself I would like to say that you ruined my Christmas. Q_Q
Honestly, I would be fine with it taking yet another year if that meant Khronos could make the conformance tests so damn tight, with 100% coverage, that you could be completely sure that EVERY Vulkan-certified driver will work EXACTLY like every other such driver.
No more driver quirks, unspecified behaviour or other annoying driver differences, like we have with OpenGL today. Yes, I'm looking at you, Intel, for your crappy drivers, and at you, Nvidia, for your far too lenient drivers and GLSL compiler.
Then we can finally have a cross-platform Direct3D.
http://blog.mecheye.net/2015/12/why-im-excited-for-vulkan/
I have to admit I misunderstood AMD’s intentions when they first announced Mantle. I was worried that they would aim for lock-in by “encouraging” developers to build their games only on Mantle. As it turned out, AMD was actually the good guy here, and I’ve gained a lot of respect for them since then. They’ve done a HUGE favor to all gamers and developers by pretty much throwing Mantle on Khronos’ table and saying “go nuts with it”. They knew that the PC APIs were stuck and that the only way to show the world what we’ve been missing was to do something drastic. Frankly, we have AMD to thank for a lot, including the biggest revolution in OpenGL’s development since, well, since it frigging came out in the 1990s, AND for finally bringing out the potential of PC gaming compared to consoles. The implications here are massive. With DX12 and Vulkan letting PCs leave the current-gen consoles far behind in performance just 2-3 years after their release, we may hopefully be seeing the last generation of consoles and a migration to Steamboxes or PC-“consoles”, especially when you consider the bad hardware choices made for this generation (weak CPUs that will heavily limit the scale of future games). When you add AMD’s recent open-sourcing of some of their tools and APIs, they’re starting to look pretty damn good.
Continuing my console rant…
The CPU cores used in both the PS4 and the XB1 are based on AMD’s Jaguar architecture. The per-core performance should be very similar, if not identical, to that of the Athlon 5150 desktop CPU, and the clock speeds are almost identical as well (Athlon: 1.6GHz, PS4: 1.6GHz, XB1: 1.75GHz). Benchmarks comparing the quad-core Athlon 5150 with a modern Intel i5 or i7 show that it’s around 1/4th to 1/5th as fast. The consoles do have 8 cores instead of 4, but only 7 of them are actually usable by developers (and the 7th core was only unlocked for developers very recently on both consoles). Even if we’re generous for the sake of argument and say the console CPUs are around half as fast as a modern i7, the consoles were already far behind high-end CPU parts when they launched; mid-range at best.
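To sanity-check that “around half as fast” claim, here’s the back-of-envelope math I’m doing. The per-core ratio is just a ballpark read off public Athlon 5150 vs. i5/i7 benchmarks, not something I’ve measured:

```cpp
#include <cstdio>

int main() {
    // Ballpark assumptions, not measurements.
    const double perCoreRatio = 1.0 / 4.0; // one Jaguar core vs. one modern i5/i7 core (optimistic end of the 1/4-1/5 range)
    const int    consoleCores = 7;         // 8 cores, 1 reserved by the console OS
    const int    desktopCores = 4;         // typical quad-core i5/i7

    // ~0.44, so "around half as fast" is already being generous,
    // and that's ignoring turbo, SMT, etc. on the desktop chip.
    const double aggregate = perCoreRatio * consoleCores / desktopCores;
    std::printf("console CPU vs. desktop quad-core: ~%.2fx\n", aggregate);
    return 0;
}
```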
Up until now, consoles have had a huge advantage that helped them compete with PCs. With just a single piece of hardware to support, you can go nuts with low-level optimizations and tailor-make entire APIs to fit the hardware, which the console makers have been doing vigorously since their machines first shipped. My favourite example of this is how the last games developed for the PS3 offloaded many graphics effects like lighting to the CPU, and did very fine-grained culling of triangles on the CPU, just to take a few percentage points of load off the GPU. That would be impossible to do on PC with the current APIs without inducing massive stalls and lag. The ability to multithread command buffer generation, plus much thinner bare-bones drivers with minimal overhead, gives developers far more control and power to optimize their games. This advantage, which has been exclusive to consoles since I was born, is disappearing with Vulkan and DX12, as can be seen in this Vulkan demo. In synthetic tests we’re talking performance increases of up to 10x, while more realistic examples (especially for DX12) show around 2x improvements in CPU-limited games. And that’s without accounting for differences in RAM (for example, the PS4 uses GDDR5 for the CPU too, which can have serious performance drawbacks).
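To make the “multithread command buffer generation” point concrete, here’s roughly what that pattern looks like in Vulkan. This is just a sketch (the helper name is mine): I’m assuming the device and queue family index come from your normal setup code, and I’m leaving out render passes, pipelines, cleanup and error handling.

```cpp
#include <vulkan/vulkan.h>
#include <thread>
#include <vector>

// Record one command buffer per thread, each from its OWN command pool
// (command pools are not thread-safe, so one pool per thread is the usual pattern).
std::vector<VkCommandBuffer> recordInParallel(VkDevice device,
                                              uint32_t queueFamilyIndex,
                                              unsigned threadCount) {
    std::vector<VkCommandPool>   pools(threadCount);   // leaked here; a real app keeps and reuses them
    std::vector<VkCommandBuffer> cmdBufs(threadCount);
    std::vector<std::thread>     threads;

    for (unsigned i = 0; i < threadCount; ++i) {
        threads.emplace_back([&, i] {
            VkCommandPoolCreateInfo poolInfo{};
            poolInfo.sType            = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
            poolInfo.queueFamilyIndex = queueFamilyIndex;
            vkCreateCommandPool(device, &poolInfo, nullptr, &pools[i]);

            VkCommandBufferAllocateInfo allocInfo{};
            allocInfo.sType              = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
            allocInfo.commandPool        = pools[i];
            allocInfo.level              = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
            allocInfo.commandBufferCount = 1;
            vkAllocateCommandBuffers(device, &allocInfo, &cmdBufs[i]);

            VkCommandBufferBeginInfo beginInfo{};
            beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
            vkBeginCommandBuffer(cmdBufs[i], &beginInfo);
            // ... record this thread's slice of the frame's draw calls here ...
            vkEndCommandBuffer(cmdBufs[i]);
        });
    }
    for (auto& t : threads) t.join();
    return cmdBufs; // submit them all with a single vkQueueSubmit from any one thread
}
```

The key point is that in OpenGL all of this had to funnel through the single context thread; here every thread records into its own pool, and only the final vkQueueSubmit is serialized.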
The reason we won’t see a 10x improvement in actual games is that these games already use multiple cores for their logic, physics, etc. The extra performance comes from being able to use multiple cores for the traditionally single-threaded draw call submission. If your game logic takes 10 ms and your draw calls used to take 10 ms, a 10x increase in API performance brings the frame time from 20 ms (50 FPS) down to about 11 ms (roughly 90 FPS), an almost doubling of your FPS. This doesn’t mean you’ll be able to draw 10x as many trees in your games; the GPU’s ability to draw triangles won’t improve. If you could draw 1000 identical trees with OpenGL without being CPU-bottlenecked, you can draw 1000 trees with Vulkan too. The difference is that with Vulkan each of those 1000 trees can be unique, as we can now afford one draw call per tree. Vulkan brings the power to add more variety and scale to our games, simply by alleviating a bottleneck that no one has bothered to solve for years. The hardware world is very different from when OpenGL was designed, and this modernization won’t just improve the performance of the API a lot, it will also let us take full advantage of however many CPU cores people’s computers have.
We’re even gonna get a few GPU optimization opportunities, the most interesting one being asynchronous compute. GPUs have a lot of fixed-function hardware in them, and sometimes the thousands of shader units you have at your disposal are left idle while this hardware does its job. A great example is shadow map rendering. Since shadow maps are just depth images, we don’t actually need a fragment shader stage for them at all. The hardware rasterizer is what generates the fragments and their depth values, so the shader cores are pretty much idle (apart from running vertex shaders). With Vulkan you can actually fire off compute kernels in parallel with rendering commands (they go to a different command queue), meaning that you could render a shadow map while computing tile-based deferred lighting, possibly even double-buffering your shadow maps to keep both the rasterizer and the shader cores busy, roughly like this (Vulkan sketch after the list):
1. Draw shadow map 1.
2. Use shadow map 1 for compute shader lighting while drawing shadow map 2.
3. Use shadow map 2 for compute shader lighting while drawing shadow map 1.
4. Repeat steps 2-3.
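Here’s a rough sketch of what one step of that ping-pong looks like in terms of Vulkan queue submissions (the function and parameter names are mine). I’m assuming the command buffers themselves are already recorded, that the device exposes a separate compute-capable queue, and that a semaphore signaled by the pass that drew shadow map 1 handles the cross-queue ordering.

```cpp
#include <vulkan/vulkan.h>

// One frame of the ping-pong above: the graphics queue draws shadow map 2
// while the compute queue runs lighting that reads shadow map 1.
void submitPingPongFrame(VkQueue graphicsQueue, VkQueue computeQueue,
                         VkCommandBuffer drawShadowMap2,      // graphics work
                         VkCommandBuffer lightWithShadowMap1, // compute work
                         VkSemaphore shadowMap1Ready)         // signaled by the pass that drew shadow map 1
{
    // Kick off shadow map 2 on the graphics queue right away.
    VkSubmitInfo gfx{};
    gfx.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    gfx.commandBufferCount = 1;
    gfx.pCommandBuffers    = &drawShadowMap2;
    vkQueueSubmit(graphicsQueue, 1, &gfx, VK_NULL_HANDLE);

    // Lighting only waits for shadow map 1, so it overlaps the submit above.
    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
    VkSubmitInfo comp{};
    comp.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    comp.waitSemaphoreCount = 1;
    comp.pWaitSemaphores    = &shadowMap1Ready;
    comp.pWaitDstStageMask  = &waitStage;
    comp.commandBufferCount = 1;
    comp.pCommandBuffers    = &lightWithShadowMap1;
    vkQueueSubmit(computeQueue, 1, &comp, VK_NULL_HANDLE);

    // A real renderer would also have the graphics submit signal a
    // "shadow map 2 ready" semaphore for the next frame's compute pass.
}
```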
You can even start doing post-processing early, asynchronously, while doing the lighting. It’s perfectly plausible to compute SSAO (it only needs the depth and normal buffers, so it can start right after the G-buffer pass is done) while doing lighting. With all these opportunities, you may actually be able to cram out a few extra trees. =P Texture streaming will also be much easier to do with explicit control over DMA transfers into GPU memory. We won’t have to depend on driver hacks to get DMA texture streaming from a separate thread. Multi-GPU support can also be coded for properly in the first place, as you can actually query all the GPUs connected to the computer, control them individually and handle synchronization of resources between them manually.
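The “query all GPUs” part really is just a couple of calls. Given an already-created VkInstance, something like this (sketch only, no error handling, helper name is mine) lists every GPU in the machine, and from there you can create a separate logical device for each one you want to drive:

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

// Enumerate every physical GPU visible to the Vulkan instance.
void listGpus(VkInstance instance) {
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);

    std::vector<VkPhysicalDevice> gpus(count);
    vkEnumeratePhysicalDevices(instance, &count, gpus.data());

    for (VkPhysicalDevice gpu : gpus) {
        VkPhysicalDeviceProperties props{};
        vkGetPhysicalDeviceProperties(gpu, &props);
        std::printf("GPU: %s (discrete: %s)\n", props.deviceName,
                    props.deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU ? "yes" : "no");
        // From here you'd create one VkDevice per GPU you want to use,
        // and synchronize any shared resources between them yourself.
    }
}
```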
When you add the massively reduced driver complexity, the simple-to-compile SPIR-V intermediate shader format, and the much more rigorous conformance tests that Khronos is developing, it’s clear that Vulkan will solve a lot of the issues in the current OpenGL ecosystem. It really looks like we’ll be able to test our code on a single computer, and if it runs there, there’s a very, very high chance it will run everywhere else too.
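From the application’s point of view the SPIR-V part is trivial too: you hand the driver a pre-compiled binary blob instead of GLSL source, so there’s no vendor-specific GLSL frontend left to disagree with you. A minimal sketch (helper name and file path are just examples), assuming the shader was compiled offline, e.g. with glslangValidator:

```cpp
#include <vulkan/vulkan.h>
#include <fstream>
#include <vector>

// Load a pre-compiled SPIR-V binary and wrap it in a VkShaderModule.
VkShaderModule loadShader(VkDevice device, const char* path) { // e.g. "frag.spv" (example path)
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    const size_t bytes = static_cast<size_t>(file.tellg());
    std::vector<uint32_t> words(bytes / 4);          // SPIR-V is a stream of 32-bit words
    file.seekg(0);
    file.read(reinterpret_cast<char*>(words.data()), bytes);

    VkShaderModuleCreateInfo info{};
    info.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    info.codeSize = bytes;          // size in bytes
    info.pCode    = words.data();   // pointer to the SPIR-V words

    VkShaderModule module = VK_NULL_HANDLE;
    vkCreateShaderModule(device, &info, nullptr, &module);
    return module;
}
```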
And of course there will be far less confusion over what the fast path is.