A rant on OpenGL's future

Some background

TL;DR: OpenGL has momentum. The number of games utilizing it is steadily increasing, which has forced GPU vendors to fix their broken drivers. Compared to 2 years ago, the PC OpenGL ecosystem is looking much better.

Almost everyone on this forum has at least at one point in their life worked with OpenGL, either directly or indirectly. Some of us like to go deep and use OpenGL directly through LWJGL or LibGDX, while others (probably a majority) prefer the increased productivity of abstractions built on top of OpenGL, like LibGDX’s 2D support, JMonkeyEngine or even Java2D with OpenGL acceleration. In essence, if you’re writing anything more advanced than console games in Java, you’re touching OpenGL in one way or another. Outside of Java, OpenGL is used mainly on mobile phones, as both Android and iOS support it, but a number of AAA PC games have recently used OpenGL as well.

  • The id Tech 5 engine is based solely on OpenGL and is used for RAGE and the new Wolfenstein game, with two more games in development using the engine.
  • Blizzard has long supported OpenGL as an alternative to DirectX in all their games, allowing Mac and Linux users to play them.
  • Valve is pushing a move to OpenGL. They’re porting the Source engine to OpenGL, and some of the latest Source games (Dota 2, for example) default to OpenGL instead of Direct3D.

These are relatively recent events that have essentially started a chain reaction of improvements in OpenGL support throughout the gaming industry. The push by developers to support or even move to OpenGL has had a huge impact on OpenGL driver quality and on how quickly new extensions are implemented by the three GPU vendors. Some of you may remember the months following the release of RAGE, when people with AMD cards had a large number of issues with the game, and RAGE was not exactly using cutting-edge features of OpenGL. During the development of We Shall Wake I’ve had the pleasure of encountering a large number of driver bugs, but I’ve also gained a very interesting perspective on how the environment has changed.

  • Nvidia’s OpenGL drivers have always been of the highest quality among the three vendors, so there was never much to complain about here. My only complaint is that it’s impossible to report driver bugs to them, as they never respond to anything. That’s annoying, since when you actually do find a bug it’s almost impossible to get your voice heard (at least as a non-professional graphics programmer working on a small project).
  • AMD’s drivers are significantly better today compared to a year or two ago, and almost all the latest important OpenGL extensions are supported. Their biggest problem is that they lag behind slightly with OpenGL extensions, leading to some pretty hilarious situations like AMD holding a presentation for optimized OpenGL rendering techniques that are only supported by their competitors.
  • Even more impressive are Intel’s advances. Around a year ago I had problems with features that dated back to OpenGL 1.3. Their GLSL compiler was a monster, with bugs that should’ve been discovered within hours of release. Even worse, they were very quick to discontinue driver development as soon as a new integrated GPU line was released. Today they have a respectable OpenGL 4 driver covering all their OGL4-capable integrated GPUs, and they also support a majority of the important new extensions. Intel also takes the prize for best developer support: I have reported 3 bugs, all of which were fixed in the following driver release.
  • OpenGL on OSX has also improved considerably lately. The latest drivers support OpenGL 4.1 on all GPUs that have the required hardware, but most cutting-edge features are still missing.

What’s wrong with OpenGL?

TL;DR: OpenGL is horribly bloated. For the sake of simpler, less buggy, faster drivers that can be developed more quickly, the unnecessary functionality needs to go.

We’ll start by taking a look at OpenGL’s past. OpenGL’s main competitor, DirectX, has traditionally had (and still has) a large market share, and not without good reasons. A big difference between the two is how they handle legacy functionality. DirectX does not carry a significant amount of backwards compatibility. The most important transition happened between DirectX 9 and 10: they completely remade the API from the ground up to better fit the new generation of GPUs with unified architectures that were emerging. This was obviously a pain in the ass for developers, and many games are still being developed with DirectX 9, but it was a very good choice. Why? Because the alternative was what OpenGL did.

First of all, OpenGL 3 (the functional equivalent of DirectX 10) was delayed for several months, allowing DirectX to gain even more of a head start. Secondly, instead of starting from scratch, they decided to deprecate old functionality and eventually remove it in later versions. Sounds like a much easier transition for developers, right? Here’s the catch: they also provided specifications for a compatibility mode. Since all 3 vendors felt obliged to support this compatibility mode, they could never actually get rid of the deprecated functionality. In essence, OpenGL 3 is OpenGL 2 with the new functionality nastily nailed to it. The horror story continued with OpenGL 4 and the new revolutionary extensions that will make up the future OpenGL 5.

OpenGL is ridiculously bloated with functionality that hasn’t existed in hardware for over 10 years. Nvidia, AMD and Intel are still emulating completely worthless functionality with hidden shaders, like fixed-function multitexturing and the built-in OpenGL lighting system. Implementing and maintaining these functions for every new GPU they release is a huge waste of resources for the three vendors, and it’s one of the sources of the traditionally bad driver support OpenGL has had; it was simply not worth it until more games using OpenGL started popping up. A fun fact is that Apple actually decided not to support the compatibility mode, so to access OGL3+ on OSX you need to specifically request a core profile context, i.e. one without the deprecated functionality.
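For reference, here’s roughly what that looks like. This is a minimal sketch assuming LWJGL 3’s GLFW bindings (LWJGL 2 exposes the same thing through ContextAttribs); the window size and version numbers are just placeholders:

```java
import static org.lwjgl.glfw.GLFW.*;
import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.system.MemoryUtil.NULL;

import org.lwjgl.opengl.GL;

public class CoreProfileContext {
    public static void main(String[] args) {
        if (!glfwInit()) {
            throw new IllegalStateException("Unable to initialize GLFW");
        }

        // Ask for a 3.2+ core profile. Without these hints, OSX hands you
        // a legacy 2.1-style context with all the compatibility cruft.
        glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
        glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2);
        glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
        glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GLFW_TRUE); // required on OSX for 3.x+

        long window = glfwCreateWindow(800, 600, "Core profile", NULL, NULL);
        if (window == NULL) {
            throw new RuntimeException("Core profile context not available");
        }
        glfwMakeContextCurrent(window);
        GL.createCapabilities(); // resolve GL function pointers for this context

        System.out.println("Got context: " + glGetString(GL_VERSION));

        glfwDestroyWindow(window);
        glfwTerminate();
    }
}
```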

Sadly, the existence of the compatibility mode has encouraged many important industrial customers of the graphics card vendors (CAD and other 3D programs) to become dependent on it, so it’s essentially here to stay. So we have 3 vendors, each with their own ridiculously massive, unmaintainable driver, and we just keep getting more and more functionality. Right now, between DirectX 11 and 12, we’re seeing a shift in how things are done in hardware similar to the one between DirectX 9 and 10. Interestingly, OpenGL is leading here thanks to extensions that expose these new hardware features. They are available right now on all vendors with the latest beta drivers, except on Intel, which is lacking a few of them. In essence, we already have the most important features of a theoretical OpenGL 5.

Here’s what’s wrong with OpenGL at the moment: there are too many ways of doing the same thing. Let’s say you want to render a triangle. Here are the different ways you can upload the exact same vertex data to OpenGL.

Fixed functionality:

  • Immediate mode with glBegin()-glEnd(). Generally slow, but easy to use. (1992)
  • Vertex arrays. Faster than immediate mode, but still slow since the data is re-sent to the driver every frame. (1997)
  • Create a display list. Fastest on Nvidia hardware for static data. (1992)

Buffer objects:

  • Upload to a VBO with glBufferData() each frame. Generally stable performance, but slow due to additional copies and complicated memory management in the driver. (2003)
  • Allocate a VBO once, upload to it with glBufferSubData() each frame. Slow if you’re modifying the same buffer multiple times per frame. Also requires copying of data. (2003)
  • Map a VBO using glMapBuffer() and write to the mapped memory. Avoids an extra copy of the data, but forces synchronizations between the GPU, driver thread and game thread. (2003)
  • Map a VBO using glMapBufferRange() with GL_MAP_UNSYNCHRONIZED_BIT and handle synchronization yourself. Avoids extra copy and synchronization with the GPU, but still causes synchronization between the driver thread and the game thread. (2008)
  • Allocate a persistent, coherent buffer, map it once and handle synchronization yourself. No extra copy, no implicit synchronization. Allows for multithreading. (2013; see the sketch below)
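
To make that last method concrete, here is a rough sketch of a persistently mapped vertex buffer with a simple triple-buffered fence scheme. It’s for illustration only: it assumes LWJGL 3-style bindings and a GL 4.4 (or ARB_buffer_storage) context, and the class name, region count and sizes are all made up.

```java
import org.lwjgl.opengl.GL15;
import org.lwjgl.opengl.GL30;
import org.lwjgl.opengl.GL32;
import org.lwjgl.opengl.GL44;

import java.nio.ByteBuffer;

/** Sketch of a persistently mapped, coherent vertex buffer (GL 4.4 / ARB_buffer_storage). */
public class PersistentBuffer {

    private static final int REGIONS     = 3;       // triple-buffer so the CPU never waits on the GPU
    private static final int REGION_SIZE = 1 << 20; // 1 MiB per region, arbitrary

    private final int        vbo;
    private final ByteBuffer mapped;                 // stays mapped for the buffer's entire lifetime
    private final long[]     fences = new long[REGIONS];
    private int              region;

    public PersistentBuffer() {
        vbo = GL15.glGenBuffers();
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vbo);

        int flags = GL30.GL_MAP_WRITE_BIT | GL44.GL_MAP_PERSISTENT_BIT | GL44.GL_MAP_COHERENT_BIT;
        // Immutable storage: the driver places the memory once and never shuffles it around.
        GL44.glBufferStorage(GL15.GL_ARRAY_BUFFER, (long) REGIONS * REGION_SIZE, flags);
        mapped = GL30.glMapBufferRange(GL15.GL_ARRAY_BUFFER, 0, (long) REGIONS * REGION_SIZE, flags);
    }

    /** Returns a slice of the mapped memory that the GPU is guaranteed to be done reading. */
    public ByteBuffer beginFrame() {
        long fence = fences[region];
        if (fence != 0) {
            // Block (rarely) until the GPU has finished with the draws that read this region.
            GL32.glClientWaitSync(fence, GL32.GL_SYNC_FLUSH_COMMANDS_BIT, Long.MAX_VALUE);
            GL32.glDeleteSync(fence);
        }
        ByteBuffer slice = mapped.duplicate();
        slice.position(region * REGION_SIZE).limit((region + 1) * REGION_SIZE);
        return slice; // write vertex data here, from any thread, with no glBuffer* calls at all
    }

    /** Call after issuing the draw calls that read from the current region. */
    public void endFrame() {
        fences[region] = GL32.glFenceSync(GL32.GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        region = (region + 1) % REGIONS;
    }
}
```

At draw time you point your vertex attribute pointers at the current region’s byte offset and issue the draw call as usual; the driver never has to make an extra copy or block behind your back.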

So we literally have 8 different ways of uploading vertex data to the GPU, and the performance of each method depends on the GPU vendor and driver version. It took me years to learn which ones to use for what kind of data, which ones are fast on which hardware, and which ones to avoid in which cases. Today, all but the last one are completely redundant. They simply complicate the driver, introduce more bugs in the features that matter and increase development time for new drivers. We literally have code from 1992 (the year I was born, I may add) lying right next to the most cutting-edge method of uploading data to OpenGL from multiple threads while avoiding unnecessary copies and synchronization. It’s ridiculous. The same goes for draw commands. These are the non-deprecated draw commands currently in OpenGL 4.4 (plus extensions):

  • glDrawArrays
  • glDrawArraysInstanced
  • glDrawArraysInstancedBaseInstance
  • glDrawArraysIndirect
  • glMultiDrawArrays
  • glMultiDrawArraysIndirect
  • glDrawElements
  • glDrawRangeElements
  • glDrawElementsBaseVertex
  • glDrawRangeElementsBaseVertex
  • glDrawElementsInstanced
  • glDrawElementsInstancedBaseVertex
  • glDrawElementsInstancedBaseVertexBaseInstance
  • glDrawElementsIndirect
  • glMultiDrawElementsBaseVertex
  • glMultiDrawElementsIndirect
  • glMultiDrawArraysIndirectBindlessNV <----
  • glMultiDrawElementsIndirectBindlessNV <----

Only the last two functions, marked with arrows, are needed to do everything the above commands do. This bloat needs to go. Oh, and here’s another funny piece of information: GPUs don’t actually have texture units the way OpenGL exposes them anymore, so we could deprecate texture units as well and move over to bindless textures any time we want. Imagine the resources the driver developers could spend on optimizing functionality that is actually useful instead of on maintaining legacy functions, not to mention the smaller number of bugs, as there are fewer functions that can have bugs in the first place.
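
To give an idea of what that collapse looks like in practice, here’s a rough sketch (again LWJGL-style Java, with hypothetical mesh parameters) of replacing a pile of separate glDrawElements* calls with one glMultiDrawElementsIndirect call: you fill a buffer with DrawElementsIndirectCommand structs and issue a single call. It assumes all the meshes share one vertex/index buffer with the corresponding VAO already bound:

```java
import org.lwjgl.BufferUtils;
import org.lwjgl.opengl.GL11;
import org.lwjgl.opengl.GL15;
import org.lwjgl.opengl.GL40;
import org.lwjgl.opengl.GL43;

import java.nio.IntBuffer;

public class IndirectDrawSketch {

    // One DrawElementsIndirectCommand is five uints:
    // { count, instanceCount, firstIndex, baseVertex, baseInstance }
    private static final int COMMAND_INTS = 5;

    /** Packs draw parameters for a set of hypothetical meshes and issues them in one call. */
    public static void drawAllMeshes(int[] indexCounts, int[] firstIndices, int[] baseVertices) {
        int numDraws = indexCounts.length;
        IntBuffer commands = BufferUtils.createIntBuffer(numDraws * COMMAND_INTS);
        for (int i = 0; i < numDraws; i++) {
            commands.put(indexCounts[i]);  // count
            commands.put(1);               // instanceCount
            commands.put(firstIndices[i]); // firstIndex
            commands.put(baseVertices[i]); // baseVertex
            commands.put(0);               // baseInstance
        }
        commands.flip();

        // Upload the command list itself to a buffer object...
        int indirectBuffer = GL15.glGenBuffers();
        GL15.glBindBuffer(GL40.GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
        GL15.glBufferData(GL40.GL_DRAW_INDIRECT_BUFFER, commands, GL15.GL_STATIC_DRAW);

        // ...and replace N separate glDraw* calls with a single one.
        GL43.glMultiDrawElementsIndirect(GL11.GL_TRIANGLES, GL11.GL_UNSIGNED_INT,
                0L, numDraws, 0 /* tightly packed commands */);
    }
}
```

The bindless variants marked above go one step further by letting each command also carry its vertex buffer addresses, which is why those two functions alone can cover everything in the list.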

Why fix what isn’t broken?

Competition. Mantle is a newcomer in the API war, but it has already gained developer support in many released and upcoming games thanks to its more modern, simpler API that is a better fit for the GPU hardware of today (well, and probably a large amount of money from AMD). DirectX 12 will essentially be a cross-vendor clone of Mantle. Yes, OpenGL is far ahead of DirectX right now thanks to extensions, but that won’t last forever unless the API can be kept as simple, straightforward to use, fast and bug-free as the competition. We’re still behind Mantle when it comes to both functionality and simplicity. OpenGL is too complicated, too bug-ridden and too vendor-dependent. Unless OpenGL 5 wipes the slate clean and starts from scratch, it’s time to start moving over to other APIs where the grass is greener.

DirectX is a construction company which builds decent buildings but tears them down every other year to build new, better ones. Mantle is a shiny new sci-fi prototype building. OpenGL started as a tree house 20 years ago and has had new functionality nailed to it for way too long, so technically it’s just as advanced as DirectX and Mantle; it’s just that it’s all attached to a frigging tree house, so it keeps falling apart and is basically unfixable.