OpenGL Questions

Scroll below to see the newest Q&As. Some of the earlier questions regard later OpenGL versions and vice versa, so if you’re looking at this topic for an answer, make sure you look around. It’s a little muddled up. :slight_smile:

Old question
I wasn’t really sure where to start with OpenGL so I had a look at SHC’s tutorials and started there. Hopefully I’m wording this correctly (a lot of my confusion is with the wording of things).
Let me start off with what I think I know:

  • Everything is done in a line in the order you call functions
  • The modelview matrix is used to manipulate objects in the world
  • The projection matrix adjusts the view of the camera
  • You use the glMatrixMode() function to select the matrix you want to manipulate
  • The glLoadIdentity() function resets the matrix to its original state
  • You need to use glOrtho() to set up an orthographic projection to view the modelview matrix
  • You manipulate a matrix between the glPushMatrix() and glPopMatrix() functions
  • The glTranslatef() function moves the modelview matrix position
  • You draw vertices between the glBegin() and glEnd() functions. The parameter of glBegin() is the shape you want to draw.
  • The glViewport() function resizes the orthographic camera

I’ve probably got half that wrong and I’d really appreciate if someone could tell me exactly what I’ve got wrong and in the simplest possible way you can think of, the correct definitions.

Also I have a question:

Why are modelview and projection both matrices? I see a lot of references to the modelview matrix, but I don’t understand what makes it a matrix. I’ve also seen references to the matrix stacks. I understand what a stack is, but how are these processed?

Thanks in advance to anyone who helps me. This is confusing. ???

The modelview matrix is actually two matrices in one. The model matrix (AKA object matrix) is used to position a model. This is very useful when you have a 3D model with vertices in local space, since it allows you to position, scale and rotate the model to its appropriate position in world space. The view matrix does something completely different: it holds the inverse of the position and orientation of the camera. The projection matrix can be seen as the lens of the camera; it defines things like the field-of-view angle, aspect ratio and near/far planes when using 3D perspective projection, or the bounds of the orthographic projection when using orthographic projection. Finally, the viewport defines what part of the window the projected coordinates should map to.

Crash course on matrices: Multiplying a matrix with a vertex takes it from one space to another. If we want to reverse this process, we can take the inverse of a matrix. This new matrix does the same transformation backwards so we get back the original vertex.

  1. Local space --model matrix--> World space. This is pretty easy to understand. If we call glTranslatef() on this matrix we’ll move the object around in world space.

  2. World space --view matrix--> Eye space. This is a bit more complicated. Let’s say we have a camera position (x, y). If we do the same thing we did to the model matrix by calling glTranslatef(x, y, 0), we’re actually getting a matrix that takes things in eye space and transforms them into world space. We want to do the opposite, AKA the inverse of it. The simplest fix is therefore to simply do it backwards manually by calling glTranslatef(-x, -y, 0). That’s a proper view matrix.

  3. Eye space --projection matrix--> Normalized device coordinates. The projection matrix takes in coordinates relative to the eye (camera) and maps all three dimensions to [-1, 1], so if the coordinates are (0, 0, 0) after transformation, they’re at the center.

  4. Normalized device coordinates --viewport--> screen coordinates. Finally these [-1, 1] coordinates are mapped to actual pixels using the viewport settings. If you call glViewport(100, 100, 200, 200) and end up with the normalized device coordinates (0, 0, 0), they’ll be mapped to (200, 200) in the actual window, i.e. the center of that viewport rectangle (see the sketch after this list).
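To make that mapping concrete, here’s the arithmetic the viewport transform performs (a minimal sketch of the fixed-function math, not an actual OpenGL call):


//Sketch of the math behind the viewport transform (not an OpenGL call).
//glViewport(x, y, width, height) defines the target rectangle in the window.
static float[] ndcToWindow(float ndcX, float ndcY, int x, int y, int width, int height){
    float windowX = x + (ndcX + 1) / 2 * width;
    float windowY = y + (ndcY + 1) / 2 * height;
    return new float[]{windowX, windowY};
}
//ndcToWindow(0, 0, 100, 100, 200, 200) returns {200, 200}, the center of the viewport.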

Why 2 matrices when we actually need 3? This has to do with 3D. World space coordinates aren’t really needed for anything special, but lighting is usually done in eye space. By premultiplying the model and view matrices together, we get a single matrix that does the same as both the original matrices, so it’s basically a shortcut to eye space. We can’t skip eye space though since we need to do lighting there, so in 3D we can’t multiply in the projection matrix too. For 2D however, this point is moot. There’s no actual need for 2 separate matrices, but since you’re stuck with two matrices you might as well use them as they’re supposed to be used, if only because it’s a good habit once you start with basic 3D rendering.

glPushMatrix() stores the current matrix on the matrix stack. Think of the stack as a pile of matrices. Push stores the current matrix on top of the pile and pop takes the top matrix off again. This is last-in-first-out (LIFO) order, which is the defining property of a stack. This is very useful when working with the modelview matrix:


glLoadIdentity(); //Reset matrix
glTranslatef(-cameraX, -cameraY, 0); //Set up the view matrix part

glPushMatrix(); //Store the current matrix
glTranslatef(objectX, objectY, 0); //Position object (model matrix part)
glBegin(...);
//Render object...
glEnd();
glPopMatrix(); //Restores the pushed matrix. Basically undoes the glTranslatef(objectX, objectY, 0) call.

glPushMatrix(); //Store the current matrix
//render another object
glPopMatrix(); //Restores the pushed matrix.

Thank you. Your explanation is very helpful and I understand the parts about each matrix, but:

I don’t understand this.

I think the root of the problem is that I don’t understand what local space, eye space or NDCs are.

Also, why in your code example do you push and pop, then re-push and re-pop the matrix? What does undoing the glTranslatef(…) call do?

This is why it’s a matrix:

Local space is the simplest, but also not very relevant for 2D. If you open up a 3D model file, you’ll find vertex positions. These are relative to some origin (0, 0, 0) point chosen by the modeler. For a cube it’d most likely be the center of the cube, and for a human it’s usually the point the human is standing on right between his feet. The same concept applies for a 2D sprite. Let’s say you set up a sprite centered over (0, 0).


(-1, -2)        (1, -2)
       +--------+
       |        |
       |        |
       | (0, 0) |
       |        |
       |        |
       +--------+
(-1, 2)         (1, 2)

This sprite is made out of 4 vertices forming a quad, and its vertices are currently in local space. However, it should be obvious that we don’t always want to draw this sprite centered at (0, 0). Let’s say that this is a player sprite, so we want to move it to where the player currently is, which we’ll say is (50, 50). We’ll therefore need to translate the model matrix to (50, 50) to move the object’s coordinates so they’re centered at (50, 50) instead of the sprite’s original origin of (0, 0). For the sake of it, we also want to make the sprite twice as big, so we also scale it using glScalef(2, 2, 1). When we multiply it by this matrix, we get the following sprite:


(48, 46)      (52, 46)
       +--------+
       |        |
       |        |
       |(50, 50)|
       |        |
       |        |
       +--------+
(48, 54)       (52, 54)

The sprite is now at its world position. Again, this can be in any unit you want as long as it makes sense to you. It could be pixels on the screen, millimeters in an ant strategy game, blocks in Tetris or light-years in a space game. This is a space defined by you, and it’s the space all your game objects’ coordinates are in.
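In code, the model transform from this example would be set up roughly like this (a sketch; note that fixed-function OpenGL applies the call closest to the vertices first, so translate-then-scale means the scale happens in local space):


glMatrixMode(GL_MODELVIEW);
glTranslatef(50, 50, 0); //Move the sprite's origin to (50, 50) in world space
glScalef(2, 2, 1);       //Double the size; applied before the translation, in local space
//Local corner (-1, -2) is scaled to (-2, -4), then translated to (48, 46), matching the diagram.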

Next we can also move the camera around in the world. Since the above sprite is the player sprite, let’s say the camera happens to be tracking a point 5 units to the left of the player sprite, i.e. the camera sits at (45, 50). As I wrote in my previous post, the view matrix is responsible for camera movement. Basically it’s supposed to move vertices so they’re relative to the camera instead of relative to the world’s arbitrarily chosen (by you) origin. So we translate our view matrix using glTranslatef(-45, -50, 0) and transform our sprite with it:


(3, -4)          (7, -4)
       +--------+
       |        |
       |        |
       | (5, 0) |
       |        |
       |        |
       +--------+
(3, 4)           (7, 4)

As you can imagine, eye space is very similar to world space, only the objects are relative to the camera instead. In other words, in this space the camera is always at (0, 0). For 2D this doesn’t actually mean much, and it’s usually not literally the case either. The reason lies in how most people use glOrtho(). glOrtho(0, 100, 100, 0, -1, 1) will map (0, 0) in eye space to (-1, 1) in normalized device coordinates, which is the top left corner. If you pass in (100, 100), you’ll end up at (1, -1), which is the bottom right corner. These coordinates are then mapped to the screen using the viewport. What this glOrtho() call does in practice is map the area from (0, 0) to (100, 100) of your eye space coordinates to the viewport. With that glOrtho() call and our viewport set to (0, 0, 100, 100), our sprite would end up at the top left corner of the screen, with the top half of it outside the screen.


   (3, -4)          (7, -4)
          +--------+
          |        |
(0, 0)    |        |   Screen edge
    +-----|-(5, 0)-|----------
    |     |        |
    |     |        |
    |     +--------+
    |  (3, 4)      (7, 4)
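For reference, the setup behind this example would look roughly like this:


glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, 100, 100, 0, -1, 1); //Eye space (0, 0) = top left, (100, 100) = bottom right
glViewport(0, 0, 100, 100);     //Map NDC to a 100x100 pixel area of the window
//The sprite's center (5, 0) lands on the top edge of the screen, 5 pixels from the left.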

EDIT: It should be obvious that this system is supposed to work well with 3D rendering, and is vastly overcomplicated for 2D…

@theagentd Whilst I was super impressed with your artwork (and it is art), I feel that an actual image might be clearer. Especially since there are a couple of excellent ones in the Red Book.

I know they’re in a 3D context but I think they’re just as relevant for 2D.

Evidently I skipped a step: The perspective divide. It’s done automatically so I didn’t want to confuse you…

@theagentd Thank you so much. Your explanations are so brilliant. :slight_smile:
You saved me from getting confused with all the jargon that you find on Google.

One more thing though. In your example you push then pop the matrix, then push and pop it again. Do you have to do this for every object that you want to draw in world space, or can you just do:


glPushMatrix();
glTranslatef(x, y, z);
glBegin(...);
//Render object...
glEnd();
glTranslatef(anotherX, anotherY, anotherZ);
glBegin(...);
//Render object...
glEnd();
glPopMatrix();

If so, why would you pop the matrix then push it again?

Ah, sorry, forgot to answer that question in my writing frenzy. xD

The reason we have to do that is that the model and view matrices are combined into one matrix. If we had two separate matrices for this, it’d be much cleaner to do something like this each frame:


glMatrixMode(GL_PROJECTION);
glLoadIdentity(); //reset
glOrtho(...);

glMatrixMode(GL_VIEW); //THIS IS NOT A VALID LINE!!!
glLoadIdentity(); //reset
glTranslatef(-cameraX, -cameraY, 0);

glMatrixMode(GL_MODEL); //THIS IS NOT A VALID LINE!!!

for(int i = 0; i < objects.size(); i++){
    GameObject obj = objects.get(i);
    
    glLoadIdentity(); //reset model matrix
    glTranslatef(obj.getX(), obj.getY(), 0);
    glRotatef(obj.getAngle(), 0, 0, 1); //Rotate around the Z axis for 2D
    glScalef(obj.getScale(), obj.getScale(), 1);

    glBegin(...);
    ...
    glEnd();
}

Note: This code was written in this window and may have typos or minor errors. Focus on the big picture.

Basically we set up the projection and view matrices at the beginning of the frame, but the object matrix needs to be reset for each object because all matrix modifying commands stack up.
[icode]glTranslatef(5, 0, 0) + glTranslatef(5, 0, 0) = glTranslatef(10, 0, 0)[/icode]
If we were to not reset the matrix in the object rendering loop, only our first object would render correctly while the rest would most likely end up far outside the screen somewhere.

However, the above code is, as I said, not valid, since we actually only have two matrices. Since our view matrix doesn’t change in the middle of a frame, we still want to set it up just once. We can solve this quite easily by setting up our view matrix in the modelview matrix, saving it, then modifying its “model matrix part” for each object, rendering that object, and then reloading the saved matrix into OpenGL so we’re back to our original view matrix again:



private FloatBuffer viewMatrix = BufferUtils.createFloatBuffer(16);

...

glMatrixMode(GL_PROJECTION);
glLoadIdentity(); //reset
glOrtho(...);

glMatrixMode(GL_MODELVIEW);
glLoadIdentity(); //reset
glTranslatef(-cameraX, -cameraY, 0);

//glMatrixMode(GL_MODEL);
//There is no separate model matrix.
//Instead we'll retrieve the unmodified view matrix and save it in our FloatBuffer.
glGetFloatv(GL_MODELVIEW_MATRIX, viewMatrix);

for(int i = 0; i < objects.size(); i++){
    GameObject obj = objects.get(i);
    
    //glLoadIdentity(); //We can't reset it completely! That'd undo our camera translation!
    glTranslatef(obj.getX(), obj.getY(), 0);
    glRotatef(obj.getAngle(), 0, 0, 1);
    glScalef(obj.getScale(), obj.getScale(), 1);

    glBegin(...);
    ...
    glEnd();
    
    //By now, the three object specific transformations above aren't needed anymore,
    //and we need to get rid of them so we can render the next object. glLoadIdentity()
    //would also reset our view matrix, so let's just overwrite the matrix with our stored
    //unmodified view matrix instead!
    glLoadMatrixf(viewMatrix);
    //And voila! We've effectively reset our model matrix but left our view matrix untouched!
}

This works perfectly fine, and you could even say that this looks cleaner than using glPush/PopMatrix(). The exact same thing can be accomplished with glPush/PopMatrix() though and you wouldn’t need a FloatBuffer variable to hold the view matrix since it does that for you.


glMatrixMode(GL_PROJECTION);
glLoadIdentity(); //reset
glOrtho(...);

glMatrixMode(GL_MODELVIEW);
glLoadIdentity(); //reset
glTranslatef(-cameraX, -cameraY, 0);

for(int i = 0; i < objects.size(); i++){
    GameObject obj = objects.get(i);
    
    //glLoadIdentity(); //We can't reset it completely! That'd undo our camera translation!
    glPushMatrix(); //Stores the current matrix on the stack, in our case the unmodified view matrix.
    glTranslatef(obj.getX(), obj.getY(), 0);
    glRotatef(obj.getAngle(), 0, 0, 1);
    glScalef(obj.getScale(), obj.getScale(), 1);

    glBegin(...);
    ...
    glEnd();
    
    //Resetting time! Since we pushed the unmodified view matrix to the matrix stack, we can pop it
    //off the stack again to get back the matrix we pushed onto it. This also removes it from the stack
    //which is why we need to push before rendering each object. You can think of push as a kind of
    //matrixStack.add(getCurrentMatrix()) and pop as setCurrentMatrix(matrixStack.removeLast()).
    glPopMatrix();
    //And voila! We've restored the unmodified view matrix!
}

I hope that explains it.

Pushing and popping matrices is especially useful when you have hierarchical objects. Let’s say you have a city object with a number of buildings in it. Each building has a position relative to the city it is in.


for(City city : cities){
    glPushMatrix(); //Save view matrix
    glTranslatef(city.getX(), city.getY(), 0);
    for(Building b : city.getBuildings()){
        glPushMatrix(); //Save modelview matrix of the city
        glTranslatef(b.getX(), b.getY(), 0); //Stacks up with the city's glTranslatef()
        glBegin(...);
        ... //Render building
        glEnd();
        glPopMatrix(); //Back to the city's matrix
    }
    glPopMatrix(); //Back to the view matrix
}

A very important thing to note though is that this can be extremely slow once you’re approaching 1000 objects. It’s worth noting that for this reason the whole built-in matrix stack has been deprecated starting with OpenGL 3, but I still think this is a good place to start if you’re new to OpenGL. Learn how it works, then quickly try to move on to shaders. The missing matrix functionality can be replaced by a math library, like the one included in LWJGL. It’s not the best one out there, but for 2D it should be more than enough.
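To give an idea of what that looks like with shaders, here’s a rough sketch that hand-rolls the model matrix on the CPU (instead of using a math library, to keep it self-contained) and uploads it as a uniform, replacing glTranslatef()/glRotatef()/glScalef(). The uniform name "u_modelMatrix" and the existence of a linked program are assumptions, not something from this thread:


//Sketch only: assumes "program" is a linked shader program with a mat4 uniform
//named "u_modelMatrix" (hypothetical name).
private FloatBuffer matrixBuffer = BufferUtils.createFloatBuffer(16);

//Column-major 4x4 matrix equal to translate(x, y) * rotateZ(angle) * scale(s).
float[] modelMatrix(float x, float y, float angleDegrees, float s){
    float r = (float)Math.toRadians(angleDegrees);
    float cos = (float)Math.cos(r), sin = (float)Math.sin(r);
    return new float[]{
         cos * s, sin * s, 0, 0, //column 0
        -sin * s, cos * s, 0, 0, //column 1
         0,       0,       1, 0, //column 2
         x,       y,       0, 1  //column 3
    };
}

...

int location = glGetUniformLocation(program, "u_modelMatrix");
matrixBuffer.clear();
matrixBuffer.put(modelMatrix(obj.getX(), obj.getY(), obj.getAngle(), obj.getScale())).flip();
glUniformMatrix4(location, false, matrixBuffer); //LWJGL 2 name; glUniformMatrix4fv elsewhere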

So you’re saying that since the model and view matrices are combined, you have to save the modelview matrix after setting up the view matrix, and then, because translations stack up, you have to keep saving and restoring the matrix for every object? Otherwise the first object’s translation would still be applied on top of the second object’s translation, and so on?

Also I’ve read SHC’s tutorial on textures:

  • Am I right in saying that glGenTextures() returns a unique id number for the texture?
  • Does binding a texture use that id number to select which texture to bind, and does binding ensure that the currently bound texture is the only texture affected by OpenGL calls?
  • Why do you bind a texture every time you render? Is this a way of indicating which texture to render?

So essentially, what is ‘binding’ a texture?

Concerning glGenTextures(), you’re right. It basically gives you a currently free texture handle and marks it as “in use” for future calls to glGenTextures().

Binding in OpenGL means that all subsequent commands will affect or use the bound object. Binding a texture both allows you to modify it with subsequent calls like glTexImage*() and glTexParameter*(), and to apply it to your rendered geometry. This is a recurring concept in OpenGL and is also used for Vertex Buffer Objects (VBOs), Vertex Array Objects (VAOs), Framebuffer Objects (FBOs), etc.
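As a concrete illustration (a sketch; width, height and pixelData are assumed to exist, and glGenTextures() here is LWJGL’s single-value convenience overload):


int texture = glGenTextures(); //A unique handle; not yet an actual texture
glBindTexture(GL_TEXTURE_2D, texture); //First bind: the handle now refers to a 2D texture
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR); //Affects the bound texture
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixelData);

//Later, when rendering:
glBindTexture(GL_TEXTURE_2D, texture); //Selects which texture the following draw calls use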

It’s important to understand how the target parameter works. The OpenGL specification has this to say:

[quote]When a texture is first bound, it assumes the specified target: A texture first bound to GL_TEXTURE_1D becomes a one-dimensional texture, a texture first bound to GL_TEXTURE_2D becomes a two-dimensional texture, a texture first bound to GL_TEXTURE_3D becomes a three-dimensional texture […]
[/quote]
What this essentially means is that the first call to glBindTexture() also associates the texture handle you’ve gotten from glGenTextures() with the specified target. The spec continues:

[quote]While a texture is bound, GL operations on the target to which it is bound affect the bound texture, and queries of the target to which it is bound return state from the bound texture.
[/quote]
Note how the targets must match between your texture-related commands! In essence, you can have both a 1D texture and a 2D texture bound at the same time, since they’re bound to different targets, and you can direct OpenGL commands to either of them by using GL_TEXTURE_1D or GL_TEXTURE_2D as the target of your commands.
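For example (gradientTexture and spriteTexture are hypothetical handles):


glBindTexture(GL_TEXTURE_1D, gradientTexture); //Bound to the 1D target
glBindTexture(GL_TEXTURE_2D, spriteTexture);   //Bound to the 2D target; both stay bound
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); //Goes to gradientTexture
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);  //Goes to spriteTexture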

So I’m learning how to use shaders. I have a few questions:

  • What’s the difference between creating a program and creating a shader?
  • Why do you have to load the shader program from a string? Why aren’t they loaded from buffers like VBOs?
  • What does attaching a shader to a program do?
  • Once you’ve attached a shader, why do you have to link it? What is linking a shader?

Thanks. :slight_smile:

A shader ‘object’ defines a vertex shader, a fragment shader, or even just a single function. A ‘program’ links together all of its attached ‘objects’ so you can use it. The idea was to decouple everything so that you can re-use functions/shaders across multiple programs. All good in theory, but not all drivers implement that correctly, and it’s usually not worth the trouble trying to share shader objects. In OpenGL ES it’s not supported, afaik.

LWJGL uses strings for convenience. Pretty sure you can use the direct buffer method too.
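To tie the function names together, here’s a sketch of the usual sequence (vertexSource and fragmentSource are assumed to hold GLSL source strings):


int vertexShader = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(vertexShader, vertexSource);
glCompileShader(vertexShader); //Compiles this one shader object in isolation

int fragmentShader = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(fragmentShader, fragmentSource);
glCompileShader(fragmentShader);

int program = glCreateProgram(); //The container you'll actually render with
glAttachShader(program, vertexShader);
glAttachShader(program, fragmentShader);
glLinkProgram(program); //Combines the attached shader objects into one GPU executable

if(glGetProgrami(program, GL_LINK_STATUS) == GL_FALSE){
    System.err.println(glGetProgramInfoLog(program, 1024)); //Linking can fail even if both shaders compiled
}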

Thanks, but I’m still unclear on what the glLinkProgram() function does. The glAttachShader() function attaches the shaders to the program, so what does linking the program do? Also, this may be a stupid question, but when you say “shader object”, do you mean a shader? Are they the same thing?

Although your vertex and fragment shaders may both have successfully compiled, they still have to be linked together to form a sort of pipeline:

(vertex data) --> vertex shader --> (rasterizer generates pixels) --> fragment shader --> (pixels written to framebuffer)

Linking does further optimizations based on how the vertex shader and fragment shader interact with each other. Let’s say you have color data in your VBO and your vertex shader reads it and passes it on to the fragment shader. However, the fragment shader ignores the color value and makes everything white regardless. In this case, the linker will realize that generating a color value for each pixel is just wasted work since it won’t be used at all, so it removes that output from the vertex shader. This in turn makes the color vertex attribute (vertex shader input) unnecessary since it’s not being used either, and poof; there goes that as well, and you’ll get -1 when you try to query the location of that attribute from Java (= the attribute doesn’t exist). GLSL compilers are allowed to (and generally do) optimize away unused uniforms and attributes.

What’s the point of this? For example, you can write a massive vertex shader that does everything you’ll ever need: Colors, texture coordinates, shadow map coordinates, normals, tangents, you name it. Then you can reuse this vertex shader for any number of fragment shaders that only use a small number of those output variables without having to worry about performance, since the compiler will automatically optimize away unused variables and computations that aren’t needed by that specific fragment shader. The linking step allows you to mix and match vertex and fragment shaders and get optimal performance anyway. It also has uses in more advanced OpenGL.
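In practice you can see this happen when querying locations (“color” is a hypothetical attribute name):


int colorLocation = glGetAttribLocation(program, "color");
if(colorLocation == -1){
    //The attribute was optimized away (or never existed); skip setting up that vertex data.
}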

[quote=“theagentd,post:15,topic:45317”]
Don’t keep us waitin’! The anticipation is killing me.
[/quote]

Sorry for not replying earlier; I’m at the college hostel and came home today due to a lot of strikes.

Before writing about those functions, I want to explain how C programs get created (just to introduce the term LINKING). There, linking means that the generated object code is combined by the linker into a format the OS can understand, which is the executable file we see after compilation. It’s the same concept here: [icode]glLinkProgram[/icode] links the program into an executable that the GPU can run. Before linking the program, we attach the shaders to it using the [icode]glAttachShader[/icode] function. The order in which we attach the shaders doesn’t matter; the pipeline order is determined by the shader stages themselves, with the vertex shader always running before the fragment shader.

Then at the time of execution, the GPU passes the vertex data to the vertex shader.


(vertex_data) --> (vertex_shader)  // Transforms the vertices and generates the pixels

The generated pixels are passed to the fragment shader.


(pixels) --> (fragment_shader)     // Adds colour from the textures and lighting to the pixels

Those pixels will then be transformed to screen coordinates and displayed on the screen. This is just a basic view of shaders; you can get more info on the topic here.

I thought it’d be unrelated so I decided not to write anything, but here goes:

You have to set up certain things before linking. In OpenGL 3+, there’s no built-in gl_FragColor output for fragment shaders, so you have to define your output(s) yourself. When combined with MRT (rendering to multiple textures at the same time), you have to specify which output goes to which color attachment using glBindFragDataLocation(). This has to be done before linking. (Note: The output index can also be defined in your shader.)
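A sketch of what that looks like (outColor and outNormal are hypothetical output names):


glBindFragDataLocation(program, 0, "outColor");  //This fragment shader output writes to color attachment 0
glBindFragDataLocation(program, 1, "outNormal"); //...and this one to color attachment 1
glLinkProgram(program); //The mapping only takes effect when the program is (re)linked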

The same is true when capturing vertex data using transform feedback. You have to tell OpenGL which outputs of your vertex or geometry shader you’re interested in using glTransformFeedbackVaryings() to prevent the GLSL compiler from potentially optimizing those attributes away. Again, this has to be done before linking.
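Again as a sketch (worldPosition is a hypothetical vertex shader output):


glTransformFeedbackVaryings(program, new CharSequence[]{"worldPosition"}, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(program); //Without this, the compiler is free to optimize worldPosition away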

Just a small detail, but I’d like to point out that pixels are generated by transforming the geometry to screen coordinates and filling in the pixels whose centers are covered by the geometry. The screen coordinate transformation happens before the fragment shader, and the result is available to the fragment shader in the built-in variable gl_FragCoord.

This is my reaction to these explanations.

Well said. :smiley:

@theagentd @SHC Thanks a lot, nice explanations.