Vec4 Mathematics

I have seen many people’s opinions on this on the internet and I can’t find a definitive answer… Is it better to do your vector math (myMatrix * myVector) in Java and pass the result to GLSL, or should I pass the matrix and vector to GLSL and let the shader do the math?

I have heard people on both sides claim that one way or the other is faster, but I can’t seem to measure a difference… is there one?

Mostly there just won’t be a difference; CPUs are that fast these days. Moving the data around often costs more than working on that data. Use whichever approach makes your code nicer.

Having said that, phones may be different, and different phones may differ from each other.

If you only want to know whether there’s a difference, then yes, there is one: if you do the calculation in Java, it happens once before each shader execution (assuming some sort of simple, standard draw mechanism). If you do it in the shader, it happens once per vertex (if the calculation is in the vertex shader) or once per primitive or fragment. Depending on the geometry and/or the screen coverage of the projected object, that can be tens or hundreds of thousands of invocations (your object has 200k vertices? 200k matrix multiplications in the vertex shader…). Additionally, you have to pass more data to the shader, so more bandwidth is used, although this can be neglected in most scenarios.
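To make the cost model concrete, here is a minimal sketch in plain Java (the `Mat4` class is a hypothetical stand-in, not from any GL library; no OpenGL calls are made): the matrix-matrix product is paid once per draw call on the CPU, whereas in the shader the same product would be redone for every vertex.

```java
// Hypothetical minimal column-major 4x4 matrix, just enough to show the cost model.
final class Mat4 {
    final float[] m = new float[16]; // column-major, like GLSL

    static Mat4 identity() {
        Mat4 r = new Mat4();
        for (int i = 0; i < 4; i++) r.m[i * 4 + i] = 1f;
        return r;
    }

    // One full 4x4 product -- paid ONCE per draw call when done on the CPU.
    Mat4 mul(Mat4 o) {
        Mat4 r = new Mat4();
        for (int c = 0; c < 4; c++)
            for (int row = 0; row < 4; row++)
                for (int k = 0; k < 4; k++)
                    r.m[c * 4 + row] += m[k * 4 + row] * o.m[c * 4 + k];
        return r;
    }
}

public class PremultiplyDemo {
    public static void main(String[] args) {
        Mat4 projection = Mat4.identity(), view = Mat4.identity(), model = Mat4.identity();

        // CPU side: one matrix-matrix chain per draw call, then upload
        // `mvp` as a single uniform.
        Mat4 mvp = projection.mul(view).mul(model);

        // Doing projection * view * model in the vertex shader instead would
        // repeat this chain for EVERY vertex -- 200k vertices, 200k chains.
        System.out.println(mvp.m[0]); // 1.0 for identity inputs
    }
}
```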

However, you asked about the scenario with only one matrix and one vector. My arguments mostly apply to the cases where you have two or more matrices to multiply before transforming a vertex in a shader, for example a model-view-projection matrix. I’d say your example scenario should never happen at all, since you store your object’s vertices in a buffer that should be static (unless you do software animation). Updating vertex data in a buffer every frame is not the way to go in OpenGL; you are using vertex buffers, aren’t you?

Finally, I have to say that the performance difference from premultiplying two or three matrices is… barely noticeable in almost all my scenarios. If you can, premultiply the matrices on the Java side and transform vertices in the shader. It may make a difference on mobile devices, but I fear you have to test on specific devices.

The difference is usually negligible, but most people prefer to do the calculation in GLSL to reduce CPU overhead. I personally prefer doing it in GLSL.

EDIT: Some people also insist that it’s less expensive (in terms of performance) to do it in your Java/C++/C/whatever code, instead of GLSL. It’s really up to you.

Because there isn’t… it depends on your own situation

Does GLSL do any optimisations? If I pass three uniform matrices and multiply them in a vertex shader (model / view / projection), does GLSL recognise that the uniforms don’t change and calculate the product only once?

I have been thinking about this recently too, and my instinct tells me you should go with CPU calculations (because of the per-vertex work saved). But if GLSL can optimise things, then I think GLSL is a no-brainer.

[quote]Does GLSL do any optimisations?
[/quote]
Yes it does, but it is implementation specific, i.e. it is up to the vendor. Assuming you’re using indexed primitives, it can cache vertex calculations; with properly optimized triangle soups you will very rarely calculate a vertex twice. But typically each and every vertex is calculated once, even if some calculations could be hoisted “out of the vertex loop”. Then again it’s meh, since cards these days just never seem to hit vertex limits anymore. Matrix calculations in fragment shaders? Well, that is a different story.

No, there’s no evidence that any driver does this specific optimization. For example:


gl_Position = projection * view * model * vec4(position, 1.0);

compiles to: http://www.java-gaming.org/?action=pastebin&id=1448, and can be optimized massively: this is two complete 4x4 matrix-matrix multiplications followed by a matrix-vector multiplication, which can be reduced to this (monstrous) expression:


gl_Position = projection * vec4((view * vec4((model * vec4(position, 1.0)).xyz, 1.0)).xyz, 1.0);

which only compiles to: http://www.java-gaming.org/?action=pastebin&id=1449

Simple reordering of operations, plus assuming that the view and model matrices are affine, can take your shader from 52 ALU cycles to 13 cycles. In practice this is not that big a difference, as vertex shaders are often bottlenecked by the number of output parameters, but for simple vertex shaders, like shadow-mapping vertex shaders that only output gl_Position, you should always optimize the shader as much as possible.
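The affine trick above can be illustrated in plain Java (hypothetical helper methods, no GL involved): when a column-major matrix has bottom row (0, 0, 0, 1), the full mat4 * vec4 product collapses to a 3x3 multiply plus a translation, which is exactly what the `.xyz`-truncating GLSL rewrite exploits.

```java
// Sketch: full 4x4 transform vs. affine shortcut for a matrix whose
// bottom row is (0,0,0,1). Both are hypothetical helpers, column-major.
public class AffineDemo {
    // Full 4x4 * vec4(x, y, z, 1) product.
    static float[] fullTransform(float[] m, float x, float y, float z) {
        float[] r = new float[4];
        for (int row = 0; row < 4; row++)
            r[row] = m[row] * x + m[4 + row] * y + m[8 + row] * z + m[12 + row];
        return r;
    }

    // Affine shortcut: skip the bottom row entirely, since w is known to be 1.
    static float[] affineTransform(float[] m, float x, float y, float z) {
        return new float[] {
            m[0] * x + m[4] * y + m[8]  * z + m[12],
            m[1] * x + m[5] * y + m[9]  * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14],
            1f
        };
    }

    public static void main(String[] args) {
        // Column-major affine matrix: 90-degree rotation about Z plus a translation.
        float[] m = {
            0, 1, 0, 0,   // column 0
           -1, 0, 0, 0,   // column 1
            0, 0, 1, 0,   // column 2
            5, 6, 7, 1    // column 3 (translation); bottom row is (0,0,0,1)
        };
        float[] a = fullTransform(m, 1, 2, 3);
        float[] b = affineTransform(m, 1, 2, 3);
        System.out.println(a[0] == b[0] && a[1] == b[1] && a[2] == b[2]); // true
    }
}
```

The shortcut does strictly less arithmetic per vertex, which is where the cycle savings in the reordered shader come from.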

[quote]Yes it does, but it depends and implementation specific. ie it is up to the vendor. Assuming your using indexed primitives it can cache vertex calculations. ie properly optimized triangle soups you will very rarely calculate a vertex twice. But typically each and every vertex is calculated once even if some calculations can be taken “out of the vertex loop”. But again it is Meh since cards these days just never seem to hit vertex limits anymore, matrix calculations in fragments. Well that is a different story.
[/quote]
What you’re talking about is indexed rendering. If you draw a quad using 4 vertices and 6 indices forming two triangles, the vertex shader will only run 4 times (once for each vertex, not each index). This is because the vertex shader’s output is stored in a finite-size cache (meaning the shader MAY need to be rerun if a vertex is referenced again after being evicted from the cache, but not in a trivial case like this), which allows OpenGL to reuse transformed vertices when building the two triangles. It has nothing to do with the shader’s compilation or performance.
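The quad example can be sketched in a few lines of Java (an idealized model, not real driver behavior: a `HashSet` stands in for the post-transform cache, which in hardware is finite and can evict entries): the shader runs once per unique index, not once per index.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: a quad drawn as two indexed triangles. With an idealized
// post-transform cache, the vertex shader runs once per UNIQUE vertex,
// not once per index.
public class VertexCacheDemo {
    public static void main(String[] args) {
        int[] indices = {0, 1, 2,  2, 3, 0}; // two triangles sharing an edge

        Set<Integer> cached = new HashSet<>(); // stands in for the cache
        int invocations = 0;
        for (int i : indices)
            if (cached.add(i)) invocations++;  // cache miss: run the "shader"

        System.out.println(invocations + " invocations for " + indices.length + " indices");
        // -> 4 invocations for 6 indices
    }
}
```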

EDIT:
To actually answer the question if it’s worth doing on the CPU or the GPU: It depends.

In a 2D game, you’re usually CPU limited by small draw calls, sorting of sprites, game logic, etc. In addition, you’re usually waaaaay underutilizing the GPU with super-simple shaders, tiny texture formats (RGBA8), so anything you can offload to the GPU is usually a win.

For 3D games, you’re usually heavily GPU limited by the sheer number of pixels and triangles you have to work with, heavy post-processing shaders, shadow maps, etc. In this case, optimize everything you can on the CPU. Premultiplying the view and projection matrices can save 20 instructions per vertex × 1 million vertices of GPU work. It’s all a lot more complicated than that, as the GPU has lots of individual hardware units that can each bottleneck you in different ways, so there’s no straightforward answer. In general, the answer is “premultiply everything you possibly can on the CPU” unless proven otherwise, and it rarely is.
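Premultiplying is safe because matrix multiplication is associative: (P·V)·v equals P·(V·v). A quick Java check (hypothetical column-major helpers; integer-valued matrices so float arithmetic is exact):

```java
import java.util.Arrays;

// Sketch: premultiplying projection * view on the CPU gives the same result
// as applying them one at a time per vertex. Column-major layout, like GLSL.
public class AssocDemo {
    static float[] mulVec(float[] m, float[] v) {       // mat4 * vec4
        float[] r = new float[4];
        for (int row = 0; row < 4; row++)
            for (int k = 0; k < 4; k++)
                r[row] += m[k * 4 + row] * v[k];
        return r;
    }

    static float[] mulMat(float[] a, float[] b) {       // mat4 * mat4
        float[] r = new float[16];
        for (int c = 0; c < 4; c++)
            for (int row = 0; row < 4; row++)
                for (int k = 0; k < 4; k++)
                    r[c * 4 + row] += a[k * 4 + row] * b[c * 4 + k];
        return r;
    }

    public static void main(String[] args) {
        // Toy integer-valued matrices: a scale and a translation.
        float[] projection = {2,0,0,0,  0,2,0,0,  0,0,1,0,  0,0,0,1};
        float[] view       = {1,0,0,0,  0,1,0,0,  0,0,1,0,  1,1,1,1};
        float[] v = {1, 2, 3, 1};

        // Shader style: two mat*vec products per vertex.
        float[] perVertex = mulVec(projection, mulVec(view, v));
        // Premultiplied: one mat*mat on the CPU, one mat*vec per vertex.
        float[] combined  = mulVec(mulMat(projection, view), v);

        System.out.println(Arrays.equals(perVertex, combined)); // true
    }
}
```

Per vertex, this trades two matrix-vector products for one, which is where the saved instructions come from.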

The vertex cache, which I mentioned, is an optimization. Potato, potahto…

That still doesn’t have anything to do with GLSL compiling.

Way to miss the forest for the trees.

Your answer was technically off-topic. My answer covers the thread AND points out that your answer is technically off-topic. I can’t think of a funny saying to point that out though, so I guess you win.

EDIT: If you want to continue this specific discussion we should probably do that by PMs so as not to bother everyone here. :point: