[solved] Horrible performance with GL_UNSIGNED_BYTE

I’m learning OpenGL and decided to make some voxel rendering.

I use a VBO to hold the mesh for one chunk, and I store only the visible faces in it.
One VAO and one VBO per chunk.

Each chunk is only 32x32, so I decided to use unsigned bytes as vertex data instead of floats.
Each vertex is stored with its local position in the chunk.
I store the chunk offset as a uniform and add it to the vertex data to render it at the correct position.

Performance decreased, most visibly on ATI cards (it’s horrible there).

I noticed it depends heavily on the number of vertices. As all the data is already uploaded to the VBO, it is clear that the slowdown is on the GPU side.
I removed the lighting calculations to see if they were the cause, but it seems they weren’t.

Maybe the shaders have a problem converting data types?
It works well with floats, but not with bytes.

Here are my shaders:

#version 130

uniform mat4 camera;
uniform vec3 chunk; // chunk offset

in vec3 vert; // vertex
in vec2 vtex; // texture coordinate

out vec2 ftex;

void main() {
    // As I store UVs in bytes, I need to divide them by the number of tiles per row in the texture (currently 2x2)
    ftex = vtex / 2.0;
    // I add the vertex position and the chunk offset to get the correct world position of the vertex
    gl_Position = camera * vec4(vert + chunk, 1.0);
}

#version 130

uniform sampler2D tex;

in vec2 ftex;

out vec4 final;

void main() {
    // texture() replaces texture2D(), which is deprecated as of GLSL 1.30
    final = texture(tex, ftex);
}

They do almost nothing, but they perform horribly.
Could the problem be in the input data definitions?

in vec3 vert; // vertex
in vec2 vtex; // texture coordinate

vec3 and vec2 are float vectors, but the VBO contains bytes… is it really so hard for the GPU to convert those types? I don’t believe it.
Maybe I should have used ivec3 and ivec2 (or uvec3 and uvec2), but if I use them I get very weird input, definitely not what the VBO contains.
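
A side note on the conversion question, as a minimal sketch rather than anything taken from the poster’s code. When attributes are specified with glVertexAttribPointer, unsigned bytes reach a float input such as vec3 either as their plain value (normalized = GL_FALSE) or scaled to [0, 1] (normalized = GL_TRUE). Integer inputs such as ivec3 are meant to be fed through glVertexAttribIPointer instead, which may explain the weird values seen with ivec3. The helper names below are hypothetical and only emulate the arithmetic on the CPU:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: what a float shader input receives from an
 * unsigned byte when normalized == GL_FALSE (value kept as-is). */
static float ubyte_to_float(uint8_t value) {
    return (float)value;
}

/* Hypothetical helper: what a float shader input receives from an
 * unsigned byte when normalized == GL_TRUE (value scaled to [0, 1]). */
static float ubyte_to_float_normalized(uint8_t value) {
    return (float)value / 255.0f;
}

/* Mirrors the "vtex / 2" step in the vertex shader: a tile index stored
 * as a byte becomes a texture coordinate in an atlas with tiles_per_row
 * tiles along each axis. */
static float tile_to_uv(uint8_t tile, int tiles_per_row) {
    return ubyte_to_float(tile) / (float)tiles_per_row;
}
```

With a 2x2 atlas, tile_to_uv(1, 2) gives 0.5, the same result the shader computes for a byte value of 1.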

You can see what my scene looks like in this demo (lighting and everything else is removed to concentrate on the problem: simple vertex data rendering is horribly slow):
http://dev.keraj.net/czyde/ It should run at about 900 fps … but it runs at 100 on NVIDIA and 15(!) on ATI.

Show us how you configure your VAOs.
Also compare your numbers when using half-floats.

Is your data 4-byte aligned?


There is something you should watch out for. The alignment of any attribute's data should be no less than 4 bytes.
So if you have a vec3 of GLushorts, you can't use that 4th component for a new attribute (such as a vec2
of GLbytes). If you want to pack something into that instead of having useless padding, you
need to make it a vec4 of GLushorts.

@pitbuller: That’s probably it!

I used 3 unsigned bytes for coords + 2 unsigned bytes for texture coords.

I just changed vec3 to vec4 with one dummy value added, and changed the UVs to unsigned shorts. So all attributes are now 4-byte aligned, and it went up to 300 fps on the ATI card.
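
For anyone landing here later, the change described above can be sketched as two C structs. This is only an illustration under assumptions: the struct and function names are hypothetical, and the real upload code from the thread was never posted. The original layout has a 5-byte stride with the UVs starting at byte 3; the padded one has an 8-byte stride with the UVs starting at byte 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout described in the question: 3 unsigned bytes for the position
 * followed by 2 unsigned bytes for the texture coordinates. */
typedef struct {
    uint8_t pos[3];
    uint8_t uv[2];
} TightVertex;

/* Layout after the fix: 4 unsigned bytes for the position (the last one
 * is a dummy) followed by 2 unsigned shorts for the texture coordinates. */
typedef struct {
    uint8_t  pos[4];
    uint16_t uv[2];
} PaddedVertex;

static_assert(sizeof(TightVertex) == 5, "tight stride is 5 bytes");
static_assert(offsetof(TightVertex, uv) == 3, "tight UVs start at byte 3");
static_assert(sizeof(PaddedVertex) == 8, "padded stride is 8 bytes");
static_assert(offsetof(PaddedVertex, uv) == 4, "padded UVs start at byte 4");

/* Returns 1 when both the attribute offset and the vertex stride are
 * multiples of 4, so the attribute is 4-byte aligned in every vertex. */
static int is_4byte_aligned(size_t offset, size_t stride) {
    return offset % 4 == 0 && stride % 4 == 0;
}

/* Builds one padded vertex; the 4th position byte is unused padding. */
static PaddedVertex make_padded_vertex(uint8_t x, uint8_t y, uint8_t z,
                                       uint16_t u, uint16_t v) {
    PaddedVertex vertex = { { x, y, z, 0 }, { u, v } };
    return vertex;
}
```

The stride and offsets passed to glVertexAttribPointer would then be sizeof(PaddedVertex) with offsetof(PaddedVertex, pos) and offsetof(PaddedVertex, uv), and the position input in the shader becomes a vec4 whose last component is ignored.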


How could I have missed that best-practices page. :smiley:

So is it still 3 times slower than just using floats?

No, the case I just tested runs at the same speed as with floats.
Sorry for the additional confusion.