vertex shader causing slowdown

So I want to draw a height field using shaders. I have a VBO of 2D position coordinates and 2D texture coordinates, and a texture containing height values. I have a shader that takes in this texture and displaces vertices based on the value in the texture.

Before I used this approach I just had one big VBO with 3D position coordinates, but I wanted to be able to deform the height field quickly and easily using shaders by making changes to the texture, so I changed to this approach.

So I got this working, but it is really slow (~6 fps). The height field is 600x600 vertices and ran at around 50-60 fps when I was using just the one big VBO.

I am curious whether this is just too much data for the GPU to handle, or whether I am doing something wrong. Here's my vertex shader:


uniform sampler2D heights;

void main()
{
    gl_TexCoord[0] = gl_MultiTexCoord0;
    
    vec4 position = gl_Vertex;
    // The texture is grayscale RGB, so just take the red channel and scale by 2
    position.z += texture2D(heights, gl_MultiTexCoord0.st).r * 2.0;
    
    gl_Position = gl_ProjectionMatrix * gl_ModelViewMatrix * position;
}

And the fragment shader:


uniform sampler2D heights;

void main()
{
    gl_FragColor = texture2D(heights, gl_TexCoord[0].st);
}

And here’s a snip of the rendering process just in case:


* textures and buffers are bound, vertex and texcoord pointers created *
* The field is a flat plane rendered as rows of triangle strips, the shader displaces along z *
int stride = numCols * 4;
shader.engage(gl);
// Draw the VBO, one triangle strip per iteration
for (int i = 0; i < (numRows - 1) / 2; i++) {
    gl.glDrawRangeElements(GL.GL_TRIANGLE_STRIP,
        0,                                  // start: lowest index used
        endIndex,                           // end: highest index used
        stride,                             // count: indices per strip
        GL.GL_UNSIGNED_INT,
        stride * i * BufferUtil.SIZEOF_INT  // byte offset into the index VBO
    );
}
shader.disengage(gl);

Thanks!

Older GPUs can be slow when doing texture sampling in the vertex shader

I was reading online that ATI didn't implement vertex texture fetching (apparently that's the name for it) for a while.

I am running this on a MacBook Pro with an ATI X1600; do you think this card is too old for this stuff?

I read this article: http://www.ozone3d.net/tutorials/vertex_displacement_mapping.php
and they are running fine with 80,000 polygons, not to mention my shaders are simpler… so it must be my GPU, I guess :frowning:

However, according to this: http://developer.apple.com/graphicsimaging/opengl/capabilities/
I should have access to up to 16 texture units in vertex programs (search the page for MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB)

I found that even recent graphics cards are very slow at texture fetches in vertex shaders. I think I was using an nVidia 8800 and it was unbearably slow, sub-10 fps. Plus you're using a non-power-of-two texture (600x600), so you're probably hitting a really slow path.

If you just want deformable height maps, then you should be able to use regular VBOs. Make sure you use indexed geometry, and slice up your buffers so that all of the z coords / heightmap values live in a single buffer, and you should be golden.

How do I specify two separate VBOs for my position data? Could I have one VBO with 2D position and TexCoords and a separate one for height displacement? In other words, is this a possible setup for my height field?

VBO1: posX1, posY1, texX1, texY1, … , posXn, posYn, texXn, texYn
VBO2: posZ1, posZ2, posZ3, … , posZn
VBO3: triangle indices
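For what it's worth, the byte strides and offsets for that proposed layout work out as below (a quick sanity check of my own, assuming 4-byte floats; these are the numbers you'd pass as the stride and pointer-offset arguments when setting up the two arrays in VBO1 and the packed z array in VBO2):

```java
public class VboLayout {
    public static void main(String[] args) {
        final int F = 4; // sizeof(float) in bytes
        // VBO1: posX, posY, texX, texY per vertex -> 4 interleaved floats
        int stride1   = 4 * F; // byte stride for both pointers into VBO1
        int texOffset = 2 * F; // texcoords start after the 2 position floats
        // VBO2: one float (posZ) per vertex -> tightly packed
        int stride2   = 1 * F;
        System.out.println(stride1 + " " + texOffset + " " + stride2);
    }
}
```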

thanks

Could be… just keep in mind that using 1 interleaved VBO for everything is so much faster that you might want to deal with the inconvenience of updating your Z values inside that VBO.

Vec3, Tex2


void put(int i, float z) { fbuf.put(i*5 + 2, z); }
float get(int i) { return fbuf.get(i*5 + 2); }

Vec3, Tex2, Norm3 (faster thanks to the stride of 8 floats, which lets you index with a bit shift)


void put(int i, float z) { fbuf.put((i<<3) + 2, z); } // parentheses needed: i<<3+2 parses as i<<5
float get(int i) { return fbuf.get((i<<3) + 2); }
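Here's that stride-8 accessor pattern as a runnable sketch (the class name `HeightAccessor` is mine). One gotcha: in Java, `+` binds tighter than `<<`, so `i << 3 + 2` parses as `i << 5` and the shift needs parentheses:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical accessor for an interleaved Vec3+Tex2+Norm3 buffer (8 floats per vertex).
public class HeightAccessor {
    static final int STRIDE = 8; // floats per vertex; z sits at offset 2
    final FloatBuffer fbuf;

    HeightAccessor(int numVertices) {
        fbuf = ByteBuffer.allocateDirect(numVertices * STRIDE * 4)
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }

    // Absolute put/get of vertex i's z; (i << 3) must be parenthesized.
    void putZ(int i, float z) { fbuf.put((i << 3) + 2, z); }
    float getZ(int i)         { return fbuf.get((i << 3) + 2); }

    public static void main(String[] args) {
        HeightAccessor a = new HeightAccessor(4);
        a.putZ(3, 1.5f);
        System.out.println(a.getZ(3)); // 1.5
    }
}
```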

If that's a bit too much hassle, you can keep all your height data in a float[], and pump the floats into the FloatBuffer/VBO just before rendering. This will probably make client-side (i.e., non-GPU) performance much better, as a float[] is much more stable (performance-wise) than a direct (native) FloatBuffer.
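A minimal sketch of that staging pattern, assuming a direct FloatBuffer backs the VBO (the GL upload call itself is out of scope here): mutate the plain float[] freely on the Java side, then do one bulk copy per frame before handing the buffer to something like glBufferSubData:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class HeightStaging {
    public static void main(String[] args) {
        int n = 600 * 600;
        // Client-side copy: a plain float[] is cheap and JIT-friendly to mutate.
        float[] heights = new float[n];
        heights[0]     = 0.25f;
        heights[n - 1] = 2.0f;

        // Direct buffer in native byte order, as GL expects.
        FloatBuffer vbo = ByteBuffer.allocateDirect(n * 4)
                                    .order(ByteOrder.nativeOrder())
                                    .asFloatBuffer();
        // One bulk copy per frame, then pass `vbo` to the buffer-upload call.
        vbo.clear();
        vbo.put(heights);
        vbo.flip();

        System.out.println(vbo.get(0) + " " + vbo.get(n - 1)); // 0.25 2.0
    }
}
```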

Thanks for all the advice. I did some research and it turns out that ATI has taken a loong time to adopt vertex shader texture fetching, so their cards are especially slow at it. I am an undergrad student in a computer graphics lab at school, so I just asked to be moved to a computer with a better graphics card, (nVidia 8800 GTS, dual CUDA! Sooo awesome…) and now I am pumping out 160+ fps. Oh technology :stuck_out_tongue:

So I think I will stick with the shader approach, because I really like the idea of using a shader to do my deformation of the field using a texture. VBOs are definitely something to keep in mind though if/when I want to make my code more portable to computers without behemoth GPUs.

Cheers