Updating VBO bogging down system

I am creating a system with many individual elements, like a ball pit: each element is a full object that must be updated continuously as the system receives outside input.

I’m updating my VBO like so:


public void updateData()
{
    FloatBuffer vertices = BufferUtils.createFloatBuffer((4*this.getObjectCount()) * 7); //4 * 7
    IntBuffer elements = BufferUtils.createIntBuffer((2*this.getObjectCount()) * 3); //2 * 3

    for(int i = 0; i < this.getObjectCount(); i++)
    {
        vertices.put(this.getObject(i).getX()).put(this.getObject(i).getY()).put(rVal).put(gVal).put(bVal).put(this.getObject(i).getTCX1()).put(this.getObject(i).getTCY2());
        vertices.put(this.getObject(i).getX() + this.getObject(i).getCurrFrameWidth()).put(this.getObject(i).getY()).put(rVal).put(gVal).put(bVal).put(this.getObject(i).getTCX2()).put(this.getObject(i).getTCY2());
        vertices.put(this.getObject(i).getX() + this.getObject(i).getCurrFrameWidth()).put(this.getObject(i).getY() + this.getObject(i).getCurrFrameHeight()).put(rVal).put(gVal).put(bVal).put(this.getObject(i).getTCX2()).put(this.getObject(i).getTCY1());
        vertices.put(this.getObject(i).getX()).put(this.getObject(i).getY() + this.getObject(i).getCurrFrameHeight()).put(rVal).put(gVal).put(bVal).put(this.getObject(i).getTCX1()).put(this.getObject(i).getTCY1());

        elements.put((i*4) + 0).put((i*4) + 1).put((i*4) + 2);
        elements.put((i*4) + 2).put((i*4) + 3).put((i*4) + 0);
    }

    vertices.flip();
    elements.flip();

    vbo2.bind(GL_ARRAY_BUFFER);
    vbo2.updateData(GL_ARRAY_BUFFER, 0, vertices);
    vbo2.unbind(GL_ARRAY_BUFFER);

    ebo2.bind(GL_ELEMENT_ARRAY_BUFFER);
    ebo2.updateData(GL_ELEMENT_ARRAY_BUFFER, 0, elements);
    ebo2.unbind(GL_ELEMENT_ARRAY_BUFFER);
}

[icode]those vertices are: X, Y, r, g, b, tex1, tex2[/icode]

If I do not update the VBO, the system runs at around my normal best framerate. If updating the VBO is allowed, at <300 objects it runs decently, but after 300-400 you start to feel it and the FPS is already dropping by 33%. What can I do to relieve this? Is updating the VBO likely the problem, or do you think this could indicate a problem elsewhere?

I thought VBOs had a max size or an optimal size, but I’m seeing mixed information, or maybe it has changed with new OpenGL.

I’ve read about double buffering, but I’m not sure it is right, and I was having trouble grasping how it would help this situation.

It might be because you are creating a new FloatBuffer and IntBuffer every time you update. Instead, try creating each buffer once with a maximum capacity and then filling in as much data as is required.
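A minimal sketch of that idea, assuming a hypothetical ReusableVertexBuffer class (not from the original code). Plain java.nio stands in for BufferUtils.createFloatBuffer so the sketch runs without LWJGL, and the 4 * 7 layout is taken from the question.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical helper: one direct buffer allocated up front and refilled every frame. */
final class ReusableVertexBuffer {
    // 4 vertices per object, 7 floats per vertex (x, y, r, g, b, u, v), as in the question.
    static final int FLOATS_PER_OBJECT = 4 * 7;

    private final FloatBuffer vertices;

    ReusableVertexBuffer(int maxObjects) {
        // Allocate once, sized for the largest object count expected.
        vertices = ByteBuffer
                .allocateDirect(maxObjects * FLOATS_PER_OBJECT * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Resets position and limit without reallocating; call before writing a frame. */
    FloatBuffer begin() {
        vertices.clear();
        return vertices;
    }

    /** Flips the buffer so it covers exactly the floats written since begin(). */
    FloatBuffer end() {
        vertices.flip();
        return vertices;
    }

    public static void main(String[] args) {
        ReusableVertexBuffer buffer = new ReusableVertexBuffer(1000);
        buffer.begin().put(1f).put(2f).put(3f);
        System.out.println("floats ready to upload: " + buffer.end().remaining());
    }
}
```

The buffer returned by end() would then be handed to the existing updateData call; only the floats written in the current frame are covered by its limit.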

I changed it so they are cleared and reused. This may have improved it by a bit, but not enough. Thanks tho!

Are you using LWJGL and building your own framework? Probably need to see more code to figure out what’s going on.
And what’s up with vbo2.updateData() ? Looks like a draw call.

I’m rendering ~150-200 things and I’m still hitting 60 fps but I use a float array and put the whole thing into the Float Buffer instead of a bunch of put calls.
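For illustration, the float-array approach could look roughly like the sketch below. QuadStaging, writeQuad and upload are hypothetical names, and the vertex order and texture coordinate pairing simply mirror the loop in the question; this is not the poster's actual code.

```java
import java.nio.FloatBuffer;

/** Hypothetical staging helper: fill a float[] first, then copy it with one bulk put. */
final class QuadStaging {
    static final int FLOATS_PER_VERTEX = 7; // x, y, r, g, b, u, v
    static final int FLOATS_PER_QUAD = 4 * FLOATS_PER_VERTEX;

    /** Writes one quad starting at offset and returns the offset of the next quad. */
    static int writeQuad(float[] out, int offset,
                         float x, float y, float w, float h,
                         float r, float g, float b,
                         float u1, float v1, float u2, float v2) {
        offset = writeVertex(out, offset, x,     y,     r, g, b, u1, v2);
        offset = writeVertex(out, offset, x + w, y,     r, g, b, u2, v2);
        offset = writeVertex(out, offset, x + w, y + h, r, g, b, u2, v1);
        offset = writeVertex(out, offset, x,     y + h, r, g, b, u1, v1);
        return offset;
    }

    private static int writeVertex(float[] out, int o, float x, float y,
                                   float r, float g, float b, float u, float v) {
        out[o] = x;     out[o + 1] = y;
        out[o + 2] = r; out[o + 3] = g; out[o + 4] = b;
        out[o + 5] = u; out[o + 6] = v;
        return o + FLOATS_PER_VERTEX;
    }

    /** Copies the used part of the array into the buffer with a single bulk put. */
    static FloatBuffer upload(float[] staging, int floatCount, FloatBuffer target) {
        target.clear();
        target.put(staging, 0, floatCount);
        target.flip();
        return target;
    }

    public static void main(String[] args) {
        float[] staging = new float[FLOATS_PER_QUAD];
        int used = writeQuad(staging, 0, 10f, 20f, 4f, 2f, 1f, 1f, 1f, 0f, 0f, 1f, 1f);
        FloatBuffer buffer = upload(staging, used, FloatBuffer.allocate(FLOATS_PER_QUAD));
        System.out.println("floats staged: " + buffer.remaining());
    }
}
```

Whether the bulk put is measurably faster than chained put calls depends on the JVM and the buffer type, so it is worth profiling both.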

Maybe I’ll generate a bunch of objects and do a little testing to see if it tanks above 300.

Yes, LWJGL3. This VBO class is a modification of one I picked up, but it just makes the calls you know in a “cleaner” appearing fashion.

vbo.updateData():

public void updateData(int target, int colCount, FloatBuffer elements)
{
    // colCount is passed to glBufferSubData as the byte offset into the buffer
    glBufferSubData(target, colCount, elements);
}

From what you say, perhaps my problem is in how I manage my vertex information. Perhaps the computational weight is in the .put statements?

I’d like to show you more code, but I’m not sure how I could isolate the relevant portions for you. It’s part of a big system, but I run almost at 60fps as long as this update function is commented out. I’ll look into managing my information differently and see if I can pass in a complete array rather than doing the .put statements.

I suppose tho, if that doesn’t work and I should be able to do more than 300 objects happily, perhaps the information for the system is not being supplied fast enough for the VBO update and subsequent draw call. The link between updating the VBO and the draw call is what ends up being related to bogging the system down, so I’d think it has to be this theory if my vertices information management isn’t the problem.

Also, this is almost more of a general programming question I should probably know already, but is this:
[icode].put(this.getObject(i).getY() + this.getObject(i).getCurrFrameHeight())[/icode]
so much worse than a simple
[icode].put(X)[/icode]?

There is both a calculation and object calls. I wouldn’t think so, but when you come up on bogging the system down in this way, every bit counts, right? Or is this negligible?

How much data are we talking about here?

In what way do you mean? I thought the code snippet told the tale.


FloatBuffer vertices = BufferUtils.createFloatBuffer((4*this.getObjectCount()) * 7); //4 * 7
IntBuffer elements = BufferUtils.createIntBuffer((2*this.getObjectCount()) * 3); //2 * 3

So if I’m doing say 300 objects, that’s a FloatBuffer of 4*300*7 and an IntBuffer of 2*300*3: 8400 and 1800 values respectively.
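To put those counts in bytes, here is a small worked calculation. BufferSizeEstimate is a hypothetical name, and the only assumption beyond the thread is that floats and ints are 4 bytes each, which holds in Java.

```java
/** Hypothetical helper that turns the object count into buffer sizes. */
final class BufferSizeEstimate {
    /** 4 vertices per object, 7 floats per vertex. */
    static int vertexFloats(int objects) {
        return 4 * objects * 7;
    }

    /** 2 triangles per object, 3 indices per triangle. */
    static int indexInts(int objects) {
        return 2 * objects * 3;
    }

    /** Total bytes uploaded per update for both buffers. */
    static int totalBytes(int objects) {
        return vertexFloats(objects) * Float.BYTES + indexInts(objects) * Integer.BYTES;
    }

    public static void main(String[] args) {
        int objects = 300;
        System.out.println(vertexFloats(objects) + " floats, "
                + indexInts(objects) + " ints, "
                + totalBytes(objects) + " bytes per update");
    }
}
```

For 300 objects this comes to 33,600 bytes of vertex data plus 7,200 bytes of indices, about 40 KB per update, which is a small upload by itself.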

So I just zoomed out in my game and got it up over 500 objects to draw and notice zero difference, still getting 60 fps.

I’m running win 32 bit, 4GB Ram, and Intel Core 2 duo @ 2.66 Ghz w/ GeForce 9800M 512MB gpu.

I’m not sure if the put calls would slow it down because I’ve never done it that way. But you are doing one draw call with only one float buffer and only one int buffer right?

That’s correct. I mean, there’s other stuff going on, but as I say, those other draw statements and the entire rest of the project run happily at ~60fps. I doubt it’s a conflict there, though I could be wrong.

I appreciate your system spec info. That’s just about the same as me.

Yes, it is much worse. Is it negligible? You don’t know until you measure it.

Let’s put that loop into a more readable format:

for(int i = 0; i < this.getObjectCount(); i++)
{
    vertices
        .put(this.getObject(i).getX())
        .put(this.getObject(i).getY())
        .put(rVal)
        .put(gVal)
        .put(bVal)
        .put(this.getObject(i).getTCX1())
        .put(this.getObject(i).getTCY2());
    
    vertices
        .put(this.getObject(i).getX() + this.getObject(i).getCurrFrameWidth())
        .put(this.getObject(i).getY())
        .put(rVal)
        .put(gVal)
        .put(bVal)
        .put(this.getObject(i).getTCX2())
        .put(this.getObject(i).getTCY2());

    vertices
        .put(this.getObject(i).getX() + this.getObject(i).getCurrFrameWidth())
        .put(this.getObject(i).getY() + this.getObject(i).getCurrFrameHeight())
        .put(rVal)
        .put(gVal)
        .put(bVal)
        .put(this.getObject(i).getTCX2())
        .put(this.getObject(i).getTCY1());

    vertices
        .put(this.getObject(i).getX())
        .put(this.getObject(i).getY() + this.getObject(i).getCurrFrameHeight())
        .put(rVal)
        .put(gVal)
        .put(bVal)
        .put(this.getObject(i).getTCX1())
        .put(this.getObject(i).getTCY1());            
            
    elements
        .put((i*4) + 0)
        .put((i*4) + 1)
        .put((i*4) + 2);
    
    elements
        .put((i*4) + 2)
        .put((i*4) + 3)
        .put((i*4) + 0);
}

That’s an awful lot of redundant calculations and function calls. The Java runtime may be able to eliminate some of that for you, but I wouldn’t count on that.

Some ideas to get started:

  • What does getObject(i) do? I’ve seen implementations which were responsible for like 98% of the CPU time spent in supposedly “tight” loops.
  • Do getObject() and all object.getXXX() calls only once inside the loop, store them in local variables. It’s all floats, you won’t hurt the GC’s feelings.
  • In your vertex format, you can use a packed 32 bit RGB color instead of separate R, G, B floats. You can have a look at libGDX’ SpriteBatch for an example how to do this. Would reduce your buffer size from 7 to 5 floats per vertex.
  • For indices, use a ShortBuffer if you can. 50% the size, and would still be large enough to store ~5000 objects.
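As a rough sketch of the last idea, the index pattern from the question can be written into a ShortBuffer. QuadIndices is a hypothetical name and plain java.nio is used so the sketch runs without LWJGL. Because the pattern depends only on the quad number, it could also be built once rather than every frame.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

/** Hypothetical helper: 16-bit indices for quads drawn as two triangles each. */
final class QuadIndices {
    static ShortBuffer build(int quadCount) {
        // 16-bit indices can address at most 65536 vertices (4 per quad).
        if (quadCount < 0 || quadCount * 4 > 65536) {
            throw new IllegalArgumentException("too many quads for 16-bit indices");
        }
        ShortBuffer indices = ByteBuffer
                .allocateDirect(quadCount * 6 * Short.BYTES)
                .order(ByteOrder.nativeOrder())
                .asShortBuffer();
        for (int i = 0; i < quadCount; i++) {
            int base = i * 4;
            // Same winding as the original loop: (0, 1, 2) and (2, 3, 0).
            // The cast keeps the low 16 bits, which the GPU reads as unsigned.
            indices.put((short) base).put((short) (base + 1)).put((short) (base + 2));
            indices.put((short) (base + 2)).put((short) (base + 3)).put((short) base);
        }
        indices.flip();
        return indices;
    }

    public static void main(String[] args) {
        System.out.println("indices for 300 quads: " + build(300).remaining());
    }
}
```

With this layout the draw call would need GL_UNSIGNED_SHORT as its index type instead of GL_UNSIGNED_INT.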

Thanks, mate. This is really good. I understand the calculations bit of computer science, but in practice I never knew how far to push it. I see areas of potential inefficiency, but sometimes it doesn’t seem like it is that important or I’m ignorant. I suppose what is on display here is the functional lesson of code efficiency? I’ve used redundant calculations, but usually that is in building a project so I can visually see what is doing what, with every intention of reducing that to proper variables later. But it often doesn’t bite you. I guess not until you do a great volume of something.

.getObject is effectively a .get(index) for a LinkedList element. Do .get statements like that carry effective weight too?

The idea about doing all the calls once in the loop and storing in vars is very smart, but I’m going to try ndnwarrior’s suggestion and simply manage my data better. I’m trying to come up with the right way to store all this in an array and then just pass that to the updater.

But, as you said, limit redundant calculations and calls, so I will do that in the area managing the data. The other ideas sound good too. I’ll report back what I get. Cheers ;D

Here you go. LinkedList.get(index) is an O(n) operation: it walks up to n/2 nodes from whichever end of the list is nearer (the code still looks the same in Java 8). And you call it many, many times per loop iteration. This is horrible. So, changing your loop to use an iterator, for example, should already give you a performance boost.
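The difference between the two loop styles can be sketched as follows. ListTraversal and its method names are hypothetical, and a list of floats stands in for the game objects.

```java
import java.util.LinkedList;
import java.util.List;

/** Hypothetical comparison of indexed and iterator traversal of a LinkedList. */
final class ListTraversal {
    /** Each get(i) walks the list from one end, so the whole loop is O(n^2). */
    static float sumByIndex(List<Float> values) {
        float sum = 0f;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i);
        }
        return sum;
    }

    /** The for-each loop uses the list's iterator and visits each node once: O(n). */
    static float sumByIterator(List<Float> values) {
        float sum = 0f;
        for (float value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Float> values = new LinkedList<>();
        for (int i = 1; i <= 100; i++) {
            values.add((float) i);
        }
        System.out.println(sumByIndex(values) + " == " + sumByIterator(values));
    }
}
```

Both methods return the same result; only the amount of list walking differs, and an ArrayList or plain array avoids the problem for indexed access altogether.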

Ah, that’s great. And it’s only compounded by the fact the system has 300 objects or more. This should indeed help a heaping load.

Do not make so many wild guesses, assumptions and beliefs. Like CoDi^R already said, measure it!
There are great profiling tools at your disposal, such as:

  • jvisualvm (built into the JDK)
  • YourKit (great profiler)
  • JProfiler (another great profiler)

The last two profilers are commercial but have fully featured trial versions, which you should use.
With all three profilers, when you use the “Sampling” mode (instead of the “Instrumentation” mode), the additional runtime cost is very low and your application runs much as if it wasn’t being profiled (e.g. escape analysis still works).

This has all been a good lesson so far. Thank you for your help. I’m still implementing corrections, but, for science, I updated my updateData code from the top to match the recommendation to not use so many get calls.


vertices.clear();
elements.clear();

for(int i = 0; i < this.getObjectCount(); i++)
{
    // Look the object up once and cache everything read from it
    Object workObject = this.getObject(i);
    int xCoord = workObject.getXCoord();
    int yCoord = workObject.getYCoord();
    int xCoord2 = xCoord + workObject.getCurrFrameWidth();
    int yCoord2 = yCoord + workObject.getCurrFrameHeight();
    float tcx1 = workObject.getTCX1();
    float tcx2 = workObject.getTCX2();
    float tcy1 = workObject.getTCY1();
    float tcy2 = workObject.getTCY2();
    vertices.put(xCoord).put(yCoord).put(rVal).put(gVal).put(bVal).put(tcx1).put(tcy1);
    vertices.put(xCoord2).put(yCoord).put(rVal).put(gVal).put(bVal).put(tcx2).put(tcy1);
    vertices.put(xCoord2).put(yCoord2).put(rVal).put(gVal).put(bVal).put(tcx2).put(tcy2);
    vertices.put(xCoord).put(yCoord2).put(rVal).put(gVal).put(bVal).put(tcx1).put(tcy2);

    // unchanged code
}

I specifically kept the .puts because ndnwarrior15 mentioned it. As CoDi^R again reminded me, don’t overlook the code breakdown of the things you are using. Untold peril lies within for ye who read it not. In this case, however, .put would seem to be just O(1). I reckon it’s still less efficient than keeping up with a full array elsewhere and avoiding the object calls and work variables.

~1000 objects worked at 60fps in my bare bones environment.
~1000 objects went to 37 of max 45 in big environment (45 because of other project inefficiencies).
~300 objects did 44/45.

Still many improvements left, but this one thing was tremendous.

Can you help me with this? I wasn’t able to find exactly the SpriteBatch code for libGDX that you mentioned. But I looked into how packing works, I believe I understand that. I understand that I would then just put that number in the VBO instead. So I then add a decoder part into my vertex shader?

Packing code like so:
[icode]RGB = (R << 16) | (G << 8) | B[/icode]

A vertex shader like so:

#version 150 core

in vec2 position;
in vec1 color;
in vec2 texcoord;

out vec3 vertexColor;
out vec2 textureCoord;

uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;

void main() {
    float r = (color >> 16) & 255;
    float g = (color >> 8) & 255;
    float b = color & 255;
    vertexColor = vec3(r, g, b);
    textureCoord = texcoord;
    mat4 mvp = projection * view * model;
    gl_Position = mvp * vec4(position, 0.0, 1.0);
}

Do I have this right?

The shader doesn’t need to be changed. The GPU takes care of fiddling with vertex formats, and delivers the color as vec4 (floats) to the shader program.

You need to change the vertex attributes for your color component. libGDX does this by providing Usage.COLOR_PACKED to the VertexAttribute instance, which then translates to glVertexAttribPointer(…, 4, GL_UNSIGNED_BYTE, …).

Then, because you use a FloatBuffer to upload data to the vertex buffer, this 32-bit RGB(A) value needs to be encoded properly when written into this buffer. libGDX does this with its Color.toFloatBits function. This basically states “hey, here’s an Integer, but from now on I tell you it’s a Float, but don’t you dare to change a bit!”
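A minimal sketch of that encoding step, modelled on what libGDX's Color.toFloatBits does as far as I know. PackedColor is a hypothetical name. The byte order assumes a little-endian buffer, so that r, g, b, a end up in memory in that order, and the mask is there to keep the bit pattern out of the NaN range, where float bits are not guaranteed to survive unchanged.

```java
/** Hypothetical helper: packs an RGBA color into the bits of one float. */
final class PackedColor {
    /** Components are expected in the range 0..1. */
    static float toFloatBits(float r, float g, float b, float a) {
        int bits = ((int) (255 * a) << 24)
                 | ((int) (255 * b) << 16)
                 | ((int) (255 * g) << 8)
                 |  (int) (255 * r);
        // Clearing bit 24 means the float exponent can never be all ones,
        // so the result is never NaN or infinity. Alpha loses its lowest bit.
        return Float.intBitsToFloat(bits & 0xfeffffff);
    }

    public static void main(String[] args) {
        float packed = toFloatBits(1f, 0.5f, 0f, 1f);
        // Print the raw bits to show that the integer survived the round trip.
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(packed)));
    }
}
```

The packed float is written into the FloatBuffer in place of the three color floats. On the attribute side the color would be declared as 4 components of GL_UNSIGNED_BYTE with the normalized argument of glVertexAttribPointer set to true; with normalized set to false the shader receives values in 0..255 instead of 0..1.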

I simply could not get the reduction from 7 floats to 5 to work. I got to where I could have it act almost correctly with 5 floats, but the vertex colors were always ending up as 1 or 0. Values at 1 would come back as 1 on the vertex color, values <1 would come back as 0. This was after many hours of effort lol.

I needed to move on, so I got the rest done with the old 7 floats form. It looks so freaking sick lol. I ran a test and ~9000 objects ran at 60fps.

Part of my problem that was revealed here but I didn’t know before is how bad LinkedList or similar .gets can be. The entire system was migrated to an array system. That was half the battle. Furthermore, creating the objects is way costlier than I expected. Now I have a pool of objects that are activated when needed in the system. It’s so creamy smooth now. ;D
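For readers wondering what such a pool might look like, here is a minimal sketch. SimplePool and its methods are hypothetical and not the poster's implementation, which was not shown.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical object pool: instances are created up front and then recycled. */
final class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    SimplePool(Supplier<T> factory, int preallocate) {
        this.factory = factory;
        for (int i = 0; i < preallocate; i++) {
            free.push(factory.get());
        }
    }

    /** Hands out a pooled instance, creating a new one only if the pool is empty. */
    T obtain() {
        return free.isEmpty() ? factory.get() : free.pop();
    }

    /** Returns an instance to the pool; the caller should reset its state first. */
    void release(T object) {
        free.push(object);
    }

    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 2);
        StringBuilder sb = pool.obtain();
        sb.append("in use");
        sb.setLength(0); // reset before handing it back
        pool.release(sb);
        System.out.println("available: " + pool.available());
    }
}
```

The point of the design is that allocation happens once at startup, so the per-frame code only activates and deactivates existing instances.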

Also, turns out this bugger was hiding in my full environment’s gameLoop from the earlier build:

try {
    Thread.sleep(20);
} catch (Exception ex) {}

That was the reason for a ~15 FPS drop in the main environment. Big load off my mind that it was something so small and silly.
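The arithmetic behind that: a fixed 20 ms sleep caps the loop at 1000 / 20 = 50 iterations per second before any work is done, and a couple of milliseconds of update and render time on top brings it to roughly 45. If a sleep is wanted at all, one alternative is to sleep only for whatever is left of the frame budget. The sketch below assumes a hypothetical FrameLimiter helper and a 60 fps target, neither of which comes from the thread.

```java
/** Hypothetical helper: computes how long to sleep to hit a target frame rate. */
final class FrameLimiter {
    /** Returns the milliseconds left in the frame budget, or 0 if the frame ran long. */
    static long sleepMillis(long frameStartNanos, long nowNanos, int targetFps) {
        long budgetNanos = 1_000_000_000L / targetFps;
        long remainingNanos = budgetNanos - (nowNanos - frameStartNanos);
        return Math.max(0L, remainingNanos / 1_000_000L);
    }

    public static void main(String[] args) throws InterruptedException {
        long frameStart = System.nanoTime();
        // ... update and render would happen here ...
        long sleep = sleepMillis(frameStart, System.nanoTime(), 60);
        Thread.sleep(sleep);
        System.out.println("slept for " + sleep + " ms");
    }
}
```

A frame that took 5 ms would sleep about 11 ms under a 60 fps target, while a frame that already took 20 ms would not sleep at all.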

Thanks all!