Why is/are my VBO(s) so slow?

I am rendering around 5000 objects. My mesh class (http://pastebin.java-gaming.org/b35985a9848), which loads .obj files and creates a VBO for them, is only instantiated once (it loads Cube.obj); its draw method is then called whenever one of these objects’ update methods is called. I gained a bit of performance by moving the buffers to the objects and only updating them if the object’s position/orientation changes. Am I just doing it wrong?

Also, could you tell me how to change the loading code from BufferedReader (faster for line-by-line reading) to Scanner (faster for parsing)?

How slow?

On my test rig (Testing on low end first)
AMD Athlon 2 1.6Ghz
Radeon x1200
3GB RAM
6-10 Frames per second (Using my own and TWL’s counter)
When profiled with NetBeans (game loop commented out so it does not loop), glDrawArrays takes 1056 ms (51% of the loop time).
Edit: I run at 60FPS when glDrawArrays alone is commented out.

My main computer, built for gaming, runs it at the capped 60 FPS (3 GHz Phenom II, Radeon HD 4870, 4 GB RAM).
It does have shaders but it’s a simple projmodelview*vertex glFrag=1,1,1.
I would like to run this on low end systems better. I’m not very good at optimization.

Your GPU is really slow. If that’s the bottleneck there’s not much you can do except draw less. Luckily (?) your CPU seems quite slow too, so hopefully that’s the bottleneck at the moment.

I think you’ve misunderstood how vertex array objects work. The enabled state set by glEnableVertexAttribArray() is also stored in the VAO, so you should enable the attributes once when you create the VAO, not enable and disable them manually each time you render.

Since you’re drawing lots of the same object, you should bind the vertex array only once, then call glDrawArrays() as many times as you need instead of rebinding it before each draw.
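As a rough sketch of that ordering (not the original Mesh or Matter code): the Gl interface below is a hypothetical stand-in for the real OpenGL calls, so the example runs without a GL context. In an actual program, bindVertexArray would correspond to glBindVertexArray and drawArrays to glDrawArrays, while setModelMatrix is an assumed helper that uploads the per-object matrix as a shader uniform.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: Gl is a hypothetical wrapper, not part of LWJGL. */
class BindOnceSketch {

    /** Hypothetical stand-in for the GL calls involved in drawing one mesh. */
    interface Gl {
        void bindVertexArray(int vao);
        void setModelMatrix(float[] model); // assumed helper: uploads a uniform
        void drawArrays(int first, int vertexCount);
    }

    /** Draws every object that shares one mesh, binding the VAO a single time. */
    static void drawAll(Gl gl, int vao, int vertexCount, List<float[]> modelMatrices) {
        gl.bindVertexArray(vao);            // once for the whole batch
        for (float[] model : modelMatrices) {
            gl.setModelMatrix(model);       // only per-object state changes here
            gl.drawArrays(0, vertexCount);
        }
        gl.bindVertexArray(0);              // unbind after the last draw
    }

    public static void main(String[] args) {
        final int[] binds = {0};
        final int[] draws = {0};
        Gl counting = new Gl() {
            public void bindVertexArray(int vao) { if (vao != 0) binds[0]++; }
            public void setModelMatrix(float[] model) { }
            public void drawArrays(int first, int vertexCount) { draws[0]++; }
        };
        // 36 vertices: a cube drawn as 12 non-indexed triangles.
        drawAll(counting, 1, 36, Arrays.asList(new float[16], new float[16], new float[16]));
        System.out.println(binds[0] + " bind(s), " + draws[0] + " draw(s)");
    }
}
```

Whether this helps noticeably depends on where the time actually goes; it removes redundant state changes on the CPU side but does not reduce the work the GPU has to do.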

May I see the code that calls Mesh.Draw()?

I can’t see how that code will work in the first place :stuck_out_tongue: stride and offset = 0 for all vertex attr pointers?

Besides that, I don’t see any obvious flaws. Show more code :slight_smile:

I assume cube.obj only contains 8 vertices?

Matter class: http://pastebin.java-gaming.org/3598a689842
This is what each object is, and where the mesh draw is called from.

Camera class: http://pastebin.java-gaming.org/598a8789247
Projection and view buffers

Main class: http://pastebin.java-gaming.org/98a8882974e
Did I do depth and culling correctly?

Matter extends Updatable which extends Entity, which gives the objects children/parents; a hierarchy structure.
When update is called on the root object it recurses through the family tree.

I am not absolutely sure of how vertex buffers even work, just how to use them in a way.

Edit: Moved glEnableVertexAttribArray into the VBO instead of every draw, not much different but works. Thanks to agentd.
Also where is layman’s documentation on what stride and offset do?

He’s using one VBO per vertex attribute, so passing in stride=0 and offset=0 is valid. They’re only relevant when having multiple attributes packed in a single VBO.

  • Stride is the byte distance between each element. Here you’d pass in how many bytes each vertex takes. Passing 0 tells OpenGL that the data is tightly packed.
  • Offset is simply the first byte index OpenGL should start reading from.

For a VBO only containing a vec3 position and a 32-bit RGBA color for each vertex, you’d set up the attributes like this:


int vertexSize = 3*4 + 4;
glVertexAttribPointer(positionLocation, 3, GL11.GL_FLOAT, false, vertexSize, 0);
glVertexAttribPointer(colorLocation, 4, GL11.GL_UNSIGNED_BYTE, true, vertexSize, 3*4);

Back to Tyecon’s code then…

Your code might indeed be CPU limited. How many times is calculateModel() called in Matter? The fact that calculateModel() is called from setPosition/Rotation/Scale() is a bit worrying. That code might be run 3 times per update or more if you move your objects around a lot.
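One common way to avoid the repeated recalculation is a dirty flag: the setters only mark the cached matrix as stale, and it is rebuilt at most once when it is next needed. The class below is a minimal sketch under that assumption. It is a hypothetical stand-in for Matter that handles only translation and uniform scale, and the recalculations counter exists purely to show the effect.

```java
import java.util.Arrays;

/** Sketch only: hypothetical stand-in for Matter, without rotation. */
class LazyModelMatrixSketch {
    private float x, y, z;
    private float scale = 1f;
    private final float[] model = new float[16]; // column-major, as OpenGL expects
    private boolean dirty = true;                // true while the cached matrix is stale
    int recalculations = 0;                      // for measurement only

    void setPosition(float x, float y, float z) {
        this.x = x; this.y = y; this.z = z;
        dirty = true;                            // defer the rebuild
    }

    void setScale(float scale) {
        this.scale = scale;
        dirty = true;
    }

    /** Rebuilds the matrix only if a setter ran since the previous call. */
    float[] getModel() {
        if (dirty) {
            Arrays.fill(model, 0f);
            model[0] = model[5] = model[10] = scale;      // uniform scale on the diagonal
            model[12] = x; model[13] = y; model[14] = z;  // translation column
            model[15] = 1f;
            recalculations++;
            dirty = false;
        }
        return model;
    }

    public static void main(String[] args) {
        LazyModelMatrixSketch object = new LazyModelMatrixSketch();
        object.setPosition(1f, 2f, 3f);
        object.setScale(2f);
        object.setPosition(4f, 5f, 6f);
        object.getModel();
        object.getModel();
        System.out.println("recalculations: " + object.recalculations);
    }
}
```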

A simple test you can do to determine if you’re CPU or GPU limited is to simply comment out glDrawArrays() in Mesh.Draw() but leave everything else intact. If the program runs at almost the same speed as before, you’re CPU limited. If you get a big FPS boost, you’re GPU limited.

[quote=“theagentd,post:7,topic:41342”]
This is correct but will confuse everybody new to this problem.

The stride is the distance between the start of the elements. You can find each element with this formula:


attributePointer = attributeOffset + stride * vertexIndex;

A stride of zero would therefore be invalid, as the pointer would never move along the vertices, but the driver treats zero as a special value and will calculate the actual stride.
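To make the formula concrete, here is a small sketch in plain Java, with no OpenGL involved, that packs a vec3 position and a 4-byte RGBA colour per vertex into one buffer and then locates attributes with the formula above. The class and method names are made up for illustration; effectiveStride shows the special case for zero, where the stride becomes the size of a single tightly packed element.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch only: illustrates stride and offset arithmetic on an interleaved buffer. */
class StrideOffsetSketch {
    static final int POSITION_OFFSET = 0;      // position starts each vertex
    static final int COLOR_OFFSET = 3 * 4;     // colour follows 3 floats (12 bytes)
    static final int STRIDE = 3 * 4 + 4;       // 16 bytes from one vertex to the next

    /** attributePointer = attributeOffset + stride * vertexIndex */
    static int attributePointer(int attributeOffset, int stride, int vertexIndex) {
        return attributeOffset + stride * vertexIndex;
    }

    /** A stride of 0 means "tightly packed": use the element's own size. */
    static int effectiveStride(int stride, int elementSizeInBytes) {
        return stride == 0 ? elementSizeInBytes : stride;
    }

    /** Packs xyz positions and RGBA colours into one interleaved buffer. */
    static ByteBuffer interleave(float[] positions, byte[] colors) {
        int vertexCount = positions.length / 3;
        ByteBuffer buffer = ByteBuffer.allocateDirect(vertexCount * STRIDE)
                                      .order(ByteOrder.nativeOrder());
        for (int i = 0; i < vertexCount; i++) {
            buffer.putFloat(positions[3 * i])
                  .putFloat(positions[3 * i + 1])
                  .putFloat(positions[3 * i + 2]);
            buffer.put(colors, 4 * i, 4);
        }
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) {
        float[] positions = {1f, 2f, 3f, 4f, 5f, 6f};
        byte[] colors = {10, 20, 30, 40, 50, 60, 70, 80};
        ByteBuffer buffer = interleave(positions, colors);
        // Read the second vertex (index 1) back using the formula.
        float x = buffer.getFloat(attributePointer(POSITION_OFFSET, STRIDE, 1));
        byte red = buffer.get(attributePointer(COLOR_OFFSET, STRIDE, 1));
        System.out.println("x = " + x + ", red = " + red);
    }
}
```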

Cool. Now I learned something too :slight_smile:

Thanks. I wasn’t sure how to explain that properly so thanks for filling in, Riven. ^^

IMHO the C specification of that method is really poor, as ‘a stride of zero means the data is tightly packed’ implies that stride means the space between data of the same type (either vertices, texcoords, normals, etc), which is clearly not the case.

* Riven is not impressed. :cranky:

Ok so it’s my GPU not my code that’s causing it to be slow? (Commenting out glDrawArrays brought it up to the 60 cap)
Nice to know. Also nice explanations of vertex buffers. So with vertex buffers you can calculate exactly how many bytes you want to buffer to the shader, and separate them into attributes?

I think it’s safe to assume that it’s GPU limited. Too bad I guess, since that’s much harder to optimise. You should disable Vsync though so you can measure uncapped FPS values.

One thing you can try is to reduce the screen resolution. If that improves the FPS it means that the rendering is fragment limited (AKA fill rate limited), so using a lower resolution will increase performance. If there’s no difference, your program is vertex limited (you’re drawing too much small geometry). In that case the only solution is to reduce the number of vertices, e.g. reduce the number of cubes.

That ObjLoader has so many bugs, it’s not funny. I saw all those bugs before in a YouTube video, plugged here on JGO… where the author wondered why array-indices were zero based.

Anyway, ditch that ObjLoader, or take some time to fix it.

  • Indices are expressed in floats?!
  • Texcoords are treated completely differently from vertices and normals (you’re lucky you don’t get native crashes, as there are buffer overruns in the driver)
  • Parsing the input data is incredibly inefficient… RegExp… really? Doing the same regexp-splits over and over again.
  • Inability to handle anything else than faces with 3 vertices
  • Inability to handle negative indices (local to group)
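For illustration, here is a minimal sketch of how a face line could be parsed with those points in mind. It is not the ObjLoader from this thread. It assumes only the position index of each corner is wanted, treats negative indices as relative to the vertices read so far, avoids regular expressions, and splits polygons into a triangle fan, which is only valid for convex faces. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

/** Sketch only: parses the position indices of one OBJ "f" line. */
class ObjFaceSketch {

    /** Converts a 1-based or negative (relative) OBJ index to a 0-based int index. */
    static int resolveIndex(int objIndex, int verticesSoFar) {
        if (objIndex > 0) return objIndex - 1;              // 1-based absolute index
        if (objIndex < 0) return verticesSoFar + objIndex;  // -1 is the latest vertex
        throw new IllegalArgumentException("OBJ indices are never 0");
    }

    /** Returns 0-based position indices, one int[3] per triangle. */
    static List<int[]> parseFace(String line, int verticesSoFar) {
        StringTokenizer tokens = new StringTokenizer(line); // splits on whitespace, no regex
        if (!tokens.hasMoreTokens() || !tokens.nextToken().equals("f")) {
            throw new IllegalArgumentException("not a face line: " + line);
        }
        List<Integer> corners = new ArrayList<>();
        while (tokens.hasMoreTokens()) {
            String corner = tokens.nextToken();             // v, v/vt, v//vn or v/vt/vn
            int slash = corner.indexOf('/');
            String position = slash < 0 ? corner : corner.substring(0, slash);
            corners.add(resolveIndex(Integer.parseInt(position), verticesSoFar));
        }
        if (corners.size() < 3) {
            throw new IllegalArgumentException("a face needs at least 3 corners: " + line);
        }
        // Triangle fan: (0, i, i + 1) for each inner corner i.
        List<int[]> triangles = new ArrayList<>();
        for (int i = 1; i + 1 < corners.size(); i++) {
            triangles.add(new int[] { corners.get(0), corners.get(i), corners.get(i + 1) });
        }
        return triangles;
    }

    public static void main(String[] args) {
        for (int[] triangle : parseFace("f 1/1/1 2/2/1 3/3/1 4/4/1", 8)) {
            System.out.println(Arrays.toString(triangle));
        }
    }
}
```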

Ok, at native resolution it is the same frame rate so it’s just the video card being weak, not me. :wink:
I only used that ObjLoader because it was the only one I could find that worked, and I have no idea how to do it myself (What’s an indice?). I do not have much experience with this stuff (learning from trial and error, which means if it works I use it). This forum is friendly, quick, and helpful; I think it will become one of my go to places. Thank you. :slight_smile:

You can find the specification for .obj files on Wikipedia and elsewhere.

ObjLoader

Usage (immediate mode)


String obj = ...;
List<Face> faces = ObjLoader.load(obj);

// ensure all quads (or triangle fans) are converted to plain triangles
faces = ObjLoader.asTriangles(faces);

// optional, (re)calculates normals by finding other triangles having shared indices
ObjLoader.calcNormals(faces);


glBegin(GL_TRIANGLES);

// option 1: render as a whole
for (Face face : faces) {
	for (Index index : face.indices) {
		glNormal3f(index.n.x, index.n.y, index.n.z);
		glTexCoord2f(index.t.x, index.t.y);
		glVertex3f(index.v.x, index.v.y, index.v.z);
	}
}

// option 2: render per group (use instead of the loop above, not in addition to it)
Map<String, List<Face>> groupToFaces = ObjLoader.splitByGroup(faces);
for (Entry<String, List<Face>> entry : groupToFaces.entrySet()) {
	for (Face face : entry.getValue()) {
		for (Index index : face.indices) {
			glNormal3f(index.n.x, index.n.y, index.n.z);
			glTexCoord2f(index.t.x, index.t.y);
			glVertex3f(index.v.x, index.v.y, index.v.z);
		}
	}
}

glEnd();

Have a look at gDEBugger: http://www.gremedy.com/

GL15.glBufferData(GL15.GL_ARRAY_BUFFER, interleave, GL15.GL_STATIC_READ);

The GL_STATIC_READ usage hint might cause the memory manager to use AGP or system memory; try GL_STATIC_DRAW instead.
See page 11 of https://developer.nvidia.com/sites/default/files/akamai/gamedev/docs/Using-VBOs.pdf?download=1