Slow Vertex Arrays - NIO Evil??

By the way… this is a slightly different question, but it’s on the same subject. I have my program loading MS3D models into vertex arrays / vertex buffer objects (it checks which is supported first), but I’m curious what’s the best way to handle skeletal animations.

I don’t think I should be running through my entire array every frame and rewriting the position of each vertex. It’s bad enough to do that with regular arrays, but doing it with put() seems even worse. OpenGL handles transformations really well when you use rotatef/translatef etc. It’s a hell of a lot faster than using my own matrix calculations, anyway, but I don’t know how to apply it to my model as it’s being rendered. Do I just need to keep track of how the vertices have been transformed, and just push/pop transformations as I render them? Or is there some way to use OpenGL for a transformation and have it actually return the transformed vertices?

You must be using one large block of RAM and then splitting it up. In a 2D game that is fine, but in a 3D game there is way too much data, unless you force everyone onto a minimum of 128MB cards or make them deal with major thrashing… one large detailed terrain alone could thrash a 16MB card :)

Fortunately, there are lots of cases in a 3D engine where the only thing that changes is the data, not the state. Models typically consist of hundreds to thousands of vertices using the same format for textures, colors, normals, etc.

As a simplified example - picture a model of a sphere composed of 16x16 segments with colors and texture (default modulate), done with floats, rendered as triangles.

  • 1536 vertices
  • 1536 normals
  • 1536 colors (rgb)
  • 512 texture coordinates

Total amount of data that must get to the card to render a single frame: 17920 bytes (4 bytes per float, and excluding texture upload)

Immediate mode calls necessary: 6144
GL pointer calls with glDrawArrays: 13 including state changes
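To make the call-count contrast concrete, here is a minimal sketch (class name and layout are mine, not from the post) of packing the sphere’s per-vertex data into a single direct NIO buffer, the kind of unsegmented, natively ordered block a gl*Pointer call can hand straight to the native GL, versus the four immediate-mode calls per vertex it replaces:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class SphereBufferDemo {
    public static void main(String[] args) {
        int verts = 1536;                  // triangle vertices in the 16x16 sphere
        int floatsPerVert = 3 + 3 + 3 + 2; // position + normal + rgb color + texcoord

        // One direct, natively ordered buffer holds everything; this is the
        // "unsegmented block of RAM" that goes straight to the native GL calls.
        FloatBuffer data = ByteBuffer
                .allocateDirect(verts * floatsPerVert * 4) // 4 bytes per float
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        // ... fill 'data' with interleaved vertex attributes here ...

        // Immediate mode: glVertex + glNormal + glColor + glTexCoord per vertex.
        int immediateCalls = verts * 4;
        System.out.println("immediate mode calls: " + immediateCalls);
        System.out.println("buffer capacity in floats: " + data.capacity());
    }
}
```

With the buffer filled, the vertex array path needs only the handful of gl*Pointer/glEnableClientState calls plus one glDrawArrays, instead of thousands of JNI crossings.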

This is why I am confused - The JNI savings alone are huge and the data comes in a pointer to an unsegmented block of RAM that is already in a format ready for the native GL calls.

[quote]This bit is only mentioned in the context of glDrawRangeElements; however, it might be equally valid for glDrawArrays. Maybe you can try limiting the amount of data you are passing by doing multiple calls to glDrawArrays, taking that GL_MAX_ELEMENTS_VERTICES value into account. I don’t have a clue about all the memory transfers involved, but it might be worth a try :)
[/quote]
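As an illustration of the quoted suggestion, the loop below (a sketch; the 2048 limit is a made-up stand-in for a queried GL_MAX_ELEMENTS_VERTICES, and the actual glDrawArrays call is left commented out) splits one big vertex array into sub-ranges, keeping each chunk aligned to whole triangles:

```java
public class ChunkedDraw {
    public static void main(String[] args) {
        int totalVerts = 6000;         // vertices in the array, a multiple of 3
        int maxVerts = 2048;           // stand-in for GL_MAX_ELEMENTS_VERTICES
        int step = (maxVerts / 3) * 3; // round down so chunks hold whole triangles

        int calls = 0;
        for (int first = 0; first < totalVerts; first += step) {
            int count = Math.min(step, totalVerts - first);
            // gl.glDrawArrays(GL.GL_TRIANGLES, first, count); // one sub-range per call
            System.out.println("glDrawArrays(first=" + first + ", count=" + count + ")");
            calls++;
        }
        System.out.println("calls: " + calls);
    }
}
```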
That shouldn’t matter in this case. I am well below any limits and I am not using glGet.

Off topic but – basically push and pop is what you need for each group, as they are transformed from their parent’s coordinate system. Full skeletal animation is more complicated than that though, and will require some inverse kinematics.

OK… should I use glDrawRangeElements for that? Because a bone may influence only part of a mesh.

I don’t need to do anything fancy with the skeleton stuff, since MS3D lets you model animations yourself and just specify keyframes. It gives you rotation/position keyframes for all the joints, and you can just interpolate the frames in between. My main issue is doing it quickly, and applying my own transformations in Java (which I could have done) is just slow as molasses.
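The in-between interpolation described above can be sketched as follows, for position keyframes only (class and field names are mine; MS3D rotation keys would need angle or quaternion interpolation rather than a straight lerp):

```java
public class KeyframeLerp {
    // One positional keyframe: time in seconds plus an xyz position.
    static class Key {
        final float time, x, y, z;
        Key(float time, float x, float y, float z) {
            this.time = time; this.x = x; this.y = y; this.z = z;
        }
    }

    // Linearly interpolate a joint position between the two keys bracketing t.
    static float[] interpolate(Key a, Key b, float t) {
        float s = (t - a.time) / (b.time - a.time); // 0..1 between the keys
        return new float[] {
            a.x + s * (b.x - a.x),
            a.y + s * (b.y - a.y),
            a.z + s * (b.z - a.z)
        };
    }

    public static void main(String[] args) {
        Key k0 = new Key(0.0f, 0f, 0f, 0f);
        Key k1 = new Key(1.0f, 2f, 4f, 6f);
        float[] p = interpolate(k0, k1, 0.5f);      // halfway between the keys
        System.out.println(p[0] + " " + p[1] + " " + p[2]);
    }
}
```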

It all depends on how the data for your models is allocated. If all the data for the model is in a single buffer, then yes, you will want to use the finer-grain calls to operate on subsets of the vertices… if all the data for ALL your models is allocated in a single buffer, you’ll get the thrashing I described above unless the user has an uber card, and you’ll have no choice but to use the fine-grain calls.

There is probably some overhead in enabling/disabling the client state and calling glDrawArrays. Have you tried using vertex arrays only when there are many verts, like at least 500? Can you find a breaking point where vertex arrays are faster than immediate mode?

It might also be the case that transferring the vertices to the card is not the bottleneck, and that when using immediate mode you are feeding the card at exactly the right pace. Still, it is strange that vertex arrays are slower.

If I were in your situation I would start making a whole lot of tests to find out what was going on.

[quote]if all the data for ALL your models is allocated in a single buffer, you’ll get the thrashing I described above unless the user has an uber card and you’ll have no choice but to use the fine grain calls.
[/quote]
I’m creating a buffer for each mesh, since vertices are not shared between meshes. I still have a big problem though, since knowing how to divide up the vertices is not the same as dividing up the triangles. There can be any number of triangles in a mesh whose vertices are bound to separate joints, meaning they stretch when the bones move, and don’t just rotate/translate. The only way to handle this is to transform some of the vertices and not others, so you can’t really break these triangles into ‘groups’ and render them all at once. If I push/pop matrices for every transformation, I’d have to do so multiple times for each of the triangles which share these vertices.

I’ve been searching around all over the place for an elegant solution to this problem. With every major video game using skeletal animation these days, I cannot believe they’re doing all the transformations in software. What are they doing differently?
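One way to see why those triangles can’t be batched per joint is to skin on the CPU: each vertex carries its own joint index and is transformed by that joint’s matrix individually, so a triangle whose corners belong to different joints simply stretches. A rigid-binding sketch (real skinning blends several weighted joints per vertex; the matrix layout and names here are mine):

```java
public class SoftwareSkinning {
    // Apply a row-major 3x3 rotation (m[0..8]) plus translation (m[9..11]) to a point.
    static void transform(float[] m, float[] src, int i, float[] dst) {
        float x = src[i], y = src[i + 1], z = src[i + 2];
        dst[i]     = m[0] * x + m[1] * y + m[2] * z + m[9];
        dst[i + 1] = m[3] * x + m[4] * y + m[5] * z + m[10];
        dst[i + 2] = m[6] * x + m[7] * y + m[8] * z + m[11];
    }

    public static void main(String[] args) {
        // One triangle's rest-pose vertices; each vertex bound to exactly one joint.
        float[] rest  = { 0,0,0,  1,0,0,  0,1,0 };
        int[]   joint = { 0, 0, 1 };             // vertex -> joint index
        float[][] joints = {
            { 1,0,0, 0,1,0, 0,0,1,  0,0,0 },     // joint 0: identity
            { 1,0,0, 0,1,0, 0,0,1,  5,0,0 },     // joint 1: translate +5 on x
        };
        float[] posed = new float[rest.length];
        for (int v = 0; v < joint.length; v++)
            transform(joints[joint[v]], rest, v * 3, posed);
        // The third vertex moves with joint 1, so the triangle stretches.
        System.out.println(java.util.Arrays.toString(posed));
    }
}
```

The posed buffer can then be handed to glDrawArrays unchanged; only the vertex data was rewritten, not the render state.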

[quote]I’m creating a buffer for each mesh, since vertices are not shared between meshes. I still have a big problem though, since knowing how to divide up the vertices is not the same as dividing up the triangles. There can be any number of triangles in a mesh which are composed of vertices bound to separate joints, meaning they stretch when the bones move, and don’t just rotate/translate. The only way to do this is transform some of the vertices and not others, so you can’t really break these triangles into ‘groups’ and render them all at once. If I push/pop matrices for every transformation, I’d have to do so multiple times for each of the triangles which share these vertices.

I’ve been searching around all over the place for an elegant solution to this problem. With every major video game using skeletal animation these days, I cannot believe they’re doing all the transformations in software. What are they doing differently?
[/quote]
Skinning can be done in vertex shaders; I’m guessing that is what most games are doing. But of course, the engine also needs to provide a path that does it in software if the card doesn’t support vertex shaders.

So you need to provide a software option; you just have to optimize the code to make it as fast as possible. This is one case where Java is slower than C/C++, but still adequate.