Performance...

What is the highest poly-count model you’ve rendered without using display lists while still getting decent performance? I’m using a PIII 667 MHz, GeForce2 machine, and when I try to render something with, say, 3000 textured triangles I get quite a lot of drag. I was wondering if it’s my unoptimized code that slows things down, or if there just isn’t a way to get better performance :frowning:
The best solution I came up with is this:
I use an interleaved vertex array stored in a direct FloatBuffer, so I load the vertex/normal data for all 3000 faces with a single call to GL. Then I iterate over an array of Triangle structures (i.e. simple objects) and call glTexCoord2fv and then glArrayElement for each of the 3 corners.
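
In code it’s roughly this (a sketch from memory - Triangle, triangles and the field names are just how my structures happen to look):

// One direct, native-order FloatBuffer holding nx,ny,nz, x,y,z per vertex.
FloatBuffer data = ByteBuffer.allocateDirect(vertexCount * 6 * 4)
        .order(ByteOrder.nativeOrder())
        .asFloatBuffer();
// ... fill in normals and positions, then data.rewind() ...

// Hand the whole block to GL in one call.
gl.glInterleavedArrays(GL.GL_N3F_V3F, 0, data);

gl.glBegin(GL.GL_TRIANGLES);
for (int t = 0; t < triangles.length; t++) {
    Triangle tri = triangles[t];
    for (int c = 0; c < 3; c++) {
        gl.glTexCoord2fv(tri.texCoords[c]); // per-corner texture coordinate
        gl.glArrayElement(tri.indices[c]);  // position+normal come from the array
    }
}
gl.glEnd();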

Yet with these optimizations (I hope they do optimize ;P) I get a major drag…

What are your framerates? Which model of GF2 are you using? How much RAM in your box?

Considering the machine and videocard I wouldn’t expect too much unfortunately.

Still, it shouldn’t be too bad. I have a P3 700 MHz with 192 MB RAM and a GeForce2 MX 400, and I get decent framerates on it with most games. 3000 textured tris shouldn’t be an issue.

Depends entirely on how they are being rendered. If they are rendered as individual triangles, with lots of state changes and such it is possible to get terrible performance on that class of hardware.

[quote]Depends entirely on how they are being rendered. If they are rendered as individual triangles, with lots of state changes and such it is possible to get terrible performance on that class of hardware.
[/quote]
Of course. But he’s using interleaved arrays which should be decent.

Okay, I’ve got a GeForce2 MX400 with 32 MB of VRAM and 256 MB of system memory (I already mentioned the CPU).
Right now I’ve improved my mesh structure to work with arrays and slots rather than with vectors and hashes. I do as few state changes as possible - binding a texture once and then displaying all triangles that use it. I haven’t added an FPS display yet (any utils in JOGL to do that quickly?) but I’d say I get about 30-40 FPS for a torus made of 5000 textured triangles. So right now I push all the vertex/normal data in one go from an array and then use the optimized texture binding scheme. I went to great lengths to avoid any array/vector/hash lookups - just direct references of the form “triangle.vertex.position”.
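
For now I’ll probably just count display() calls myself - something like this untested sketch (assuming the usual GLEventListener setup; frames and lastTime would be fields):

// Crude FPS counter: count frames and print once a second.
private int frames = 0;
private long lastTime = System.currentTimeMillis();

public void display(GLDrawable drawable) {
    // ... render the scene ...
    frames++;
    long now = System.currentTimeMillis();
    if (now - lastTime >= 1000) {
        System.out.println("FPS: " + frames);
        frames = 0;
        lastTime = now;
    }
}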

Which operating system?

If Windows, do you have vertical sync enabled?
What is the refresh rate of the monitor where you are displaying the 3D canvas?

Yuri

Yeah, the OS is Windows.
The monitor refresh rate is 85 Hz and I don’t have vertical sync enabled. Do I set that from the GLCapabilities?
Another question:
What causes the main performance hit?

  • frequent state changes to OpenGL
  • frequent calls to GL (i.e. JNI calls)

cheers

I have a geomipmap terrain renderer that uses triangle strips and triangle fans. I only use glVertex3f to transmit coordinates. I don’t use lighting/normals. Texture coordinates are generated. It is multitextured.

Machine: PIII 550 MHz, TNT2, 512 MB RAM.
22000-34000 triangles rendered at 23-42 frames per second.

So it should be possible to use more than 3000 triangles on your machine :slight_smile:

Btw, what performance do you get if you just use glVertex3f(…), glNormal3f(…) etc.?

[quote]The monitor refresh rate is 85 Hz and I don’t have vertical sync enabled. Do I set that from the GLCapabilities?
[/quote]
No, I mean the setting in the NVidia driver (it’s set to “ON by default” during installation, and should be disabled for FPS measurement).
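
If you prefer doing it from code, I believe JOGL exposes the WGL swap control extension straight on the GL interface - worth verifying on your setup, but roughly:

// Turn vsync off on Windows, if WGL_EXT_swap_control is available (check first!).
if (gl.isExtensionAvailable("WGL_EXT_swap_control")) {
    gl.wglSwapIntervalEXT(0); // 0 = don't wait for the vertical retrace
}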

Yuri

[quote]I have a geomipmap terrain renderer that uses triangle strips and triangle fans. I only use glVertex3f to transmit coordinates. I don’t use lighting/normals. Texture coordinates are generated. It is multitextured.

Machine: PIII 550 MHz, TNT2, 512 MB RAM.
22000-34000 triangles rendered at 23-42 frames per second.

So it should be possible to use more than 3000 triangles on your machine :slight_smile:

Btw, what performance do you get if you just use glVertex3f(…), glNormal3f(…) etc.?
[/quote]
That 512 MB of RAM is a big part of the performance you’re seeing. I’m sure his application can handle more than 3k triangles, but without actually seeing the code, the textures, etc. there’s no way to tell what the problem is. I’m sure he could increase the number of triangles being rendered and see no performance loss.

To be honest, from a performance perspective the best thing to do FIRST is to render a blank screen and see what your performance numbers are. Then start adding things from there as sometimes you will find that you’re using some feature that the card supports but just doesn’t perform well doing it.

[quote]To be honest, from a performance perspective the best thing to do FIRST is to render a blank screen and see what your performance numbers are. Then start adding things from there as sometimes you will find that you’re using some feature that the card supports but just doesn’t perform well doing it.
[/quote]
100% agree, and even with a blank screen there are factors that affect performance - the frame buffer format, # of depth buffer bits, stencil buffer, etc…

Yuri

Okay, I added an FPS counter to the thing and it shows quite a big range of results. With the textured rotating torus (made up of 5000 triangles) I get 19 FPS while the camera looks straight at the torus hole, and 33-45 when the camera looks at the torus “edge on”. If I turn off the texturing the FPS jumps as high as 60-100. I get the same results with a wireframe model. The strange thing is that if I turn off the illumination calculations, the textured torus gives about the same FPS :stuck_out_tongue:
So it turns out it’s the texturing that slows me down (?)
Since I tried to develop a general mesh structure, I read the required texture from it and bind it each frame (handy when there are multiple meshes on the screen, right?). I guess if I bind the texture once at the beginning of the program I will get slightly better results :P. I use the Animator class to make the torus rotate.
So, have any of you seen such a situation? :slight_smile:

Cheers

The biggest limitation with the MX series of cards is fillrate, which is affected by the complexity of the texturing you’re doing.

Since turning off texturing gives you such a boost you’re probably fill-limited. Try reducing the colour depth or the resolution.
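
For the colour depth, something like this when creating the canvas should do it (the bit depths here are just an example to try):

// Ask for a 16-bit colour buffer and a shallower depth buffer to ease fill-rate pressure.
GLCapabilities caps = new GLCapabilities();
caps.setRedBits(5);
caps.setGreenBits(6);
caps.setBlueBits(5);
caps.setDepthBits(16);
GLCanvas canvas = GLDrawableFactory.getFactory().createGLCanvas(caps);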

Er, Rinswind isn’t doing something mingy like this, is he:


glBegin(GL_TRIANGLES);
for (int i = 0; i < numVertices; i++) {
    glTexCoord2f(...);
    glNormal3f(...);
    glVertex3f(...);
}
glEnd();

Cas :slight_smile:

[quote]Er, Rinswind isn’t doing something mingy like this, is he:


glBegin(GL_TRIANGLES);
for (int i = 0; i < numVertices; i++) {
    glTexCoord2f(...);
    glNormal3f(...);
    glVertex3f(...);
}
glEnd();

Cas :slight_smile:
[/quote]
For reference, what is actually wrong with that?

Anyway, the whole topic seems similar to my problems with the GF2Go - http://www.java-gaming.org/cgi-bin/JGNetForums/YaBB.cgi?board=jogl;action=display;num=1070401711 - which I never solved (although I’ve not had a chance to try requesting different colour modes yet - I’ve just been using the default IIRC).

This is about the slowest possible way to draw in OpenGL, in any language. Here’s why:

Firstly, every single gl* call you make passes you into the driver and back again. In Java you have at least twice the overhead per method call, as you have to drop down into your wrapper DLL and then on to the driver. This overhead is only 50 nanos or whatever, but if you do four calls per vertex for 5,000 vertices, that’s 20,000 calls - a millisecond or two of every frame burned on nothing but function call overhead!
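
You can ballpark the call overhead yourself with something as crude as this (untested sketch - the triangles are all degenerate, so it mostly measures the calls rather than any actual rasterisation):

// ~15,000 GL calls per "frame"; elapsed/100 is the rough per-frame cost in ms.
long start = System.currentTimeMillis();
for (int frame = 0; frame < 100; frame++) {
    gl.glBegin(GL.GL_TRIANGLES);
    for (int i = 0; i < 5000; i++) {
        gl.glTexCoord2f(0f, 0f);
        gl.glNormal3f(0f, 0f, 1f);
        gl.glVertex3f(0f, 0f, 0f); // every vertex identical => nothing visible drawn
    }
    gl.glEnd();
}
System.out.println("ms per frame, roughly: "
        + ((System.currentTimeMillis() - start) / 100.0));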

Secondly, all the speed in modern cards comes from asynchronous AGP reads and a T&L vertex cache. When you are writing individual triangle vertices out one by one you are writing into the driver which then batches them up in 3’s and sends them on to the card. Maybe it’s clever and batches a few more up but you get the general idea. However because you have not indexed any vertices the driver will never know when you are simply repeating data, and so the hardware T&L cache (or even software cache in the driver) cannot just pluck a ready-computed vertex from memory and re-use it. You are, in effect, just going back to 1997 performance levels, because that’s how they did it in 1997 before all the whizzy 16-vertex caches came along.

So now you know :smiley:

If you expect to draw anything remotely fast on any 3D hardware using any API you’ve got to use indexed, buffered geometry.
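
i.e. something along these lines (a sketch only - uniqueVertexCount, triangleCount and the buffer layout are made-up names; fill the buffers from your own mesh):

// Each unique vertex goes into the array once; triangles reference it by index,
// so the T&L cache can re-use a transformed vertex instead of recomputing it.
FloatBuffer vertexData = ByteBuffer.allocateDirect(uniqueVertexCount * 8 * 4)
        .order(ByteOrder.nativeOrder())
        .asFloatBuffer(); // s,t, nx,ny,nz, x,y,z per vertex
IntBuffer indices = ByteBuffer.allocateDirect(triangleCount * 3 * 4)
        .order(ByteOrder.nativeOrder())
        .asIntBuffer();
// ... fill both buffers, then rewind() them ...

gl.glInterleavedArrays(GL.GL_T2F_N3F_V3F, 0, vertexData);
gl.glDrawElements(GL.GL_TRIANGLES, triangleCount * 3, GL.GL_UNSIGNED_INT, indices);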

Cas :slight_smile: