I think I'm rendering in software?

I’m having a problem with my overall rendering speed.

http://img38.imageshack.us/img38/8263/compare.gif

Basically, the little billboarded tree you can see (bigger on the right) is being rendered 10,000 times (in the exact same position), and it’s slowing my framerate to something like 10 fps (that’s a deliberate stress test, by the way). But here’s the kicker: if I reduce the sprite’s size to 1/5th (so 1/25th of the previous screen area), the time taken to render drops to almost nothing. In other words, rendering time seems to be directly proportional to the number of fragments per sprite times the number of sprites. I’m not calling glFinish() at any point, so why isn’t the rendering being handed over to the graphics card while my main thread does other, game-related things? I suspect that for some reason the rendering is being done in software, but I can’t tell… How would I find out? Are there any other steps I could take here, and/or am I missing something obvious?
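(The only check I can think of is printing the GL_VENDOR / GL_RENDERER strings at startup and seeing whether they name the actual card or some generic software renderer - a rough sketch below, with printRendererInfo just a made-up name - but I don’t know how conclusive that is.)

  private void printRendererInfo(GL gl) {
    //  If GL_RENDERER reports something like "GDI Generic" or a Mesa
    //  software renderer rather than the actual card, OpenGL has fallen
    //  back to a software implementation.
    I.say("GL_VENDOR:   " + gl.glGetString(GL.GL_VENDOR)) ;
    I.say("GL_RENDERER: " + gl.glGetString(GL.GL_RENDERER)) ;
    I.say("GL_VERSION:  " + gl.glGetString(GL.GL_VERSION)) ;
  }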

Thanks in advance.

The effects enabled by glAlphaFunc (alpha testing) can cause serious performance degradation.

If you can get away with glBlendFunc, by all means make the switch; you’ll have to do your own depth sorting, though.
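Roughly, I mean something like this for the sprite pass (just a sketch - gl here is whatever GL instance you’re drawing with):

  gl.glDisable(GL.GL_ALPHA_TEST) ;
  gl.glEnable(GL.GL_BLEND) ;
  gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA) ;
  gl.glDepthMask(false) ;   //  no depth writes during the blended pass

You’d then draw the transparent sprites last, sorted back to front, and restore the depth mask afterwards.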

Thanks- I just tried disabling the alpha test, but I’m afraid I didn’t see a noticeable difference. But that’s not my main gripe here- what I want to know is why GPU work is eating into my CPU time. I could tolerate the slow rendering if my main thread were free to attend to game logic, but it’s apparently stuck until my rendering method returns- which leads me to wonder whether it’s actually happening in software.

I think your main performance hit is that you’re doing significant overdraw and z-buffer performance is shot to hell. I once tried a benchmark rendering 10,000 cubes in one place. The performance was terrible, but when it switched over to 10,000 cubes on screen, spread out randomly, performance was much more acceptable.

Where did you hear this? I was under the impression that alpha tests were the fast, shoddy version of blending, and that blend modes can cause serious performance hits.

I experienced it first hand, then googled it and found a lot of articles explaining how it b0rks early depth culling.

Again, I have to stress that my main gripe is that the CPU, not the GPU, seems to be doing an inordinate amount of the work here. I’m not calling glFinish() anywhere, so why is GPU work slowing down my CPU thread? I’m sorry if I was unclear about this.

Look, let me explain: This is where all the rendering is being performed-


  
  private void flushGeometry(GL draws) {
    final int numVerts = vertBuffer.position() / 3 ;
    if (numVerts == 0) return ;
    if (HUD.isMouseState(HUD.CLICKED))
      I.say("\nFlushing buffer, num. verts: " + numVerts) ;
    textBuffer.rewind() ;
    normBuffer.rewind() ;
    vertBuffer.rewind() ;
    draws.glTexCoordPointer(2, GL.GL_FLOAT, 0, textBuffer) ;
    draws.glNormalPointer(GL.GL_FLOAT, 0, normBuffer) ;
    draws.glVertexPointer(3, GL.GL_FLOAT, 0, vertBuffer) ;
    draws.glDrawArrays(GL.GL_TRIANGLES, 0, numVerts) ;
  }

Now, if I literally comment out only the final line (the call to glDrawArrays), then the time between initiating rendering and the method returning, on the CPU side, drops by a full 40 milliseconds. (That method is only called about a dozen times during rendering, I should add, since I’m essentially piling all the individual identically-textured sprites into one set of geometry buffers.) Everything before that line is (virtually) pure CPU work. So why is glDrawArrays eating up this massive chunk of my CPU time? Shouldn’t everything after this point be the GPU’s problem?

Sorry, I was replying to lhkbob, not to your problem.

Anyway, you are using vertex arrays (VAs), not vertex buffer objects (VBOs).

There is some overhead here - with plain vertex arrays the driver has to read the client-side buffers on every glDrawArrays call - so you should really only use them for rendering a fair amount of geometry at a time, not a single quad.
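Something roughly like this is what I mean (vertex positions only, and vboId / flushGeometryVBO are just placeholder names, not from your code):

  private int vboId = 0 ;
  
  private void flushGeometryVBO(GL draws) {
    final int numVerts = vertBuffer.position() / 3 ;
    if (numVerts == 0) return ;
    vertBuffer.rewind() ;
    if (vboId == 0) {
      //  Created lazily here for brevity; normally done once at init time.
      final int ids[] = new int[1] ;
      draws.glGenBuffers(1, ids, 0) ;
      vboId = ids[0] ;
    }
    draws.glBindBuffer(GL.GL_ARRAY_BUFFER, vboId) ;
    //  GL_STREAM_DRAW since the contents change every frame -
    //  3 floats per vertex, 4 bytes per float.
    draws.glBufferData(
      GL.GL_ARRAY_BUFFER, numVerts * 3 * 4, vertBuffer, GL.GL_STREAM_DRAW
    ) ;
    draws.glVertexPointer(3, GL.GL_FLOAT, 0, 0L) ;  //  offset into the bound VBO
    draws.glDrawArrays(GL.GL_TRIANGLES, 0, numVerts) ;
    draws.glBindBuffer(GL.GL_ARRAY_BUFFER, 0) ;
  }

Texture coordinates and normals would get the same treatment (or go into one interleaved buffer).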

Further, the CPU can get quite busy when you might expect it to be idling. It depends on the drivers, but I clearly remember that CPU usage would spike whenever the GPU was busy. It looked like some drivers were busy-waiting for the GPU to get on with the next frame, like Thread.yield() in a tight loop. At least for me that turned out to be the case: I could spawn another thread and do quite a lot of work on it, even though the CPU was already at 100% before I started that work.

As you shrink your sprite, which takes the burden off the GPU (fillrate!), your driver will Thread.yield() less, reducing the CPU load.

Keep in mind that even though the CPU is at 100%, it’s mostly yielding, so you can still use the majority of its processing power on another thread.
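A bare-bones illustration of what I mean (updateGameLogic() and ‘running’ are just stand-ins for whatever per-frame work and loop condition you have):

  //  Run game logic on its own thread, so it isn't stalled while the
  //  render thread sits in the driver's wait.
  Thread logicThread = new Thread(new Runnable() {
    public void run() {
      while (running) {
        updateGameLogic() ;
        try { Thread.sleep(5) ; }
        catch (InterruptedException e) { return ; }
      }
    }
  }) ;
  logicThread.start() ;

You’d still need to synchronise whatever state the two threads share, of course.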

Thanks for the tip. I’ll try spawning a different thread for the rendering process, and also see if I can cut down the time taken on another machine. The interesting thing is that calling glFinish() does seem to increase the time taken by about 50%- would that support the busy-waiting theory?

As mentioned, I’m piling lots of individual quads into a single set of geometry buffers before rendering, so glDrawArrays is pushing about 1000 polys a go. (Speaking of which, my buffer manipulation is also appallingly slow, but I can at least optimise around that…)

[quote]I think your main performance hit is that you’re doing significant overdraw and zbuffer performance is shot to hell. I once tried doing a benchmark rendering 10000 cubes in one place. The performance was terrible, but when it switched over to 10000 cubes on screen, but spread out randomly performance was much more acceptable.
[/quote]
I don’t think so- I’ve seen comparable rendering speeds for fullscreen scenes with trees spread all over the place (about 1,000 sprites, plus about 1,000 terrain tiles).
http://img223.imageshack.us/img223/2117/screenpic.jpg

I guess it depends on the hardware/drivers?? BTW nice screenshots.