Well I looked into it, and apparently it does via an OpenGL ES extension. So if I had the time to drop in some optimizations the game could really benefit from a few. I’ve used FBOs in a couple places, but no VBOs.
The current state of play with the server VM on the 1.6GHz Turion64x2 and Nvidia 6150 Go:
Compiled + native Method
19.0% 8234 + 0 com.shavenpuppy.jglib.sprites.DefaultSpriteRenderer.writeSpriteToBufferF
9.4% 4089 + 0 worm.QuadTree.doCheckCollisions2
4.2% 1819 + 0 worm.QuadTree.checkCollisions
4.0% 1750 + 0 com.shavenpuppy.jglib.algorithms.RadixSort.sort
3.4% 1477 + 0 worm.features.LayersFeature.updateColors
3.0% 1316 + 0 worm.MapRenderer$RenderedTile.setTiles
2.9% 1267 + 0 com.shavenpuppy.jglib.sprites.DefaultSpriteRenderer.sort
2.8% 1216 + 0 com.shavenpuppy.jglib.sprites.DefaultSpriteRenderer.build
2.5% 1084 + 0 net.puppygames.applet.Screen$1.tick
2.1% 899 + 0 com.shavenpuppy.jglib.sprites.StaticSpriteEngine$1.processRendering
1.8% 797 + 0 net.puppygames.applet.Screen.tick
1.8% 787 + 8 worm.path.AStar.goalNotFound
1.1% 488 + 0 worm.entities.GidrahGameMapTopology.getNeighbours
1.0% 433 + 0 worm.MapRenderer.render
1.0% 422 + 0 worm.entities.Building.getCurrentAppearance
0.9% 385 + 0 worm.MapRenderer$RenderedTile.setLocation
0.7% 287 + 0 worm.MapRenderer.postRender
0.6% 258 + 1 worm.WormGameState.checkMouse
0.6% 249 + 0 worm.QuadTree.doCheckCollisions
0.6% 246 + 0 worm.path.AStar.nextStep
0.5% 222 + 0 worm.Entity.isTouching
0.5% 212 + 2 worm.QuadTree.checkLeafCollisions
0.4% 195 + 0 worm.WormGameState.tickEntities
0.3% 149 + 0 worm.Entity.update
0.3% 143 + 0 net.puppygames.applet.effects.Particle.tick
69.8% 30225 + 44 Total compiled (including elided)
Stub + native Method
6.6% 0 + 2860 org.lwjgl.opengl.ARBBufferObject.nglMapBufferARB
3.8% 0 + 1669 org.lwjgl.opengl.GL11.nglColor4ub
3.7% 0 + 1610 org.lwjgl.opengl.WindowsContextImplementation.nSwapBuffers
1.7% 0 + 746 org.lwjgl.opengl.ARBBufferObject.nglUnmapBufferARB
1.1% 0 + 477 org.lwjgl.opengl.GL11.nglTexCoord2f
1.1% 0 + 459 org.lwjgl.opengl.GL11.nglVertex2i
0.8% 0 + 328 org.lwjgl.openal.AL10.nalGetSourcei
0.7% 0 + 297 org.lwjgl.WindowsSysImplementation.nGetTime
0.7% 52 + 245 net.java.games.input.IDirectInputDevice.nGetDeviceData
0.7% 0 + 286 org.lwjgl.opengl.ARBBufferObject.nglBindBufferARB
0.5% 0 + 221 net.java.games.input.IDirectInputDevice.nGetDeviceState
0.5% 8 + 190 org.lwjgl.opengl.WindowsDisplay.nUpdate
0.4% 0 + 182 org.lwjgl.opengl.GL11.nglEnable
0.4% 0 + 155 org.lwjgl.openal.AL10.nalSourcef
0.3% 0 + 124 java.lang.System.arraycopy
0.3% 0 + 112 org.lwjgl.opengl.GL11.nglBindTexture
0.2% 0 + 83 org.lwjgl.opengl.GL11.nglTexEnvi
0.2% 0 + 82 org.lwjgl.opengl.GL11.nglDrawArrays
0.2% 0 + 78 org.lwjgl.opengl.GL11.nglEnableClientState
0.2% 0 + 74 java.lang.Object.hashCode
0.1% 0 + 60 org.lwjgl.opengl.GL11.nglDisableClientState
0.1% 0 + 58 org.lwjgl.opengl.GL11.nglDisable
0.1% 0 + 42 java.lang.Thread.sleep
0.1% 0 + 40 org.lwjgl.opengl.GL11.nglBlendFunc
0.1% 0 + 40 net.java.games.input.IDirectInputDevice.nPoll
25.4% 60 + 10950 Total stub (including elided)
That’s after 563.75 secs (49696 total ticks). Frame rate dropped to about 45fps with over 2,300 sprites and well over 120 gidrahs. Still not good enough!
A fair amount of time is wasted in immediate-mode rendering (the glColor4ub, glTexCoord2f and glVertex2i calls in the stub list above), which I will gradually replace with triple-buffered VBO code.
Collision detection seems to be taking too long - I may try switching to cell-based collisions instead of the quadtree, though this might well not really have that much of an effect. The real problem here is that so many gidrahs are packed tightly together attacking some walls in this test - perhaps I should implement a bit of that behaviour to keep them apart.
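For anyone wondering what "cell-based" would look like in practice, here is a minimal sketch of a uniform-grid broad phase. It is not the game's QuadTree code: the CellGrid class, its method names and the use of circles as collision shapes are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical uniform-grid broad phase; an illustration, not the game's code. */
class CellGrid {
    private final int cellSize;
    // Each bucket holds entries of the form {id, x, y, radius}.
    private final Map<Long, List<int[]>> cells = new HashMap<>();

    CellGrid(int cellSize) {
        this.cellSize = cellSize;
    }

    /** Packs two ints into one long so a pair can be used as a map/set key. */
    private static long key(int a, int b) {
        return ((long) a << 32) ^ (b & 0xffffffffL);
    }

    /** Adds a circle to every cell that its bounding box overlaps. */
    void insert(int id, int x, int y, int r) {
        int[] entry = {id, x, y, r};
        for (int cx = Math.floorDiv(x - r, cellSize); cx <= Math.floorDiv(x + r, cellSize); cx++) {
            for (int cy = Math.floorDiv(y - r, cellSize); cy <= Math.floorDiv(y + r, cellSize); cy++) {
                cells.computeIfAbsent(key(cx, cy), k -> new ArrayList<>()).add(entry);
            }
        }
    }

    /** Returns {lowId, highId} for each overlapping pair, reported once. */
    List<int[]> collisions() {
        Set<Long> seen = new HashSet<>();
        List<int[]> out = new ArrayList<>();
        for (List<int[]> bucket : cells.values()) {
            // Only entries sharing a cell are tested against each other.
            for (int i = 0; i < bucket.size(); i++) {
                for (int j = i + 1; j < bucket.size(); j++) {
                    int[] a = bucket.get(i);
                    int[] b = bucket.get(j);
                    long dx = a[1] - b[1];
                    long dy = a[2] - b[2];
                    long rr = a[3] + b[3];
                    if (dx * dx + dy * dy < rr * rr) {
                        int lo = Math.min(a[0], b[0]);
                        int hi = Math.max(a[0], b[0]);
                        // A pair can share several cells; keep the first hit only.
                        if (seen.add(key(lo, hi))) {
                            out.add(new int[] {lo, hi});
                        }
                    }
                }
            }
        }
        return out;
    }
}
```

Note that the pair test inside each bucket is still quadratic, so a crowd of gidrahs packed into the same few cells would cost about as much as before. That fits the suspicion that the switch might not have much of an effect in this particular test.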
As we can see, though, by far the biggest single waste is still writeSpriteToBuffer, which honestly should be far faster than Java allows it to be.
Cas
hi
I am using direct (immediate) mode for particle sprite rendering. I got a huge speedup by avoiding many glBegin/glEnd calls.
So changing this
for all particles p:
glBegin(GL_QUADS)
glVertex(...)
glEnd
endfor
to
glBegin(GL_QUADS)
for all particles p:
glVertex(...)
endfor
glEnd
made a huge difference.
Now rendering 30,000 particles is no problem anymore.
Only a loony would do it the first way though
Cas
At first I wanted to do it the OOP style with particle.draw. That ended up as the first version, with very poor performance. Now I am happy with fast direct mode.
For better performance, try vertex arrays and vertex buffer objects.
Of course, it won’t matter if the bottleneck is fill rate, but it’s worth a try.
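To make the vertex array suggestion a bit more concrete, here is a rough sketch of the CPU side: all particle quads are written into one direct FloatBuffer, which can then be drawn with a single call instead of one glVertex call per corner. The QuadBatch class and its method names are hypothetical, and the GL calls are only mentioned in a comment, so the block runs without LWJGL.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical helper that batches 2D quads into one vertex array. */
class QuadBatch {
    static final int FLOATS_PER_QUAD = 8; // 4 corners, each (x, y)
    private final FloatBuffer vertices;

    QuadBatch(int maxQuads) {
        // Direct, native-ordered buffer: the kind of buffer GL bindings expect.
        vertices = ByteBuffer.allocateDirect(maxQuads * FLOATS_PER_QUAD * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Starts a new frame's batch by resetting the write position. */
    void begin() {
        vertices.clear();
    }

    /** Appends one axis-aligned square centred on (x, y), counter-clockwise. */
    void addQuad(float x, float y, float size) {
        float h = size / 2f;
        vertices.put(x - h).put(y - h);
        vertices.put(x + h).put(y - h);
        vertices.put(x + h).put(y + h);
        vertices.put(x - h).put(y + h);
    }

    /** Flips the buffer for reading and returns the number of vertices. */
    int end() {
        vertices.flip();
        return vertices.remaining() / 2;
    }

    FloatBuffer buffer() {
        return vertices;
    }
}

// Assumed usage with LWJGL 2 (not executed here), once per frame:
//   GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
//   GL11.glVertexPointer(2, 0, batch.buffer());
//   GL11.glDrawArrays(GL11.GL_QUADS, 0, vertexCount);
```

A VBO version would upload the same buffer contents into a buffer object rather than passing a client-side pointer. Texture coordinates and colours would need their own arrays or an interleaved layout, which this sketch leaves out.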