Optimisation help

Well, it’s time to start optimising my graphics. i’ve got my lights and objects rendering nice and pretty, but performance gets terribly slow when i attempt anything other than a tiny test level. And since i’m not sure what’s going to be the best option, i’m asking for favoured methods to speed things up.

First stop: the profiler.

VecriptProfile.txt: Exclusive Method Times (CPU) (virtual times)
   1923807 sun.awt.windows.WToolkit.eventLoop
    540070 net.java.games.jogl.impl.windows.WGL.SwapBuffers
    266499 com.vecript.math.ConvexHull2D.renderShadowHull
    240943 java.lang.StrictMath.acos
    197238 net.java.games.jogl.impl.windows.WindowsGLImpl.glBegin
    110363 net.java.games.jogl.impl.windows.WindowsGLImpl.glVertex3f
     79736 com.vecript.core.entity.PointLight.getPenumbraVector
     69594 net.java.games.jogl.impl.windows.WindowsGLImpl.glColor4f
     67792 com.vecript.core.entity.PointLight.getUmbraVector
     61921 net.java.games.jogl.impl.windows.WindowsGLImpl.glTexCoord2f
     38634 java.util.ArrayList.get
     31694 com.vecript.core.entity.PointLight.getDisplacedCenter
     21952 net.java.games.jogl.impl.windows.WindowsGLImpl.glBindTexture
     21752 com.vecript.math.ConvexHull2D.renderSolid
     19150 net.java.games.jogl.impl.windows.WindowsGLImpl.glEnd
     18082 com.vecript.math.Vector2f.angle
     13812 net.java.games.jogl.impl.windows.WindowsGLImpl.glDisable
     11543 com.vecript.core.renderer.GameRenderer.drawGeometryPass
     10276 com.vecript.math.ShadowFin.renderFin
      9875 net.java.games.jogl.impl.windows.WindowsGLImpl.glEnable
      7607 com.vecript.core.renderer.GameRenderer.mergeShadowHulls
      7006 java.util.prefs.WindowsPreferences.windowsAbsolutePath
      5338 java.util.prefs.WindowsPreferences.WindowsRegOpenKey
      3603 java.util.prefs.WindowsPreferences.WindowsRegQueryValueEx
      2602 net.java.games.jogl.impl.windows.WindowsGLImpl.glClear
      2602 java.security.AccessController.doPrivileged
      1802 com.vecript.core.renderer.GameRenderer.findVisibleLights
      1735 com.vecript.core.renderer.GameRenderer.render
      1735 com.vecript.core.Shader.bind
      1535 com.vecript.core.DebugConsole.setFramesPerSecondText
      1468 net.java.games.jogl.impl.windows.WindowsGLImpl.glLightfv
      1468 com.vecript.math.Circle.renderLightAlpha
      1401 java.lang.Thread.currentThread
      1268 java.lang.Math.acos
      1268 java.awt.image.ComponentColorModel.getDataElements
      1201 java.lang.StringBuffer.<init>
      1134 java.util.prefs.WindowsPreferences.toWindowsName
      1068 java.lang.FloatingDecimal.dtoa
      1068 java.lang.StringBuffer.toString

The above was grabbed with the -Xprof option for the VM, and HP’s handy-dandy JMeter to extract the information. Now the most obvious here are the ConvexHull and PointLight methods, as well as the immediate mode gl calls sticking out like a sore thumb. (Ignore .SwapBuffers, i’m running at ~16fps so that’s probably a red herring).

At the moment the light and shadow rendering is the bottleneck, with the shadow geometry being built on the fly every frame and rendered in immediate mode. Shadow generation is quite lengthy, and at the moment is brute force, with no attempt to cull out any geometry. Rendering requires a z-fill & ambient pass, then a pass for every light.

Optimisations that spring to mind are:
[*]Some sort of spatial tree. Pretty obvious, since i should be able to cull both invisible geometry and geometry out of a light’s range in each pass.
[*]Getting rid of immediate mode in favor of something better. Plain old vertex arrays are probably going to be the best bet (no point caching results in VRAM if they’re going to change all the time).
[*]Some method of caching shadow geometry. Tricky, since every geometry-light pair has an associated shadow geometry. I can’t think of a good way to do this that actually sounds like it’d be faster.
[*]Optimise my shadow geometry creation. Ugh, probably least favorite. The method is non-optimal in terms of new’d objects, but it’s some complicated maths and i don’t like the idea of obscurificating it…
[*]Something else?

Spatial tree sounds good, but it’s more the getting rid of immediate mode i’m worried about - how on earth do I create this geometry on the fly in a nice and efficient manner? Have one big buffer and fill it up as needed? Shadow geometry is rendered half without textures and half using a single texture, would a buffer for each be a good idea? It seems like i’d have to make it much bigger than needed and waste memory to get something efficient…

Any pointers appreciated :slight_smile:

That "java.lang.StrictMath.acos" isn’t helping matters.
What are you using it for?

I mean, if you are creating new rotation matrices every frame that is one thing (but it can still be minimized), but why “StrictMath”?

Some Math methods delegate to StrictMath. Maybe this is one of them?

Excerpt from java.lang.Math:

/**
 * A result must be within 1 ulp of the correctly rounded result. Results
 * must be semi-monotonic.
 *
 * …
 */
public static double acos(double a) {
    // default impl. delegates to StrictMath
    return StrictMath.acos(a);
}

(J2SE) 1.4 math is slower than the corresponding routines in J2SE 1.3.1.

You can use JNI to speed up math calculations, as described in
http://www.javaworld.com/javatips/jw-javatip141.html

Lemme see… Math.acos is only called from Vector2f.angle(), but that is used in the shadow generation. Since it’s used to find relative light-occluder angles, it’s not something that can easily be cached (and ideally i’d just cache the whole shadow generated). It’s only used 4 to 8 times per shadow depending on the positions.

So from the looks of things i need to either draw faster, or draw less and reduce the shadow calculations that way instead of trying to get the actual individual calculation faster.

Please do not interpret java.lang.Math methods literally. As far as I understand, the default implementation of them (the one contained in src.zip) just delegates to StrictMath, but HotSpot can replace Math calls with specialised code, probably just a single FPU instruction (as it does not have to be really strict).

Now, I know that 1.4.x has some problems with Math performance - but this has nothing to do with what is inside Math.java class.

BTW, if you can just remove all your math and render calls completely, you can really get your frame rate up there! :wink:

“Ummm, yeah… if you could just remove the math and render calls, that’d be greaaat. Thanks. Oh, and I’m gonna need you to come in on Saturday…”

So, er, no one has any practical hints on batching triangles with vertex arrays and byte buffers? Or ideas on how to cache results in some efficient manner? Or am I going to get suggestions like ‘use a LUT for cos’? :o

For static geometry, you need Vertex Buffer Objects. They are the fastest thing going for unchanging geometry, but JOGL does not support them without modifications. That is why the first thing I did was rebuild JOGL WITH support for VBOs (with much help from abies).

Also, use vertex arrays whenever possible, they are faster for dynamic geometry than just calling glVertex3f 500 times.

Basically, get out of calling loops with gl calls in them. The fastest is to move big blocks of geometry at a time with arrays and VBOs.

A quick Google on “glVertexPointer example” yielded
http://www.movesinstitute.org/~mcdowell/mv4202/notes/lect14.pdf
Which is a pretty darn good explanation - in PDF form. :slight_smile:
You can get to the HTML view from the Google search.

Yeah, this was what i was hoping to get hints on: how people actually practically use vertex arrays for large amounts of dynamic geometry, particularly since the actual amount per frame may change by quite a bit depending on what’s visible. VBOs aren’t going to help in this case i think, since the geometry is not really static.

Am i just going to have to bite the bullet and have a whopping great big byte buffer reused every frame? I feel like i’m going round in circles - everyone screams ‘use XYZ!’ when i’m asking ‘how?’. :-/

Edit: and when I say ‘how?’ I don’t mean just the gl spec, I mean in a way that’s actually practical, such as how to organise byte buffers, their size, number, etc.

If the number of elements changes from frame to frame, you will have to do like particle systems do and just make a maximum vertex array that you reuse up to the maximum each time.
When you glDrawElements, you can say where to start and stop in that array.
That is a fine way to do it :slight_smile:
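To make the "maximum vertex array reused each time" idea concrete, here is a minimal sketch in plain java.nio, with no GL calls in it. The class name ShadowBatch, its methods and the three-floats-per-vertex layout are all made up for illustration, not anything from JOGL or the original renderer. The buffer is allocated once at the worst-case size and simply refilled every frame:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical helper: a fixed-capacity vertex store that is refilled every frame. */
final class ShadowBatch {
    private static final int FLOATS_PER_VERTEX = 3; // x, y, z (assumed layout)
    private final FloatBuffer vertices;

    ShadowBatch(int maxVertices) {
        // Allocate once, up front. Direct buffers in native byte order are
        // what GL bindings generally want to be handed.
        vertices = ByteBuffer.allocateDirect(maxVertices * FLOATS_PER_VERTEX * 4)
                             .order(ByteOrder.nativeOrder())
                             .asFloatBuffer();
    }

    /** Start of frame: forget last frame's contents but keep the memory. */
    void begin() {
        vertices.clear();
    }

    /** Appends one vertex; returns false when full (draw what you have, then begin() again). */
    boolean addVertex(float x, float y, float z) {
        if (vertices.remaining() < FLOATS_PER_VERTEX) {
            return false;
        }
        vertices.put(x).put(y).put(z);
        return true;
    }

    /** Number of vertices written since the last begin(). */
    int vertexCount() {
        return vertices.position() / FLOATS_PER_VERTEX;
    }

    /** A view covering only the floats written this frame, positioned at zero. */
    FloatBuffer filled() {
        FloatBuffer view = vertices.duplicate();
        view.flip();
        return view;
    }
}
```

Per light pass you would call begin(), push the shadow vertices with addVertex(), then hand filled() to glVertexPointer and draw vertexCount() vertices with glDrawArrays (the exact Java signatures depend on your JOGL build, so treat that part as an assumption). Keeping one batch for the untextured half and one for the textured half avoids state changes in the middle of a batch.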

Don’t know if this might help… But if math crunching seems to be one of the bottlenecks, then you might want to try the -server VM option. I’ve been able to get a 2X speedup even with some fairly complex cases.

[quote]If the number of elements changes from frame to frame, you will have to do like particle systems do and just make a maximum vertex array that you reuse up to the maximum each time.
[/quote]
The more i think about this, the more this seems like the only option. It still feels slightly wrong though, but if it’s a common method i guess it’s just a speed vs. memory trade off…

[quote]For static geometry, you need Vertex Buffer Objects. They are the fastest thing going for unchanging geometry.
[/quote]
I have heard that nvidia drivers do not optimize them as much as they should, and VBOs end up being slower than display lists. Is that still the case?

VBOs might very well be faster than ordinary arrays and immediate mode. Using VBO mappings you could end up with AGP memory saving yourself the extra copy, just like NV_vertex_array_range.

Regarding the acos: Do you really need the angle and not simply cos(angle)? In which case you can get by with a much cheaper dot product.

  • elias

Well i’ll experiment with VBO when it’s in a proper Jogl build, for now anything’s gonna be faster than immediate mode :slight_smile:

For acos, i’m not sure. Here’s the actual bit of code:

public float clampToEdge(Vector2f edgeVector)
{
    // Find angle between fin and edge.
    // If overlapping edge, clamp to and return the intensity the new umbra edge lies on

    // If angle between penumbra and umbra is greater than penumbra and face, have intersected
    double penVsUm = penumbraVector.angle(umbraVector);
    double penVsEdge = penumbraVector.angle(edgeVector);

    if (penVsUm > penVsEdge)
    {
        // Find ratio between angles and calc new umbra intensity
        float ratio = (float)(penVsEdge / penVsUm);

        umbraIntensity = 1f - ratio;
        umbraVector.set(edgeVector);

        return umbraIntensity;
    }
    else
        return 0f;
}

Now I only really need the ratio of the angles, and the dot product did originally spring to mind. However the vectors aren’t guaranteed to be normalised, so i’m not sure if the dot product would actually work. But first i’ve got to get my basic quad tree working to see how much that helps…

If you look in the Vector.angle code, you’d probably find that it normalizes the vector…

  • elias

I just said “Dot products work on any vectors, not just normalised ones.” but realised that’s completely untrue. So ignore me.

Cas :slight_smile:

Elias is right: although .angle leaves the vectors unchanged, it divides through by the lengths, so it’s effectively doing the same thing… I changed .clampToEdge to normalise all the vectors beforehand and use the dot product instead.
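For anyone following along, a rough sketch of comparing two angles without acos. This is not the actual Vecript code; AngleCompare, cosAngle and angleGreater are made-up names, and the vectors are passed as plain doubles to keep it self-contained. Dividing the dot product by the product of the lengths gives cos(angle) for any non-zero vectors, and since cos is strictly decreasing between 0 and pi, the "greater angle" test becomes a "smaller cosine" test:

```java
/** Hypothetical helper: compare angles between 2D vectors using dot products only. */
final class AngleCompare {

    /** Cosine of the angle between a and b. Neither vector needs to be normalised. */
    static double cosAngle(double ax, double ay, double bx, double by) {
        double dot = ax * bx + ay * by;
        double lengths = Math.sqrt((ax * ax + ay * ay) * (bx * bx + by * by));
        return dot / lengths; // caller must not pass a zero-length vector
    }

    /** True when angle(p, u) is greater than angle(p, e), decided without acos. */
    static boolean angleGreater(double px, double py,
                                double ux, double uy,
                                double ex, double ey) {
        // cos is strictly decreasing on [0, pi], so the comparison flips.
        return cosAngle(px, py, ux, uy) < cosAngle(px, py, ex, ey);
    }
}
```

One caveat: a ratio of cosines is not the same number as a ratio of angles, so any intensity derived purely from dot products will fall off along a slightly different curve than the acos version did. Whether that matters visually is something to check by eye.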

Profiling this version without acos showed that the time spent in the shadow generation was cut approximately in half :slight_smile: However the framerate seemed unchanged, still at ~16fps.

I also added a quadtree to cull out unneeded geometry. This not only reduces the number of light passes per frame (good, but not effective in the test level, and only 7 lights anyway so not important here) but also the amount of geometry casting a shadow from each light. The profiler suggests that i’ve cut the time calculating shadows down to about a quarter with this on, which is about right considering that most lights cover about a quarter of the test level on average.
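The per-light part of that culling boils down to a range test like the one below. It is only a sketch under assumptions: it treats the light’s range as a circle and each occluder (or quadtree node) as an axis-aligned box, and LightCull / circleOverlapsBox are invented names rather than the real QuadTreeBranchNode code:

```java
/** Hypothetical helper: does a light's circular range touch an axis-aligned box? */
final class LightCull {

    static boolean circleOverlapsBox(float cx, float cy, float radius,
                                     float minX, float minY,
                                     float maxX, float maxY) {
        // Closest point on the box to the circle centre (clamp per axis).
        float nearestX = Math.max(minX, Math.min(cx, maxX));
        float nearestY = Math.max(minY, Math.min(cy, maxY));

        // Compare squared distances so no sqrt is needed.
        float dx = cx - nearestX;
        float dy = cy - nearestY;
        return dx * dx + dy * dy <= radius * radius;
    }
}
```

Anything that fails this test for a given light can be skipped for that light’s pass entirely, both for shadow generation and for drawing.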

Annoyingly, enabling the culling only sees an increase from ~16fps to ~18fps. Not the large increase i was hoping for. Current profile with both these in effect:


VecriptProfileNoCosFullCull.txt: Exclusive Method Times (CPU) (virtual times)
    631997 sun.awt.windows.WToolkit.eventLoop
    532977 net.java.games.jogl.impl.windows.WGL.SwapBuffers
     16998 com.vecript.math.ConvexHull2D.renderShadowHull
     12195 net.java.games.jogl.impl.windows.WindowsGLImpl.glBegin
      5990 com.vecript.core.entity.PointLight.getUmbraVector
      4371 net.java.games.jogl.impl.windows.WindowsGLImpl.glVertex3f
      3346 com.vecript.core.entity.PointLight.getPenumbraVector
      3238 net.java.games.jogl.impl.windows.WindowsGLImpl.glTexCoord2f
      3022 java.util.ArrayList.get
      2968 com.vecript.math.ConvexHull2D.intersectsRect
      2914 com.vecript.core.spatialtree.QuadTreeBranchNode.findVisibleObjects
      2860 com.vecript.core.entity.PointLight.getDisplacedCenter
      2644 java.util.prefs.WindowsPreferences.WindowsRegOpenKey
      2104 com.vecript.math.ConvexHull2D.renderSolid
      1997 net.java.games.jogl.impl.windows.WindowsGLImpl.glColor4f
      1511 java.lang.Thread.currentThread
      1349 net.java.games.jogl.impl.windows.WindowsGLImpl.glDisable
      1295 java.util.prefs.WindowsPreferences.WindowsRegQueryValueEx
      1025 net.java.games.jogl.impl.windows.WindowsGLImpl.glBindTexture
       971 java.util.prefs.WindowsPreferences.windowsAbsolutePath
       917 java.security.AccessController.doPrivileged
       917 net.java.games.jogl.impl.windows.WindowsGLImpl.glEnable
       917 com.vecript.core.renderer.GameRenderer.render
       809 com.vecript.math.ShadowFin.renderFin
       701 java.util.prefs.WindowsPreferences.toWindowsName
       701 net.java.games.jogl.impl.windows.WindowsGLImpl.glClear
       701 java.lang.ClassLoader.defineClass0
       648 com.vecript.core.spatialtree.QuadTreeBranchNode.findVisibleLights
       594 com.vecript.core.renderer.GameRenderer.drawGeometryPass
       540 net.java.games.jogl.impl.windows.WindowsGLImpl.glEnd
       540 java.lang.StringBuffer.toString
       540 sun.awt.windows.WGlobalCursorManager.findHeavyweightUnderCursor
       540 java.awt.image.ComponentColorModel.getDataElements
       486 java.lang.String.substring
       432 java.io.FileOutputStream.writeBytes
       432 net.java.games.jogl.impl.windows.WindowsOnscreenGLContext.swapBuffers
       432 net.java.games.jogl.impl.GLContext.invokeGL
       378 java.lang.StringBuffer.<init>
       378 com.vecript.core.renderer.GameRenderer.mergeShadowHulls
       324 java.util.prefs.WindowsPreferences.WindowsRegCloseKey
       324 net.java.games.jogl.impl.windows.WindowsGLImpl.glLightfv
       324 com.vecript.core.spatialtree.QuadTreeRoot.findVisibleObjects
       270 java.lang.String.toCharArray
       270 com.vecript.core.renderer.GameRenderer.findVisibleObjects
       270 java.util.ArrayList.add

One final test, on top of all this, I commented out the code to actually draw the shadows (but with all the calculations left in) to see what effect all the glBegin etc. calls were having. The fps then jumped to ~30fps from the previous ~18fps. Although 30fps is still slower than i’d like, I’m hoping that switching to vertex arrays is going to show this same kind of speed increase.

Anyone any other ideas?

Sounds like you’re on track. :slight_smile:
Careful analysis and chipping away at the stone…