Benchmarks for 3D Scene Graph APIs

Time for benchmarks!

I have written six performance benchmarks, which I have implemented in Java 3D, Ardor3D and jMonkeyEngine3. These benchmarks try to test various aspects of the 3D scene graph APIs.

In short, there are benchmarks for dynamic geometry, frustum culling, rapid addition and removal of nodes in the scene graph, picking, state sorting, and transparency sorting.

I have tried, as far as possible, to implement the benchmarks identically in each of the APIs. The implementation uses a base class per API, which contains all the core functionality for the benchmarks, such as collecting data, writing to file, increasing object counts, etc. (Ardor3DBenchmarkBase.java, Java3DBenchmarkBase.java, and Jme3BenchmarkBase.java). For each benchmark, these base classes are extended, so that only benchmark-specific code is necessary.

The benchmarks all start at 32 objects, then increase the object count in powers of two. I normally ran each benchmark up to 16384 or 8192 objects, but lower object counts have been chosen for some of the benchmarks due to problems with some of the APIs (more on this later).

The execution of the benchmarks is separated into two periods. First there is a warmup period, in which no data is collected; this gives the application time to initialize. The duration can be specified both in terms of seconds and in terms of frames, and whichever is longer dictates the duration of the warmup period. For my recordings I used a warmup time of 2 seconds and 3600 frames (3600 frames / 60 fps = 60 seconds). The warmup period is followed by a data collection period, and at the end of that period the data is written to the output files.
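A minimal sketch of the end-of-warmup check (illustrative names; the actual logic lives in the *BenchmarkBase classes):

// The warmup ends only once BOTH minimums are met, so the longer of the
// two criteria dictates the warmup duration. Names are illustrative.
private boolean warmupFinished(long warmupStartNs, long warmupFrameCount) {
    long elapsedNs = System.nanoTime() - warmupStartNs;
    boolean enoughTime   = elapsedNs >= warmupSeconds * 1000000000L;
    boolean enoughFrames = warmupFrameCount >= warmupFrames;
    return enoughTime && enoughFrames;
}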

The benchmarks collect data per frame: the time per frame (tpf), spikes in the tpf (a threshold is used, and a spike is defined as an increase in tpf >= 40%), and the current memory usage (taken from the heap, total minus free). In addition, further statistics are calculated, such as the total time and average values (tpf, fps, memory). It is also registered how many of the frames fall within different tpf groupings (e.g. 0.1 ms <= x < 0.5 ms). For precision, nanoseconds are used when calculating the statistics; these are converted to milliseconds in the output files for readability.
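As a rough sketch of the per-frame collection described above (again with illustrative names, not the exact benchmark code):

// Called once per rendered frame (sketch; lastFrameNs, previousTpfNs,
// spikesNs, tpfMillis and memSamples are assumed to be fields).
private void recordFrame() {
    long now = System.nanoTime();
    long tpfNs = now - lastFrameNs;
    lastFrameNs = now;

    // A spike is defined as an increase in tpf of at least 40%, here taken
    // relative to the previous frame.
    if (previousTpfNs > 0 && tpfNs >= previousTpfNs * 1.4) {
        spikesNs.add(tpfNs);
    }
    previousTpfNs = tpfNs;

    // Memory usage from the heap: total minus free.
    Runtime rt = Runtime.getRuntime();
    long usedBytes = rt.totalMemory() - rt.freeMemory();

    tpfMillis.add(tpfNs / 1000000.0); // stats in ns, output files in ms
    memSamples.add(usedBytes);
}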

Each benchmark produces five output files. One file is meant to be read by humans, called BenchmarkName_APIName_human_readable_results.txt, and contains summary information. In addition, four csv files are created:

  • BenchmarkName_APIName_general_results.csv: general results, basically the human readable output in csv format.
  • BenchmarkName_APIName_tpf_results.csv: the tpf for each frame, in milliseconds.
  • BenchmarkName_APIName_mem_results.csv: the memory usage for each frame.
  • BenchmarkName_APIName_tpfsections_results.csv: which tpf sections the frames belonged to.

Entries in the csv files are separated by commas (,), and the periods (object counts) by newlines.

Note that it is hard to measure memory usage in Java. However, by monitoring the heap it is possible to extract some data. When the recorded memory data is plotted in a graph, the lowest values recorded usually give a pointer towards the base memory usage. As the graph rises, it shows the garbage created by the API. A sudden drop shows what is removed by the garbage collector, which in turn gives the new base usage.

In order for the code to be identical, I use custom geometry in the benchmarks. The underlying geometry is a ported version of gluSphere, using triangle strips (Sphere.java), with an implementation on top of it in each of the APIs (Ardor3DSphere.java, Java3DSphere.java, and Jme3Sphere.java). In addition to the spheres, some of the benchmarks use cubes built from indexed triangles (Box.java) (Ardor3DBox.java, Java3DBox.java, and Jme3Box.java).
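The layering can be pictured roughly like this (a sketch of the pattern only; the helper names computeVertices/computeNormals are hypothetical stand-ins, while setData matches the method shown later in the jME3 fix):

// The shared Sphere code produces plain float arrays; each API-specific
// subclass turns them into that API's mesh/buffer objects.
public abstract class AbstractSphere {
    protected void rebuild(float radius, int zSamples, int radialSamples) {
        // Hypothetical stand-ins for the ported gluSphere
        // triangle-strip generation in Sphere.java.
        float[] vertices = Sphere.computeVertices(radius, zSamples, radialSamples);
        float[] normals  = Sphere.computeNormals(radius, zSamples, radialSamples);
        setData(vertices, normals, true);
    }

    // Implemented by Ardor3DSphere, Java3DSphere and Jme3Sphere.
    protected abstract void setData(float[] vertices, float[] normals, boolean useNormals);
}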

The benchmarks were run on the following specifications:

  • ASUS GeForce GTX 580 1536MB
  • GeForce driver version 295.73
  • Settings are default, except force vsync off
  • Intel Core i7 2600K quad-core processor, 3.4 GHz
  • Kingston HyperX 8 GB 1600MHz DDR3
  • ASUS P8Z68-V PRO, Socket-1155 ATX
  • Windows 7 Professional 64-bit, Service Pack 1
  • Java™ SE Runtime Environment (build 1.7.0-b147)
    Java HotSpot™ 64-Bit Server VM (build 21.0-b17, mixed mode)

I have imported each of the csv files generated by the benchmarks into Excel, and created the graphs there.

I have created runnable jars with a starter GUI, so that it is easy for you to replicate the benchmarks on your own. Just run them with java -jar benchmark.jar.

The source code can be downloaded here; the necessary libraries can be obtained by downloading the runnable jars linked to above (they are not included with the source):

DynamicGeometry benchmark:
Consists of rotating spheres with between 58 and 218 triangles each. The vertices are dynamically changed by shrinking/expanding the radius during runtime. A screenshot can be seen here.
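Per frame this amounts to something like the following (a sketch; the modulation function and helper names are illustrative, and setData is the per-API upload method):

// Each frame: oscillate the radius and regenerate the sphere's vertex data,
// forcing the vertex buffers to be updated at runtime.
float radius = baseRadius * (1.0f + 0.25f * (float) Math.sin(elapsedSeconds));
float[] vertices = Sphere.computeVertices(radius, zSamples, radialSamples); // hypothetical helper
float[] normals  = Sphere.computeNormals(radius, zSamples, radialSamples);  // hypothetical helper
sphere.setData(vertices, normals, true);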

Source:

(Superseded; see the updated results below.) I ran this benchmark from 32 to 512 spheres. Higher object counts were especially slow in jME3. The tpf in all APIs was low, with the best results in jME3 and Ardor3D. There are, however, some major issues with these two APIs in this benchmark: at fixed intervals their tpf spikes, especially long for jME3. The longest spike in jME3 lasted 21 seconds(!), and this is with 512 spheres. Ardor3D spikes similarly, but orders of magnitude shorter than jME3; its longest spike was 0.14 seconds. The reason for the spikes is unclear, especially for the really long ones. My initial thought was time spent creating static lists, for example, but the durations are far too extreme for that. Looking at the memory overview for Ardor3D and jME3, the spikes correspond with drops in the garbage created. I am not sure whether this is mere coincidence, or whether there is something going on here. This can be investigated by looking at the memory overview per frame, and the tpf for each API.

Updated results: I ran this benchmark from 32 to 8192 spheres. The results show that jME3 is the fastest of the three. Java 3D has the second best tpf, followed lastly by Ardor3D. All APIs have a relatively stable tpf, however there are large spikes at regular time intervals with Ardor3D. The spikes correspond with drops in the base memory usage, and might be connected to deletion of buffers (something similar to what was the case with jME3?).

Graphs (images):
TimePerFrame2048_full_all
Memory2048_all
Memory2048_partial
Spikes_all
Total Time
AvgFPS_partial_all
AvgFPS_all

The results can be downloaded here: DynamicGeometry.zip.

Frustrum benchmark:
A cluster of rotating cubes (one cube = 12 triangles) is moved along an elliptic trajectory that takes it mostly outside of the far plane, and fully behind the near plane. This should effectively force frustum culling of objects outside of the frustum. Some screenshots along the path: 0, 1, 2, 3.
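The trajectory is a simple parametric ellipse, roughly like this (a sketch; the axis lengths and speed are illustrative):

// Move the cube cluster along an ellipse so it passes far beyond the far
// plane and fully behind the near plane, forcing culling either way.
float angle = elapsedSeconds * angularSpeed;
float x = ellipseWidth * (float) Math.cos(angle);
float z = ellipseDepth * (float) Math.sin(angle);
clusterNode.setLocalTranslation(x, 0f, z); // jME3-style call; the other APIs use their equivalents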

Source:

The benchmark was run from 32 to 16384 objects. The results indicate effective culling in all APIs. jME3 is the fastest, followed by Ardor3D, and then Java 3D.

Graphs (images):
TimePerFrame16384_full_all
Memory16384_all
Spikes_all
Total Time
AvgFPS_partial_all

The results can be downloaded here: Frustrum.zip.

NodeStressAddAndRemoval benchmark:
Cubes (one cube = 12 triangles) are added to, and removed from, the scene graph every frame. This is to look at how the APIs handle and scale add/remove operations on the scene graph. A screenshot can be seen here.
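In jME3, for example, the per-frame operation boils down to something like this (a sketch; the other APIs use their equivalent attach/detach calls):

// Every frame: attach one batch of cubes and detach another (jME3 sketch).
@Override
public void simpleUpdate(float tpf) {
    for (Geometry cube : detachedCubes) {
        rootNode.attachChild(cube);
    }
    for (Geometry cube : attachedCubes) {
        rootNode.detachChild(cube);
    }
    // Swap the batches for the next frame (illustrative bookkeeping).
    List<Geometry> tmp = attachedCubes;
    attachedCubes = detachedCubes;
    detachedCubes = tmp;
}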

Source:

The results show spikes in the tpf for all APIs, but this is expected. Ardor3D had the best tpf, with the smallest spikes. The tpf in jME3 is similar to that of Ardor3D, however it spikes a little higher (~3.5 ms). Java 3D is much slower than both of the other APIs (difference in spike height: ~54 ms). The number of spikes seems to stabilize for Ardor3D and Java 3D at higher object counts, indicating that their algorithms scale well. The number of spikes in jME3 only increases.

Graphs (images):
TimePerFrame2048_full_all
TimePerFrame2048_partial_all
Memory2048_all
Spikes_all
Total Time
AvgFPS_partial_all
AvgFPS_all

The results can be downloaded here: NodeStressAddAndRemoval.zip.

Picking benchmark:
Consists of rotating spheres with between 218 and 478 triangles each. Every frame, 81 rays are cast into the scene, with their origin at the camera. This is to test how well the APIs handle rapid picking operations. Picking is done at the primitive level. A screenshot can be seen here.
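In jME3 terms, each of the 81 picks looks roughly like this (a sketch using jME3's collision API; how the ray directions are distributed is an assumption on my part):

// One pick: cast a ray from the camera into the scene (jME3 sketch).
Ray ray = new Ray(cam.getLocation(), rayDirection); // rayDirection varies per pick
CollisionResults results = new CollisionResults();
rootNode.collideWith(ray, results); // triangle accurate, i.e. primitive level
pickCount += results.size();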

Source:

The results show that Java 3D was the fastest API in this benchmark. It had an avg tpf of 31 ms at 4096 objects, as opposed to jME3, which had an avg tpf of 91 ms. Ardor3D was the slowest of the three, with an avg tpf of 176 ms. (Superseded; see the edit below.) The results for Ardor3D have been omitted from this test. There seem to be some serious problems when running picking at the primitive level: at 64 objects it took 20 minutes to complete the benchmark, with an average tpf of 156 ms. No higher object counts were run. It is unclear why it is so slow; looking at the source code it seems to do everything right, with a per bounding volume check first, and then, for the affected objects, per-primitive checking.

Edit: After setting the maxElements for the CollisionTreeManager to appropriate values, I was able to run the benchmark for Ardor3D as well, as suggested by renanse. The slowness was due to a default max value of 25 being used previously.
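The fix itself is a one-liner at startup (the call renanse points out in the replies below; the value just has to cover the number of objects in the test):

// Let Ardor3D cache one collision tree per sphere, instead of evicting and
// rebuilding trees once more than 25 (the old default) are in use.
CollisionTreeManager.INSTANCE.setMaxElements(objectCount);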

Note: Due to incompatibilities with Java 7, I had to run this benchmark in jME3 with the option “-Djava.util.Arrays.useLegacyMergeSort=true”. See RFE 6804124 here.

Note 2: The number of picks is written to the output files, and the value varies between the APIs. I have verified that the actual number of hits is correct; however, the counting done in the benchmarks does not take into account additional results hidden in layers beneath.

Graphs (images):
TimePerFrame4096_full_all
Memory4096_all
Spikes_all
Total Time
AvgFPS_partial_all
AvgFPS_all

The results can be downloaded here: Picking.zip.

StateSort benchmark:
Consists of rotating cubes (one cube = 12 triangles). Different states are used for series of cubes, varying between the three main state types (lighting, texturing and shaders). For example, with 32 cubes: 32 cubes / 3 state types = 10.667. The square root of this is then used as the number of different state permutations within each state type. As the number of objects increases, the number of different states increases. This tests which API has the best state sorting algorithm. The textures used are generated at runtime. The shader programs simply use a uniform color that is applied in the fragment shader. A screenshot can be seen here.
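The arithmetic works out like this (a short sketch of the computation described above):

// Number of distinct state permutations per state type.
double perStateType = objectCount / 3.0;          // e.g. 32 cubes / 3 state types = 10.667
int permutations = (int) Math.sqrt(perStateType); // e.g. sqrt(10.667) is about 3.27, so 3
// With 32 cubes: roughly 3 different lights, 3 textures and 3 shader
// configurations are spread across the series of cubes.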

Source:

The results show that jME3 is much faster than the other two, with a tpf almost half that of Ardor3D (~45 ms as opposed to ~80 ms). The difference in tpf between Ardor3D and Java 3D is about 20 ms.

Note: Some might claim this is an unfair comparison, since jME3 features a fully shader-based architecture, which eliminates much of the purpose of this benchmark (sorting of GL state calls). Nevertheless, I don't think this should be held against jME3, but keep it in mind.

Graphs (images):
TimePerFrame16384_full_all
Memory16384_all
Spikes_all
Total Time
AvgFPS_partial_all
AvgFPS_all

The results can be downloaded here: StateSort.zip.

TransparencySort benchmark:
Consists of rotating transparent spheres with between 58 and 110 triangles each. This is to test how well the APIs handle sorting of transparent objects. A screenshot can be seen here.

Source:

The results show that jME3 was much faster than the other two, with a difference between jME3 and Ardor3D of ~30 ms. The tpf difference between Ardor3D and Java 3D is similar, however Java 3D has a much more unstable tpf, spiking far more.

Graphs (images):
TimePerFrame16384_full_all
Memory16384_all
Spikes_all
Total Time
AvgFPS_partial_all
AvgFPS_all

The results can be downloaded here: TransparencySort.zip.

tl;dr
Performance benchmarks testing various aspects of 3D scene graph APIs. The ones tested are Java 3D, Ardor3D and jMonkeyEngine3. The results show that overall jME3 is the fastest API, followed by Ardor3D; the slowest is Java 3D. The time per frame is also much more stable and consistent in jME3 and Ardor3D compared to Java 3D. Memory usage in jME3 and Ardor3D is much lower, and less garbage is created than in Java 3D. That said, the results differ in some of the benchmarks. Picking results for Ardor3D were initially omitted because it was simply too slow (20 minutes at 64 objects); Java 3D was the fastest at picking. Stress testing add/removal operations on the scene graph showed that Ardor3D was the fastest, followed by jME3 and Java 3D. Dynamically changing the vertices of objects also caused major headaches, especially for jME3 and to some degree for Ardor3D: spikes occurred at fixed intervals, but they seem to be related to garbage collection, which takes place at the same frequency as the spikes in both APIs. The spikes lasted up to 21 seconds in jME3 and 0.14 seconds in Ardor3D.

I think the results are interesting. The performance difference between jME3 and the others was expected; however, I would personally have thought Ardor3D would perform better. Improvements can be made if the picking issues in Ardor3D are corrected, and if the dynamic geometry issues in both Ardor3D and jME3 are addressed.

Due to time constraints I will not be implementing this in other APIs, but if people are interested it could be done with relative ease. Most of the work lies in the base class; after that, it is trivial to implement each benchmark.

  • Last updated 20.04.2012
  • tensei

Awesome work! The memory graphs are really interesting: Java3D’s appears very “spiky”, and looks to be creating vast amounts of garbage. But since it’s perfectly sawtooth and not ramping up at all, I’d assume they’re all swept out in one cycle. Still a pain to have eden filling up with garbage from the graphics layer, which means less room for the app.

It also appears that Ardor3d leaks memory when nodes are removed from the scenegraph.

Finally, I’d say the state sorting benchmark is perfectly “fair” to run with a shader-based engine. Dealing with multiple effects on different objects is an unavoidable fact of life in any complex scene, and it’s results that matter, not the algorithm used to achieve it.

No, it looks like it creates so little garbage that the gc is never called.

The picking results are likely because we only keep up to 25 CollisionTrees in memory at a time by default. You have more than 25 spheres there, so it’s rebuilding trees over and over. You can use CollisionTreeManager.INSTANCE.setMaxElements(int) to bump that up to something relevant to your test.

Oho, indeed, I keep forgetting these are short-lived tests. I’m used to reading graphs from jconsole on the scale of gigs and hours, and if I see any ramp with no dropoffs, alarm bells go off :)

I implemented the changes suggested by renanse, which fixed the speed for Ardor3D. The results and the first post have been updated. With the changes, the speed order is: Java 3D, jME3, and Ardor3D. The old text regarding the issue has been marked with strikethrough, for those who haven't read the post yet.

Was “CollisionTreeManager.INSTANCE.setMaxElements(int)” the only change, or did he have other info?

Which version of Java3D do you use? I assume you don’t use the one relying on JOGL 2.0.

What version of Ardor3D did you use btw? Your test harness does not compile with trunk.

Yep.

I'm using 1.5.2.

I built Ardor3D back in February (07.02), using the latest sources available in SVN (Revision 1786). I tried recompiling with the newest binaries now; the only difference is that you have changed the constructor for LwjglCanvas to take the parameters in the opposite order. (Unless you encountered any other problems? That was all I got.)

That version is no longer maintained; you should use the latest one. My computers are in a bad state, otherwise I would have liked to run your benchmarks with that version.

If you port the Java 3D code over to the new version I would happily run it on my computer. Using Java 3D 1.5.2 was a conscious choice on my part, because I wanted to test the performance against that version.

Nope, that’s all I ran into, but we don’t alter the method signatures all that often, so it wasn’t enough to tell me how old it was. :) Mostly I wondered because we’d done some bug fixing recently in trunk.

The reason for the 21 second spike in the DynamicGeometry benchmark is that it's hitting GC on the leaked VertexBuffers (which are generated when you do clearBuffer/setBuffer on the Mesh). GC'ing VertexBuffers deletes them from OpenGL, so that spike was probably a million VBs getting deleted from the driver …

In any case, the latest version of jME3 places a limit of 100 on the number of GL objects deleted per frame, which significantly reduces the spikes.
An additional fix can be implemented in the benchmark itself, simply by removing the “clearBuffer” statement from Jme3Sphere; this makes sure the buffer is updated rather than deleted and then recreated.
The final fix is here, in the Jme3Sphere class, which ensures that no VBs are created at runtime, and reduces the number of FBs created. In addition, it sets the “Stream” flag on the buffers, which may help drivers organize memory better.


// In Jme3Sphere.java. Imports needed for this snippet:
//   import java.nio.FloatBuffer;
//   import com.jme3.scene.Geometry;
//   import com.jme3.scene.Mesh;
//   import com.jme3.scene.VertexBuffer;
//   import com.jme3.scene.VertexBuffer.Format;
//   import com.jme3.scene.VertexBuffer.Type;
//   import com.jme3.scene.VertexBuffer.Usage;
//   import com.jme3.util.BufferUtils;

// Updates a vertex buffer in place where possible, so that no VertexBuffer
// objects are created (and later garbage collected) at runtime.
protected void updateBuffer(Mesh mesh, Type type, float[] newData) {
    VertexBuffer vb = mesh.getBuffer(type);
    FloatBuffer vbData;
    if (vb == null) {
        // First call: create the buffer once, with the Stream usage flag
        // since the data changes every frame.
        vbData = BufferUtils.createFloatBuffer(newData.length);
        vb = new VertexBuffer(type);
        vb.setupData(Usage.Stream, 3, Format.Float, vbData);
        mesh.setBuffer(vb);
    } else {
        // Subsequent calls: reuse the existing buffer, growing it only if
        // the new data does not fit.
        vbData = (FloatBuffer) vb.getData();
        vbData = BufferUtils.ensureLargeEnough(vbData, newData.length);
        vb.updateData(vbData);
    }
    vbData.clear();
    vbData.put(newData);
    vbData.flip();
}

protected void setData(float[] vertices, float[] normals, boolean useNormals) {
    Mesh mesh;
    if (shape != null) {
        mesh = shape.getMesh();
    } else {
        mesh = new Mesh();
        mesh.setMode(Mesh.Mode.TriangleStrip);
        shape = new Geometry();
        shape.setMesh(mesh);
    }

    updateBuffer(mesh, Type.Position, vertices);
    if (useNormals) {
        updateBuffer(mesh, Type.Normal, normals);
    }

    shape.updateModelBound();
}

I have implemented the changes in the benchmark for jME3, and it does indeed fix the long spike problems. When I get home I will rerun the benchmark for jME3, and update with new results. I will probably not have time to do it until tomorrow, so stay tuned :)

Could something similar be done in Ardor3D?

I have updated the original post with the new results for dynamic geometry, after running it in all APIs from 32 to 8192 nodes. The benchmark was rerun for all APIs. I implemented the suggested changes in jME3, and applied a similar approach to handling the buffers in Ardor3D as well. I think more can be done in Ardor3D?
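For reference, a similar buffer-reuse idea in Ardor3D could look roughly like this (a sketch under the assumption of Ardor3D's MeshData buffer accessors; not necessarily exactly what the updated benchmark does):

// Reuse the existing vertex buffer when it is large enough, instead of
// allocating a new one (which would later be garbage collected).
FloatBuffer buf = mesh.getMeshData().getVertexBuffer();
if (buf == null || buf.capacity() < vertices.length) {
    buf = BufferUtils.createFloatBuffer(vertices.length); // com.ardor3d.util.geom.BufferUtils
    mesh.getMeshData().setVertexBuffer(buf);
}
buf.clear();
buf.put(vertices);
buf.flip();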

The changes fixed the problems in jME3, and the results now show that it is the fastest of the three APIs.

It is also worth noting that I upgraded the graphics driver to version 301.24 beta.