Once again! fast MappedObjects implementation

lhkbob · July 5, 2011, 8:34pm

Oh for the love of … the things I cannot unsee!

princec · July 5, 2011, 9:13pm

Sadly the image is broken now

Maybe I could afford you, too

Cas

Riven · July 5, 2011, 10:13pm

Image back! Honestly, it’s the only reason I brought the server back up.

Riven · July 9, 2011, 7:16pm

Changes:

added support for .map(address, capacity)
added support for user-defined default constructor (as opposed to crashing!)


public class MappedVec3 extends MappedObject
{
     public float x, y, z;

     public MappedVec3()
     {
         this.x = this.y = 13.37f;
     }
}


      MappedVec3 vec3s = MappedVec3.map(address, capacity);

      if (vec3s.x != 0.0f)
         throw new IllegalStateException();
      vec3s.runViewConstructor();
      if (vec3s.x != 13.37f)
         throw new IllegalStateException();

      vec3s.view = 1;
      if (vec3s.x != 0.0f)
         throw new IllegalStateException();

theagentd · July 10, 2011, 5:58am

Will definitely try this out when it’s released with LWJGL! =D
PS: Exactly when will it be added to LWJGL? =S
Edit 2: Couldn’t hold my breath any longer so I tried it out. Managed to get it working, but what the heck is up with the fork() method?!

Riven · July 10, 2011, 11:49am

It relaunches the application, within the same JVM, with a new classloader that transforms all classes before they are loaded.

It is an alternative to the Java Instrumentation Agent, which I expected would be a bit too much to ask of most developers, and which would probably not work with Java Web Start (whoever is still using that piece of tech).
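For readers wondering what "relaunching within the same JVM with a transforming classloader" can look like, here is a minimal sketch. It is not the library's actual fork() code: TransformingLoader, relaunch, the prefix filter and the transform hook are all hypothetical names, and the hook stands in for whatever bytecode rewriting the real transformer performs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.UnaryOperator;

/** Child-first loader that passes class bytes through a transform hook before defining them. */
final class TransformingLoader extends ClassLoader {
    private final String prefix;                   // only classes whose name starts with this are reloaded
    private final UnaryOperator<byte[]> transform; // hypothetical bytecode rewriting hook

    TransformingLoader(ClassLoader parent, String prefix, UnaryOperator<byte[]> transform) {
        super(parent);
        this.prefix = prefix;
        this.transform = transform;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            if (!name.startsWith(prefix)) {
                return super.loadClass(name, resolve); // JDK and library classes: normal delegation
            }
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                byte[] transformed = transform.apply(readClassBytes(name));
                c = defineClass(name, transformed, 0, transformed.length);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Reads the original .class file through the parent loader's resources. */
    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            for (int n; (n = in.read(chunk)) != -1; ) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    /** Runs mainClass.main(args) again, this time with the class loaded through a new loader. */
    static void relaunch(String mainClass, String[] args, UnaryOperator<byte[]> transform) throws Exception {
        ClassLoader parent = TransformingLoader.class.getClassLoader();
        TransformingLoader loader = new TransformingLoader(parent, mainClass, transform);
        Class<?> reloaded = loader.loadClass(mainClass);
        reloaded.getMethod("main", String[].class).invoke(null, (Object) args);
    }
}
```

The reloaded class is a different Class object from the one the JVM started with, so an application using this pattern needs some guard (a system property, for example) to avoid relaunching itself forever.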

Mike · July 10, 2011, 12:19pm

I’m really curious regarding speed improvements, is anyone busy with making a benchmark?

Mike

Riven · July 10, 2011, 12:31pm

Field access is typically 10% slower due to the JVM unrolling codeblocks fewer times (Spasi discovered that). The real win is that you don’t have to pump all data from your object graph to your buffer every frame. That might seem like taking away minor overhead, but it allows for major speed improvements.

You can do everything my lib does by rewriting all your code to be like buffer.get(…) and buffer.put(…, …), it isn’t anything magic under the hood.
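To make the buffer.get(…) / buffer.put(…, …) remark concrete, here is roughly what the hand-written equivalent of a mapped vec3 could look like. This is only a sketch using plain java.nio: Vec3View and its accessor names are made up for illustration, and the 12-byte stride simply assumes three packed floats.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hand-written sliding view over packed (x, y, z) floats; all names are illustrative. */
final class Vec3View {
    static final int SIZEOF = 12; // three 4-byte floats per element

    private final ByteBuffer buffer;
    int view; // index of the element the accessors currently address

    Vec3View(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    /** Allocates zero-filled native memory for the given number of elements. */
    static Vec3View allocate(int capacity) {
        return new Vec3View(ByteBuffer.allocateDirect(capacity * SIZEOF).order(ByteOrder.nativeOrder()));
    }

    // Each accessor is an absolute get/put at (view * SIZEOF + field offset).
    float x() { return buffer.getFloat(view * SIZEOF); }
    float y() { return buffer.getFloat(view * SIZEOF + 4); }
    float z() { return buffer.getFloat(view * SIZEOF + 8); }

    void x(float v) { buffer.putFloat(view * SIZEOF, v); }
    void y(float v) { buffer.putFloat(view * SIZEOF + 4, v); }
    void z(float v) { buffer.putFloat(view * SIZEOF + 8, v); }

    /** The backing buffer, ready to be handed to a native API as-is. */
    ByteBuffer buffer() { return buffer; }
}
```

One view instance is moved across the buffer by changing view, so no per-element objects are created and nothing has to be copied into the buffer before it is used.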

theagentd · July 10, 2011, 1:58pm

I managed to figure that out by looking at the source code, but maybe you should mention that in the first post. I thought it was confusing but I might just be dumb…

Riven · July 10, 2011, 11:17pm

Changes:

added IllegalAccessError (when read-only fields are assigned) at transform time (loading the class and transforming it), as opposed to runtime (when the faulty field access actually occurs)

Bugfix:

Solved verification error that occurred if the callsite of the transformed bytecode contained code that threw an Exception.

I could not reproduce the verification error on my system, so thanks to Spasi for investigating this issue and providing a workaround.

theagentd · July 11, 2011, 11:03am

Nice! Another update!

Did you know that it fails on private/internal MappedObject classes?
Is there any way to disable the output? Kinda annoying when it floods the output window…

Also, I have a small “problem” with my particle engine test. MappedObjects sure eliminates the buffer operations that were actually bottlenecking the whole thing, but each particle also has data completely irrelevant to the rendering, and submitting 2-3x more data to the graphics card doesn’t seem like a very good optimization. Data I don’t want to send are things like the total lifetime, life left and current speed of the particle. It would be awesome if some of the data could be automatically stored outside of the buffer but still in memory for each MappedObject “struct”. Obviously I can do this myself by keeping a separate array with the other data, but it feels like I’m defeating the purpose of it all. I’ll try that out this evening (I’m in Japan, so it’s 20:00 here xD).

gouessej · July 11, 2011, 12:54pm

Were my questions so irrelevant that they do not need any answers? Sorry.

princec · July 11, 2011, 1:09pm

theagentd:

Nice! Another update!

Did you know that it fails on private/internal MappedObject classes?

Is there any way to disable the output? Kinda annoying when it floods the output window…

Also, I have a small “problem” with my particle engine test. MappedObjects sure eliminates the buffer operations that were actually bottlenecking the whole thing, but each particle also has data completely irrelevant to the rendering, and submitting 2-3x more data to the graphics card doesn’t seem like a very good optimization. Data I don’t want to send are things like the total lifetime, life left and current speed of the particle. It would be awesome if some of the data could be automatically stored outside of the buffer but still in memory for each MappedObject “struct”. Obviously I can do this myself by keeping a separate array with the other data, but it feels like I’m defeating the purpose of it all. I’ll try that out this evening (I’m in Japan, so it’s 20:00 here xD).

Your Particle class needs its own “non-rendering” data and a single instance of a MappedObject of some sort that is a window into the “rendering” data maybe. This is maybe not the most efficient way to do it though…

Cas

Roquen · July 11, 2011, 1:13pm

The best way to improve the performance of a scenegraph is to stop using one and change to spatial partitioning. Other than that, converting tree/graph-like structures to be cache oblivious is somewhat of a pain and for most people not worth the effort.

theagentd · July 11, 2011, 1:26pm

MY GOD. I DON’T BELIEVE IT. I rewrote my old particle engine test to eliminate some other boring bottlenecks, so it’s definitely not directly portable to a game anymore. It’s more of a benchmark for exactly what MappedObject is supposed to optimize. Guess what? It f*cking did. xD

With 250 000 particles:
Traditional puts: 106 FPS
MappedObject: 180 FPS

With 1 000 000 (THAT’S ONE MILLION DOTS):
Puts: 28 FPS
MappedObject: 51 FPS

NICE! Roughly a 1.7x to 1.8x speedup! Ever heard of a laptop animating 1 million particles at 50 FPS using Java? xD In a real game, at least, you’d probably hit the fill rate bottleneck of your GPU way before your CPU starts slowing things down.

I don’t even know what half of those words mean. ;D
I’ve never used OpenCL before. It just improves performance by completely eliminating puts and gets from a buffer used to send or receive data from OpenGL or OpenCL. Extremely useful if you have a large amount of data being transferred. Most obvious applications include animation, CPU particle engines, terrain streaming and probably everything you do with OpenCL that requires communication with the CPU each run. And like Riven said before, it’s also more memory efficient.

@ Princec
I just used a MappedObject for the rendering data (position and color, totaling to 12 bytes per particle) and a separate Particle class to store the speed and state of the particle. Works wonders, as mentioned above.

EDIT: Ah, forgot. How the heck do I get rid of the debug output? Takes almost 20 seconds to start my test because of it. -.-
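The rendering/state split mentioned in this post can be sketched with plain java.nio, without the library. Everything here is an assumption made for illustration: the Particle fields, the update method, and in particular the 12-byte layout (two floats for position plus four RGBA bytes), since the post only says "position and color, totaling to 12 bytes".

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** CPU-only particle state; the 12-byte render record lives in a shared buffer. */
final class Particle {
    static final int RENDER_STRIDE = 12; // assumed layout: float x, float y, 4 bytes RGBA

    final int index;           // slot of this particle in the render buffer
    final float totalLifetime; // never uploaded
    float vx, vy;              // velocity, never uploaded
    float lifeLeft;            // never uploaded

    Particle(int index, float vx, float vy, float totalLifetime) {
        this.index = index;
        this.vx = vx;
        this.vy = vy;
        this.totalLifetime = totalLifetime;
        this.lifeLeft = totalLifetime;
    }

    /** Advances the simulation and writes only the render fields into the buffer. */
    void update(ByteBuffer render, float dt) {
        int base = index * RENDER_STRIDE;
        render.putFloat(base, render.getFloat(base) + vx * dt);
        render.putFloat(base + 4, render.getFloat(base + 4) + vy * dt);
        lifeLeft = Math.max(0.0f, lifeLeft - dt);
        // Fade out: the last byte of the record is treated as alpha in this sketch.
        int alpha = (int) (255.0f * lifeLeft / totalLifetime);
        render.put(base + 11, (byte) alpha);
    }
}
```

Only the render buffer is handed to the graphics API, so lifetime and velocity never cross the bus.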

Riven · July 11, 2011, 1:49pm

@theagentd

I’m at work, have patience :point:

Spasi · July 11, 2011, 2:07pm

@theagentd
There are two booleans at the top of the MappedObjectTransformer class (the first two lines). Set both to false.

Riven · July 11, 2011, 2:28pm



{
     MappedObjectTransformer.register(Test.Xyz.class);
}

public class Test
{
   @MappedType(sizeof = 12)
   public static class Xyz extends MappedObject
   {
      int x, y, z;
   }
}

Works fine.

You just have to register it, so it must be public.

rexguo · July 11, 2011, 3:10pm

This utility is very cool.

Just wondering, is it possible to extend it, or to use a similar
idea (transforming bytecode / objects stored in native memory)
to implement Structure of Arrays (vs classic OO Array of Structures)
in an elegant way? One of the things that benefits from SoA is
big particle systems where only part of the particle info needs
to be sent to the GPU.

.rex

ps: I’m hiring graphics and tools people:
http://www.linkedin.com/jobs?viewJob=&jobId=1754526
http://www.linkedin.com/jobs?viewJob=&jobId=1754523
http://www.linkedin.com/jobs?viewJob=&jobId=1653942
(I run the Engine team and we use Java + OpenGL)

Roquen · July 11, 2011, 5:06pm

IMHO: I’d suggest explicitly breaking up the data rather than performing runtime weaving for SoA. I’ll let fans of DOP point you to those links.
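As a rough idea of what explicitly breaking up the data could mean for a particle system, here is a hand-rolled Structure of Arrays sketch in plain Java. ParticleArrays and its fields are hypothetical; the only point is that each field gets its own storage, and only the array the GPU needs sits in a direct buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Structure-of-arrays particle store: one array per field, split by who consumes it. */
final class ParticleArrays {
    final int count;
    final FloatBuffer positions; // packed x, y pairs: the only data handed to the GPU
    final float[] velX;          // CPU-only
    final float[] velY;          // CPU-only
    final float[] lifeLeft;      // CPU-only

    ParticleArrays(int count) {
        this.count = count;
        this.positions = ByteBuffer.allocateDirect(count * 2 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        this.velX = new float[count];
        this.velY = new float[count];
        this.lifeLeft = new float[count];
    }

    /** Integrates every particle; each loop iteration touches the same index in every array. */
    void update(float dt) {
        for (int i = 0; i < count; i++) {
            positions.put(2 * i, positions.get(2 * i) + velX[i] * dt);
            positions.put(2 * i + 1, positions.get(2 * i + 1) + velY[i] * dt);
            lifeLeft[i] -= dt;
        }
    }
}
```

No bytecode weaving is involved; the cost is that a "particle" is now just an index rather than an object.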