Once again! fast MappedObjects implementation


MappedFoo[] foos = new MappedFoo[1000];
for(int i=0; i<1000; i++)
{
   foos[i] = MappedFoo.malloc(3);
}

for(int i=0; i<63; i+=3)
{
   for(int j=0; j<63; j+=(i%3+1))
   {
       foos[i].view = i+j;
       foos[j].view = j*2;
       foos[j+i+j].view = 13+i;
       float abc = foos[j+i+j].x;
       foos[i].x = foos[j%2].y;
       foos[j].y = foos[999].z / abc;
   }
}

This example would require 5 local variables.

I tested this one; it gives me the best result on the java benchmark (is this correct?)

And how come the Server VM is better than the Client VM? Is this common behaviour?

Yes, the java test should be the fastest, it’s simple array access. The server VM is able to perform much more advanced optimizations than the client VM, so the performance difference you’re seeing is expected. The only drawback is that server takes longer than client to apply the full range of JIT optimizations, so it’s trading quick start-up time for peak performance. JDK7 has support for tiered compilation (disabled by default currently), which is a balance between server and client.

As for the benchmark itself, it turns out my assumption about the memory write being the cause of the slowdown was wrong. Memory writes are cached as well, so only when a read is requested will the actual system memory update happen (or when the cache line is evicted). So, even though it’ll never be as fast as a CPU register, it’s not 3x slower. It turns out that the benchmark was flawed, in that it didn’t use the mapped object stride for the iteration, it used SIZEOF instead. Using a constant there results in a fully optimized loop, similar to the array one (I compared the JITed assembly with the debug JDK), whereas using stride is much more work for the CPU. Riven is currently examining a way to get rid of stride without sacrificing the functionality it provided.
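To illustrate the point about the constant vs. the stride (a plain-Java sketch with a hypothetical SIZEOF constant, not the actual mapped-object code): when the increment is a compile-time constant, the JIT sees a simple induction variable and can fully optimize the loop, whereas a runtime stride value leaves extra work per iteration.

```java
public class StrideDemo {
    static final int SIZEOF = 4; // compile-time constant: the JIT can fold this

    // Constant stride: the JIT sees a simple, predictable induction variable.
    static float sumConstantStride(float[] data) {
        float sum = 0f;
        for (int i = 0; i < data.length; i += SIZEOF) {
            sum += data[i];
        }
        return sum;
    }

    // Runtime stride: same result, but the increment is opaque to the JIT.
    static float sumRuntimeStride(float[] data, int stride) {
        float sum = 0f;
        for (int i = 0; i < data.length; i += stride) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] data = new float[16];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sumConstantStride(data));        // 0+4+8+12 = 24.0
        System.out.println(sumRuntimeStride(data, SIZEOF)); // 24.0
    }
}
```

Both loops compute the same thing; only the second one forces the CPU to treat the stride as a variable.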

The upcoming version allows a local variable to be used as a view controller, using the @MappedView annotation:


   public static void testLocalView(MappedFloat m, @MappedView int i)
   {
      for (i = 0; i < 5; i++)
      {
         m.value = (float) Math.random();
         m.value *= 2.0f;
      }

      for (i = 0; i < 5; i++)
      {
         System.out.println("[" + i + "] => " + m.value);
      }
   }

As you can see, the view of the mapped object is directly controlled by modifying the variable ‘i’ (in this case).

The variable ‘i’ is actually not transformed, which means that ‘i++’ is as fast as usual. The new behaviour is implemented by transforming the field-access to a pointer calculated with the value ‘i’ instead of ‘mapped.view’.
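Conceptually (with hypothetical names and a simulated buffer, not the actual generated bytecode), the transform turns a view-field lookup into address arithmetic on the local variable:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ViewAddressing {
    static final int SIZEOF = 4; // one float per element (assumption)

    // Before the transform: every access loads the 'view' field first.
    static class MappedFloatSim {
        final ByteBuffer buf;
        int view;
        MappedFloatSim(ByteBuffer buf) { this.buf = buf; }
        float value() { return buf.getFloat(view * SIZEOF); }
    }

    // After the transform: the pointer is computed from the local 'i',
    // so 'i++' stays a plain int increment and no field is ever touched.
    static float valueAt(ByteBuffer buf, int i) {
        return buf.getFloat(i * SIZEOF);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(5 * SIZEOF).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 5; i++) buf.putFloat(i * SIZEOF, i * 2.0f);

        MappedFloatSim m = new MappedFloatSim(buf);
        m.view = 3;
        System.out.println(m.value());       // 6.0 (via the view field)
        System.out.println(valueAt(buf, 3)); // 6.0 (via the local variable)
    }
}
```

Same address, same result; the difference is purely in how the element index reaches the address calculation.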

The localView code is ~3.2x as fast, according to Spasi’s new benchmarks (a bit faster than float[] access).
The mapped.view code is ~1.5x as fast, according to Spasi’s new benchmarks (half the speed of float[] access).

Please keep in mind that I am still working on further optimisations, and that Spasi’s benchmarks naturally do not represent expected performance for the general case (there is no general case).

v0.10 (which we will now deem slow-ish) was already a huge performance boost, in real-world situations.

Barely tested v0.12 (beware of crashes)

Thanks to Spasi for benchmarking and analyzing the generated bytecode for bottlenecks.

3.2x? Awesome :smiley: Still don’t get how this all works; I will ask some questions in the future to finally understand it. Looks like something I will use for a bullet hell shooter test.

Does this mean that I can access the same MappedObject using local pointers from multiple threads without having to synchronize?

Hm, I’m sorry, but this code is pretty unreadable. There’s no connection between what is happening and what is written. It’s going to lead to some really nasty, obscure-looking code.

Cas :slight_smile:

[quote=“theagentd,post:146,topic:31992”]Does this mean that I can access the same MappedObject using local pointers from multiple threads without having to synchronize?[/quote]
Synchronization is still necessary if different threads access the same part of mapped memory. But it does mean that you don’t have to use .dup() to pass a different instance to each thread.

@princec: Normal .view access is still there and is much faster now, so you don’t have to use this method if you think it’s error prone. It can’t be used in every scenario anyway.

Fair point. The performance gain however is so big that we just have to keep it in the library.

I think it’s fair to rename @MappedView to @DiehardMappedView (or something along those lines) to give a strong indication that average joe and even seasoned jane shouldn’t use it.

Aaaaw… I hate seasoning… Just wasted some money on some bread drenched in pepper. Japanese people are weird.
Uh, to be serious: I think performance is really important, but I totally agree with Princec. I definitely think you should have to link the i variable to the MappedObject you want to control…

Trying to convert @MappedView to:


MappedVec3 vecs = MappedVec3.malloc(5);
MappedVec3[] array = vecs.asArray();

for(int i=0; i<array.length; i++)
{
   array[i].x = 33.44f + i;
}
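One way to picture what the rewrite has to do (a hypothetical sketch with a simulated buffer, not the actual transformation): each `array[i]` access is equivalent to first setting the view on the single backing instance, then accessing the field.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AsArraySketch {
    static final int SIZEOF = 3 * 4; // MappedVec3: x, y, z floats (assumption)

    // A single instance whose 'view' selects the element it points at,
    // standing in for a real MappedVec3.
    static class Vec3View {
        final ByteBuffer buf;
        int view;
        Vec3View(ByteBuffer buf) { this.buf = buf; }
        void setX(float v) { buf.putFloat(view * SIZEOF, v); }
        float getX()       { return buf.getFloat(view * SIZEOF); }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(5 * SIZEOF).order(ByteOrder.nativeOrder());
        Vec3View vecs = new Vec3View(buf);

        // What 'array[i].x = 33.44f + i' would be rewritten into:
        for (int i = 0; i < 5; i++) {
            vecs.view = i;         // array[i] ...
            vecs.setX(33.44f + i); // ... .x = 33.44f + i
        }

        vecs.view = 2;
        System.out.println(vecs.getX()); // x of element 2
    }
}
```

The array syntax is pure sugar; no per-element objects exist, so the transformed loop still touches only one instance and one buffer.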

Library is currently in a major rewrite…

I figured out the required bytecode transformations. :yawn:

Now I have to code something that finds the patterns in bytecode :emo:

Sorry, my question was badly phrased. Is it possible to explicitly destroy a MappedObject?

No, you just set it to null and the GC takes care of it.

Does it release the native resources too (for example a direct byte buffer used under the hood)?

Yes. A MappedObject holds a reference to the mapped ByteBuffer. If there’s no other reference to that ByteBuffer when the MappedObject is freed, the buffer will be freed as well. Example:

ByteBuffer data = ByteBuffer.allocateDirect(100 * Vector4f.SIZEOF);
Vector4f vecsA = Vector4f.map(data);
Vector4f vecsB = vecsA.dup(); // or .slice(); or another .map();

vecsA = null; // vecsA is now eligible for GC, but not data, vecsB still holds a reference to it.
vecsB = null; // both vecsB and data are eligible for GC.

It works exactly like the corresponding methods in NIO buffers. Only when all references are gone will the native memory be released. For details, see DirectByteBuffer’s source code, specifically the Deallocator inner class and how it’s used in the constructors.
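A stripped-down sketch of that Deallocator pattern (hypothetical names, with a flag standing in for the native free call; the real DirectByteBuffer registers the Runnable with a Cleaner so the GC invokes it when the last reference disappears):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class DeallocatorDemo {
    // Stand-in for the native free; DirectByteBuffer calls Unsafe.freeMemory here.
    static final AtomicBoolean nativeMemoryFreed = new AtomicBoolean(false);

    // Mirrors the shape of DirectByteBuffer's private Deallocator inner class.
    static class Deallocator implements Runnable {
        private long address;
        Deallocator(long address) { this.address = address; }
        public void run() {
            if (address == 0) return;    // paranoia: already freed
            address = 0;
            nativeMemoryFreed.set(true); // unsafe.freeMemory(address) in the real code
        }
    }

    public static void main(String[] args) {
        Deallocator cleanupAction = new Deallocator(0xDEADBEEFL);
        // The GC would run this via the Cleaner; here we invoke it directly.
        cleanupAction.run();
        System.out.println(nativeMemoryFreed.get()); // true
    }
}
```

The key property is that freeing is tied to reachability of the buffer object, not to any explicit destroy call, which is exactly why nulling out the last reference is enough.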

The next LWJGL nightly will have support for .asArray(). It will also require the latest ASM version (4.0-RC1), including the asm-analysis and asm-tree modules (you might as well use asm-all.jar).

If this is your way of notifying me of your takeover, well done. I have given you a library, I have given you both my idea of supporting arrays, the code and my progress on .asArray(), and without further notice, you declare you have implemented it yourself. I already explained I have limited time, but at least now I know that the time I spent on it this week was wasted and I can focus my efforts on other, dull projects.

I was expecting a similar reaction and still decided against telling you anything. That should tell you something about how you talk to people. If you’d like to discuss this, feel free to PM me.

Actually, if it is how it seems from an outside perspective, Riven’s reaction is perfectly fine and your reply is even a bit insolent. What happened? You two seemed to get along pretty well…