Once again! fast MappedObjects implementation


MappedFoo[] foos = new MappedFoo[1000];
for(int i=0; i<1000; i++)
{
   foos[i] = MappedFoo.malloc(3);
}

for(int i=0; i<63; i+=3)
{
   for(int j=0; j<63; j+=(i%3+1))
   {
       foos[i].view = i+j;
       foos[j].view = j*2;
       foos[j+i+j].view = 13+i;
       float abc = foos[j+i+j].x;
       foos[i].x = foos[j%2].y;
       foos[j].y = foos[999].z / abc;
   }
}

This example would require 5 local variables.

I tested this one; it gives me the best result on the java benchmark (is this correct?)

And how come the Server VM is better than the Client VM? Is this common behaviour?

Yes, the java test should be the fastest, it’s simple array access. The server VM is able to perform much more advanced optimizations than the client VM, so the performance difference you’re seeing is expected. The only drawback is that server takes longer than client to apply the full range of JIT optimizations, so it’s trading quick start-up time for peak performance. JDK7 has support for tiered compilation (disabled by default currently), which is a balance between server and client.

As for the benchmark itself, it turns out my assumption about the memory write being the cause of the slowdown was wrong. Memory writes are cached as well, so only when a read is requested will the actual system memory update happen (or when the cache line is evicted). So, even though it’ll never be as fast as a CPU register, it’s not 3x slower. It turns out that the benchmark was flawed, in that it didn’t use the mapped object stride for the iteration, it used SIZEOF instead. Using a constant there results in a fully optimized loop, similar to the array one (I compared the JITed assembly with the debug JDK), whereas using stride is much more work for the CPU. Riven is currently examining a way to get rid of stride without sacrificing the functionality it provided.
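To illustrate the point about the constant vs. the stride (a plain-Java sketch with a hypothetical SIZEOF constant, not the actual mapped-object code): when the increment is a compile-time constant, the JIT sees a simple induction variable and can fully optimize the loop, whereas a runtime stride value leaves extra work per iteration.

```java
public class StrideDemo {
    static final int SIZEOF = 4; // compile-time constant: the JIT can fold this

    // Constant stride: the JIT sees a simple, predictable induction variable.
    static float sumConstantStride(float[] data) {
        float sum = 0f;
        for (int i = 0; i < data.length; i += SIZEOF) {
            sum += data[i];
        }
        return sum;
    }

    // Runtime stride: same result, but the increment is opaque to the JIT.
    static float sumRuntimeStride(float[] data, int stride) {
        float sum = 0f;
        for (int i = 0; i < data.length; i += stride) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] data = new float[16];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sumConstantStride(data));        // 0+4+8+12 = 24.0
        System.out.println(sumRuntimeStride(data, SIZEOF)); // 24.0
    }
}
```

Both loops compute the same thing; only the second one forces the CPU to treat the stride as a variable.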

The upcoming version allows a local variable to be used as a view controller, using the @MappedView annotation:


   public static void testLocalView(MappedFloat m, @MappedView int i)
   {
      for (i = 0; i < 5; i++)
      {
         m.value = (float) Math.random();
         m.value *= 2.0f;
      }

      for (i = 0; i < 5; i++)
      {
         System.out.println("[" + i + "] => " + m.value);
      }
   }

As you can see, the view of the mapped object is directly controlled by modifying the variable ‘i’ (in this case).

The variable ‘i’ is actually not transformed, which means that ‘i++’ is as fast as usual. The new behaviour is implemented by transforming the field-access to a pointer calculated with the value ‘i’ instead of ‘mapped.view’.
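Conceptually (with hypothetical names and a simulated buffer, not the actual generated bytecode), the transform turns a view-field lookup into address arithmetic on the local variable:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ViewAddressing {
    static final int SIZEOF = 4; // one float per element (assumption)

    // Before the transform: every access loads the 'view' field first.
    static class MappedFloatSim {
        final ByteBuffer buf;
        int view;
        MappedFloatSim(ByteBuffer buf) { this.buf = buf; }
        float value() { return buf.getFloat(view * SIZEOF); }
    }

    // After the transform: the pointer is computed from the local 'i',
    // so 'i++' stays a plain int increment and no field is ever touched.
    static float valueAt(ByteBuffer buf, int i) {
        return buf.getFloat(i * SIZEOF);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(5 * SIZEOF).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 5; i++) buf.putFloat(i * SIZEOF, i * 2.0f);

        MappedFloatSim m = new MappedFloatSim(buf);
        m.view = 3;
        System.out.println(m.value());       // 6.0 (via the view field)
        System.out.println(valueAt(buf, 3)); // 6.0 (via the local variable)
    }
}
```

Same address, same result; the difference is purely in how the element index reaches the address calculation.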

The localView code is ~3.2x as fast, according to Spasi’s new benchmarks (a bit faster than float[] access).
The mapped.view code is ~1.5x as fast, according to Spasi’s new benchmarks (half the speed of float[] access).

Please keep in mind that I am still working on further optimisations, and that Spasi’s benchmarks naturally do not represent expected performance for the general case (there is no general case).

v0.10 (which we will now deem slow-ish) was already a huge performance boost, in real-world situations.

Barely tested v0.12 (beware of crashes)

Thanks to Spasi for benchmarking and analyzing the generated bytecode for bottlenecks.

3.2x? Awesome :smiley: Still don’t get how this all works; I will ask some questions in the future to finally understand it. Looks like something I will use for a bullet hell shooter test.

Does this mean that I can access the same MappedObject using local pointers from multiple threads without having to synchronize?

Hm, I’m sorry, but this code is pretty unreadable. There’s no connection between what is happening and what is written. It’s going to lead to some really nasty, obscure-looking code.

Cas :slight_smile:

[quote=“theagentd,post:146,topic:31992”]Does this mean that I can access the same MappedObject using local pointers from multiple threads without having to synchronize?[/quote]
Synchronization is still necessary if different threads access the same part of mapped memory. But it does mean that you don’t have to use .dup() to pass a different instance to each thread.

@princec: Normal .view access is still there and is much faster now, so you don’t have to use this method if you think it’s error prone. It can’t be used in every scenario anyway.

Fair point. The performance gain however is so big that we just have to keep it in the library.

I think it’s fair to rename @MappedView to @DiehardMappedView (or something along those lines) to give a strong indication that average joe and even seasoned jane shouldn’t use it.

Aaaaw… I hate seasoning… Just wasted some money on some bread drenched in pepper. Japanese people are weird.
Uh, to be serious: I think performance is really important, but I totally agree with Princec. I definitely think you should have to link the i variable to the MappedObject you want to control…

Trying to convert @MappedView to:


MappedVec3 vecs = MappedVec3.malloc(5);
MappedVec3[] array = vecs.asArray();

for(int i=0; i<array.length; i++)
{
   array[i].x = 33.44f + i;
}
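One way to picture what the rewrite has to do (a hypothetical sketch with a simulated buffer, not the actual transformation): each `array[i]` access is equivalent to first setting the view on the single backing instance, then accessing the field.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AsArraySketch {
    static final int SIZEOF = 3 * 4; // MappedVec3: x, y, z floats (assumption)

    // A single instance whose 'view' selects the element it points at,
    // standing in for a real MappedVec3.
    static class Vec3View {
        final ByteBuffer buf;
        int view;
        Vec3View(ByteBuffer buf) { this.buf = buf; }
        void setX(float v) { buf.putFloat(view * SIZEOF, v); }
        float getX()       { return buf.getFloat(view * SIZEOF); }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(5 * SIZEOF).order(ByteOrder.nativeOrder());
        Vec3View vecs = new Vec3View(buf);

        // What 'array[i].x = 33.44f + i' would be rewritten into:
        for (int i = 0; i < 5; i++) {
            vecs.view = i;         // array[i] ...
            vecs.setX(33.44f + i); // ... .x = 33.44f + i
        }

        vecs.view = 2;
        System.out.println(vecs.getX()); // x of element 2
    }
}
```

The array syntax is pure sugar; no per-element objects exist, so the transformed loop still touches only one instance and one buffer.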

Library is currently in a major rewrite…

I figured out the required bytecode transformations. :yawn:

Now I have to code something that finds the patterns in bytecode :emo:

Sorry, my question was badly phrased. Is it possible to explicitly destroy a MappedObject?

No, you just set it to null and the GC takes care of it.

Does it release the native resources too (for example a direct byte buffer used under the hood)?

Yes. A MappedObject holds a reference to the mapped ByteBuffer. If there’s no other reference to that ByteBuffer when the MappedObject is freed, the buffer will be freed as well. Example:

ByteBuffer data = ByteBuffer.allocateDirect(100 * Vector4f.SIZEOF);
Vector4f vecsA = Vector4f.map(data);
Vector4f vecsB = vecsA.dup(); // or .slice(); or another .map();

vecsA = null; // vecsA is now eligible for GC, but not data, vecsB still holds a reference to it.
vecsB = null; // both vecsB and data are eligible for GC.

It works exactly like the corresponding methods in NIO buffers. Only when all references are gone will the native memory be released. For details, see DirectByteBuffer’s source code, specifically the Deallocator inner class and how it’s used in the constructors.
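A stripped-down sketch of that Deallocator pattern (hypothetical names, with a flag standing in for the native free call; the real DirectByteBuffer registers the Runnable with a Cleaner so the GC invokes it when the last reference disappears):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class DeallocatorDemo {
    // Stand-in for the native free; DirectByteBuffer calls Unsafe.freeMemory here.
    static final AtomicBoolean nativeMemoryFreed = new AtomicBoolean(false);

    // Mirrors the shape of DirectByteBuffer's private Deallocator inner class.
    static class Deallocator implements Runnable {
        private long address;
        Deallocator(long address) { this.address = address; }
        public void run() {
            if (address == 0) return;    // paranoia: already freed
            address = 0;
            nativeMemoryFreed.set(true); // unsafe.freeMemory(address) in the real code
        }
    }

    public static void main(String[] args) {
        Deallocator cleanupAction = new Deallocator(0xDEADBEEFL);
        // The GC would run this via the Cleaner; here we invoke it directly.
        cleanupAction.run();
        System.out.println(nativeMemoryFreed.get()); // true
    }
}
```

The key property is that freeing is tied to reachability of the buffer object, not to any explicit destroy call, which is exactly why nulling out the last reference is enough.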

The next LWJGL nightly will have support for .asArray(). It will also require the latest ASM version (4.0-RC1), including the asm-analysis and asm-tree modules (you might as well use asm-all.jar).

If this is your way of notifying me of your takeover, well done. I have given you a library, I have given you both my idea of supporting arrays, the code and my progress on .asArray(), and without further notice, you declare you have implemented it yourself. I already explained I have limited time, but at least now I know that the time I spent on it this week was wasted and I can focus my efforts on other, dull projects.

I was expecting a similar reaction and still decided against telling you anything. That should tell you something about how you talk to people. If you’d like to discuss this, feel free to PM me.

Actually, if it is how it seems from an outside perspective, Riven’s reaction is perfectly fine and your reply is even a bit insolent. What happened? You two seemed to get along pretty well…