A poor man's struct (err, MappedObject)

I use the term dipper for similar structures. The thing you talked about might be called a memory-mapped, or sliding-window, dipper.

How are you going to implement write-through and read-through for field accesses directly to the underlying data, though?

Cas :slight_smile:

By replacing all bytecode-level calls to the fields with bytecode-level calls to the methods.

So every class that is loaded has to be checked for those fields, which is possible with java.lang.instrument.ClassFileTransformer.

So adding the transformer has to be one of the first things the app does after launch.
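A minimal skeleton of such an agent might look like the sketch below. The class and method names are my own invention; a real transformer would rewrite the GETFIELD/PUTFIELD instructions with a bytecode library such as ASM, whereas this one leaves every class untouched.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class MappedFieldAgent implements ClassFileTransformer
{
   // invoked by the JVM before main() when launched with
   // -javaagent:mappedfieldagent.jar
   public static void premain(String agentArgs, Instrumentation inst)
   {
      inst.addTransformer(new MappedFieldAgent());
   }

   public byte[] transform(ClassLoader loader, String className,
                           Class<?> classBeingRedefined,
                           ProtectionDomain protectionDomain,
                           byte[] classfileBuffer)
   {
      // a real implementation would scan classfileBuffer for accesses
      // to mapped fields here; returning null keeps the class as-is
      return null;
   }
}
```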

Riven, I hate to keep you from finishing that bytecode transformer :wink:
but I still have little insight into which buffer methods are fast and which aren’t. There once was a thread on that (http://www.java-gaming.org/forums/index.php?topic=11385.0 ) but it ended with the conclusion that, at least for Mustang, all buffer methods should perform excellently.

So I learned that absolute puts and gets are faster than relative puts and gets, is that right?
Does it make a difference whether I call e.g. ByteBuffer.putFloat()…, or convert the ByteBuffer to a FloatBuffer and then call put?
What other take-away points are there for making direct buffer access fast?

Thanks a lot!
Stefan

Performance factors (my own experience, YMMV)

[table]
[tr][td]Virtual machine[/td][td]Sun 1.5.0_06 “server”[/td][/tr]
[tr][td]Platform[/td][td]WinXP, P4 2.4GHz / 533MHz FSB, 512MB PC2700[/td][/tr]
[/table]

[table]
[tr][td]Unsafe.putFloat(long, float)[/td][td]125%[/td][/tr]
[tr][td]FloatBuffer.put(int, float) (standard)[/td][td]100%[/td][/tr]
[tr][td]FloatBuffer.put(float)[/td][td]50%[/td][/tr]
[tr][td]ByteBuffer.putFloat(int, float)[/td][td]15%[/td][/tr]
[tr][td]ByteBuffer.putFloat(float)[/td][td]14%[/td][/tr]
[/table]

Take this only as a reference, and please run your own benchmarks.
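For reference, the fast path from the table above (a direct, natively-ordered ByteBuffer viewed as a FloatBuffer, accessed with absolute put/get) looks like this; the buffer size is arbitrary:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class FastBufferAccess
{
   public static void main(String[] args)
   {
      // direct buffer in native byte order, so the VM can use plain
      // memory accesses instead of byte-swapping on every put/get
      ByteBuffer bb = ByteBuffer.allocateDirect(1024 * 4);
      bb.order(ByteOrder.nativeOrder());
      FloatBuffer fb = bb.asFloatBuffer();

      // absolute put/get: no position/limit bookkeeping per access
      for (int i = 0; i < fb.capacity(); i++)
      {
         fb.put(i, i * 0.5f);
      }
      float sum = 0;
      for (int i = 0; i < fb.capacity(); i++)
      {
         sum += fb.get(i);
      }
      System.out.println(sum); // 261888.0
   }
}
```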

how do i put/get floats/ints/etc. using the Unsafe class?

i’ve looked at sun’s DirectByteBuffer. it seems i can allocate, access and free memory using Unsafe pretty simply. the only reason not to use a ByteBuffer when going for performance seems to be that a range check is performed on every access. however, i found a lot of “anInt << 0” bit shifts. my brain, and my IDE, say that these are completely pointless. what are they good for?
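As far as I can tell, the pointless-looking shift is template residue: the JDK’s buffer sources compute an element’s byte offset as index << log2(elementSize), and for the byte-indexed view that shift happens to be 0. The helper below is my own standalone version of the pattern (the name ix is borrowed from DirectByteBuffer):

```java
public class IndexScaling
{
   // byte offset of element 'index' when each element is (1 << shift) bytes
   static long ix(long baseAddress, int index, int elemSizeShift)
   {
      return baseAddress + ((long) index << elemSizeShift);
   }

   public static void main(String[] args)
   {
      System.out.println(ix(1000, 3, 0)); // bytes:   1003
      System.out.println(ix(1000, 3, 2)); // floats:  1012
      System.out.println(ix(1000, 3, 3)); // doubles: 1024
   }
}
```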

[some time later]

i wrote a small benchmark, but it lies. the java float array is, depending on some fine tuning, up to 30 times faster than accessing the data via Unsafe. i guess the vm sees through my benchmark and starts to cheat.

The way one can access the pointers is considered so “not done” that I hesitate a little to just post it in this thread for everybody to use and abuse.

Use the forum-search and who-knows what you will find :wink:

here’s my result:

it seems that arrays are a LOT faster than any buffer available.
running on 1.5.0_06 -server
wtf?
looking at this, i have to ask: what are buffers good for? why is Unsafe access that slow? shouldn’t it be faster?

import sun.misc.Unsafe;

import java.lang.reflect.Field;
import java.nio.FloatBuffer;
import java.nio.ByteBuffer;


public class UnsafeDemo
{
   private static final int SIZE = 10000000;


   public static void main(String[] args) throws Exception
   {
      Unsafe unsafe = getUnsafe();
      System.out.println("Unsafe = " + unsafe);
      System.out.println("  addressSize() = " + unsafe.addressSize());
      System.out.println("  pageSize() = " + unsafe.pageSize());
      int cap = SIZE;//10m
      int ps = unsafe.pageSize();
      long base = unsafe.allocateMemory(cap + ps);
      long pointer;
      unsafe.setMemory(base, cap + ps, (byte) 0);
      if (base % ps != 0)
      {
         // round up to the next page boundary (assumes ps is a power of two)
         pointer = base + ps - (base & (ps - 1));
      }
      else
      {
         pointer = base;
      }
      for (int i = 0; i < 5; i++)
      {
         benchArray();
         benchUnsafe(unsafe, pointer);
         benchBuffer();
         benchDirectBuffer();
         System.out.println("-------");
      }
      unsafe.freeMemory(base); // free the original allocation; freeing the rounded-up pointer would be invalid
   }

   private static void benchDirectBuffer()
   {
      ByteBuffer fb = ByteBuffer.allocateDirect(SIZE);
      // direct ByteBuffer benchmark
      int INTS = SIZE / 4;
      for (int i = 0; i < INTS; i++)
      {
         fb.putFloat(i << 2, (float) (Math.random() * 1000)); // byte offset: 4 bytes per float
      }
      long time = System.nanoTime();
      float x = 0;
      for (int i = 0; i < INTS; i++)
      {
         x += fb.getFloat(i << 2);
      }
      System.out.println("Directbytebuffer2float time: " + (System.nanoTime() - time) / 1000000.0d + " ms");
      System.out.println("x array= " + x);
   }

   private static void benchUnsafe(final Unsafe unsafe, long pointer)
   {
      //unsafe benchmark
      int INTS = SIZE / 4;
      long absAdr = pointer-4;
      for (int i = 0; i < INTS; i++)
      {
         unsafe.putFloat(absAdr+=4, (float) (Math.random() * 1000));
      }
      long time = System.nanoTime();
      int x = 0;
      absAdr = pointer-4;
      for (int i = 0; i < INTS; i++)
      {
         x += unsafe.getFloat(absAdr+=4);
      }
      System.out.println("unsafe time " + (System.nanoTime() - time) / 1000000.0d + " ms");
      System.out.println("x unsafe= " + x);
   }

   private static void benchArray()
   {
      // float[] benchmark
      int INTS = SIZE / 4;
      float[] dummy = new float[INTS];
      for (int i = 0; i < INTS; i++)
      {
         dummy[i] = (float) (Math.random() * 1000);
      }
      long time = System.nanoTime();
      float x = 0;
      for (int i = 0; i < INTS; i++)
      {
         x += dummy[i];
      }
      System.out.println("Array time: " + (System.nanoTime() - time) / 1000000.0d + " ms");
      System.out.println("x array= " + x);
   }

   private static void benchBuffer()
   {
      FloatBuffer fb = FloatBuffer.allocate(SIZE/4);
      // heap (float[]-backed) FloatBuffer benchmark
      int INTS = SIZE / 4;
      for (int i = 0; i < INTS; i++)
      {
         fb.put(i,(float) (Math.random() * 1000));
      }
      long time = System.nanoTime();
      float x = 0;
      for (int i = 0; i < INTS; i++)
      {
         x += fb.get(i);
      }
      System.out.println("Floatbuffer time: " + (System.nanoTime() - time) / 1000000.0d + " ms");
      System.out.println("x array= " + x);
   }


   public static Unsafe getUnsafe() throws Exception
   {
      // grab the singleton via reflection; letting the exception
      // propagate beats silently swallowing it and returning null
      Field field = Unsafe.class.getDeclaredField("theUnsafe");
      field.setAccessible(true);
      return (Unsafe) field.get(null);
   }
}

for completeness, the client result:

[quote]Array time: 24.679127 ms
unsafe time 40.004324 ms
Floatbuffer time: 23.507286 ms
Directbytebuffer2float time: 59.796623 ms
[/quote]
what does that tell us?
array access is about 10 times faster when using the server vm. rofl?

jrockit:

[quote]Array time: 5.091692999999999 ms
unsafe time 29.136017 ms
Floatbuffer time: 5.340409 ms
Directbytebuffer2float time: 11.336103 ms
[/quote]
giving up

You have a SIZE of 10M (10MB!!), which causes an awful lot of cache flushing.

On Unsafe code:

for (int i = 0; i < INTS; i++)
      {
         x += unsafe.getFloat(absAdr+=4);
      }

You’re doing 2 increments per loop-iteration here. Bit unfair!

Change your benchmark to a real-world scenario and you’ll get very different results.

Arg!

FloatBuffer fb = FloatBuffer.allocate(SIZE/4);
uses a float[] as backing

Please do:
ByteBuffer bb = ByteBuffer.allocateDirect(n * 4);
bb.order(ByteOrder.nativeOrder());
FloatBuffer fb = bb.asFloatBuffer();

that float[]-backed FloatBuffer is STILL faster than using Unsafe.

i removed the double increment. it saved about 2ms (of 29); i’m at 27 now. and i’m just reading the same element 10 million times now, so there shouldn’t be any cache flushing, right?
the 10MB size shouldn’t matter at all anyway. every test iterates over every element.

on my machine, your direct float buffer is slower (a few ms) than the backed one…

Let’s not turn this thread into a benchmark frenzy, please.

The generated machine-code depends greatly on the JIT and can’t really be predicted.

Sometimes FloatBuffers perform like float-arrays, sometimes they perform 10 times worse… let me show you what I mean:

Testing all 4 types of access:

float[]:               183ms
unsafe:                127ms
floatbuffer (direct):  1131ms
floatbuffer (float[]): 1035ms

Testing 3 types of access (all minus floatbuffer (float[]) )

float[]:               194ms
unsafe:                128ms
floatbuffer (direct):  147ms <--------- 7.7x faster
floatbuffer (float[]): 0ms

So code in other methods, doing other things, affects performance in apparently unrelated code!

When invoking this anywhere:
FloatBuffer.allocate(elements);
the code using DirectFloatBuffer drops performance by factor 7-8 :o

Probably because the JIT notices that FloatBuffer now has more than one active subclass, so it can’t optimize calls through the general FloatBuffer type the way it can when only DirectFloatBuffers are around.


benchmark();
printResults(); // floatbuffer (direct):  115ms

FloatBuffer.allocate(elements);

benchmark();
printResults(); // floatbuffer (direct):  971ms
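The same cliff can be sketched with any tiny hierarchy (all names below are made up, and timing is left out): while only one subclass of the abstract type has ever been loaded, the JIT can devirtualize and inline b.get(i) in the hot loop; once a second subclass is loaded, that assumption no longer holds and the call becomes a real dispatch.

```java
abstract class Buf
{
   abstract float get(int i);
}

final class DirectBuf extends Buf
{
   private final float[] data;
   DirectBuf(int n) { data = new float[n]; }
   float get(int i) { return data[i]; }
}

final class HeapBuf extends Buf
{
   private final float[] data;
   HeapBuf(int n) { data = new float[n]; }
   float get(int i) { return data[i]; }
}

public class Morphism
{
   // hot loop through the abstract type: monomorphic (and inlinable)
   // while only DirectBuf exists, bimorphic once HeapBuf is loaded too
   static float sum(Buf b, int n)
   {
      float s = 0;
      for (int i = 0; i < n; i++)
      {
         s += b.get(i);
      }
      return s;
   }

   public static void main(String[] args)
   {
      System.out.println(sum(new DirectBuf(16), 16)); // 0.0
      // merely instantiating the second subclass changes what the JIT
      // may assume about the call site inside sum()
      System.out.println(sum(new HeapBuf(16), 16));   // 0.0
   }
}
```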

* Riven learned something new today

[quote=“Riven,post:32,topic:26592”]
This situation is supposed to be better in Mustang with bimorphic call inlining.
[/quote]

Life is better after 6 ;D

Here are my results for both benchmarks (including the suggested improvements).

Benchy


before FloatBuffer.allocate()
   float[]:               211ms
   unsafe:                218ms
   floatbuffer (direct):  304ms

after FloatBuffer.allocate()
   float[]:               249ms
   unsafe:                219ms
   floatbuffer (direct):  1150ms

UnsafeDemo:


Array time: 30.98215 ms
x array= 1.24908006E9
unsafe time 52.641886 ms
x unsafe= 1249216512
Floatbuffer time: 31.53641 ms
x array= 1.24949581E9
Directbytebuffer2float time: 101.041054 ms
x array= 1.14874189E9

the reason for UnsafeDemo being slower is that i use a “naked” Unsafe instance. Benchy extracts it from a ByteBuffer.
if i use a ByteBuffer’s Unsafe, unsafe get/put and arrays are almost equally fast. Unsafe is a bit faster, but you could have trouble avoiding the “base + (index << log2(bytes_of_type))” calculation when accessing the buffer. if you can avoid it, Unsafe is the way to go :slight_smile:
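One way to avoid that per-access scaling is to make the address itself the loop’s induction variable, so each iteration does a single add instead of a shift-and-add. The sketch below shows the shape with plain longs, no Unsafe involved:

```java
public class PointerWalk
{
   public static void main(String[] args)
   {
      long base = 0x10000; // pretend this came from allocateMemory
      int floats = 8;

      // index-based: base + (index << 2) recomputed on every access
      long lastIndexed = 0;
      for (int i = 0; i < floats; i++)
      {
         lastIndexed = base + ((long) i << 2);
      }

      // address-based: the address is the only induction variable
      long lastWalked = 0;
      long end = base + ((long) floats << 2);
      for (long adr = base; adr < end; adr += 4)
      {
         lastWalked = adr;
      }

      System.out.println(lastIndexed == lastWalked); // true
   }
}
```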

My implementation avoids it… :wink:

Does the JVM do bounds-check removal on Buffer types? I know it does on arrays…

As it is faster than float[], I’d be very surprised if it did not remove bounds checks.

Only the server VM does bounds-check hoisting, AFAIK.

Cas :slight_smile: