A poor man's struct (err, MappedObject)

Riven · March 17, 2006, 2:52pm

I guess I have a reputation for microbenchmarks and struct-rambling over here… and here we go again

I’ve made an effort to get a struct framework working for the time being, until Sun ships structs in the JIT/VM in Java 7.0 (or later?).

It is basically a source-code generator plus a utility class that handles the ‘sliding window’ behaviour of the structs.

For all those who don’t have the patience to read the whole thing and want to see some performance-benchmarks…


Three datasets of 5000 3d-vectors:
     c = a +  t * ( b - a )

buffer = 15us   // FloatBuffer(3 * 5000), absolute put/get
struct = 24us   // my struct implementation
arrays = 40us   // float[3 * 5000]
object = 74us   // Vec3[1 * 5000]

gasp
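
To make the "buffer" and "arrays" rows concrete, here is a minimal sketch of the kernel being timed, c = a + t * (b - a) over 5000 3D vectors, once with absolute FloatBuffer get/put and once with a plain float[]. The class and method names (LerpKernels, newBuffer, lerp) are made up for illustration; this is not the original benchmark code and it contains no timing harness.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical names; a sketch of the two simplest kernels, not the original benchmark.
final class LerpKernels {
    static final int VECTORS = 5000;
    static final int FLOATS = 3 * VECTORS;

    // Direct, native-ordered FloatBuffer with room for 3 * 5000 floats.
    static FloatBuffer newBuffer() {
        return ByteBuffer.allocateDirect(FLOATS * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // "buffer" variant: absolute get(i)/put(i, v); the buffer position is never touched.
    static void lerp(FloatBuffer a, FloatBuffer b, FloatBuffer c, float t) {
        for (int i = 0; i < FLOATS; i++) {
            float av = a.get(i);
            c.put(i, av + t * (b.get(i) - av));
        }
    }

    // "arrays" variant: the same arithmetic on float[3 * 5000].
    static void lerp(float[] a, float[] b, float[] c, float t) {
        for (int i = 0; i < FLOATS; i++) {
            c[i] = a[i] + t * (b[i] - a[i]);
        }
    }

    public static void main(String[] args) {
        FloatBuffer a = newBuffer(), b = newBuffer(), c = newBuffer();
        for (int i = 0; i < FLOATS; i++) {
            a.put(i, 1f);
            b.put(i, 3f);
        }
        lerp(a, b, c, 0.5f);
        System.out.println(c.get(0)); // halfway between 1 and 3
    }
}
```

Any real measurement would need warm-up runs and many iterations before the JIT-compiled numbers mean anything.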

First there is the StructGenerator class:

public static String createSourcecode(String className, Class[] types, String[] names)

Usage:
class Particle
{
   float x, y, z;
   int state;
}

Class[] primitives = new Class[]{float.class, float.class, float.class, int.class};
String[] names = new String[]{"x", "y", "z", "state"};
String code = StructGenerator.createSourcecode("Particle", primitives, names);

Now we have sourcecode we can compile. This is a one-time effort.

The generated class has a static method that gets us a StructBuffer for that specific class:

int particleCount = 1024;
StructBuffer buf = Particle.createStructBuffer(particleCount );

The StructBuffer holds the position of the ‘sliding window’. We will use the Particle like this:

Particle p = new Particle(buf);

buf.position(13);
p.x(0.4f);
p.y(0.5f);
p.z(0.6f);
p.state(-5);

buf.position(14);
p.x(0.3f);
p.y(0.2f);
p.z(0.1f);
p.state(42);

So the Particle isn’t sliding over the data, but the data is sliding underneath the Particle.
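
A minimal sketch of that sliding-window idea, assuming a plain ByteBuffer with absolute access rather than whatever the real StructBuffer does internally. WindowBuffer and ParticleView are hypothetical stand-ins for StructBuffer and the generated Particle, with only two of the four fields shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for StructBuffer: holds the data and the window position.
final class WindowBuffer {
    final ByteBuffer data;
    final int stride;   // bytes per struct
    int offset;         // byte offset of the struct currently under the window

    WindowBuffer(int elements, int stride) {
        this.data = ByteBuffer.allocateDirect(elements * stride).order(ByteOrder.nativeOrder());
        this.stride = stride;
    }

    void position(int index) {
        offset = index * stride;
    }
}

// Hypothetical stand-in for the generated Particle: no data fields, only byte offsets.
final class ParticleView {
    private final WindowBuffer buf;

    ParticleView(WindowBuffer buf) { this.buf = buf; }

    void x(float v)   { buf.data.putFloat(buf.offset, v); }
    void state(int v) { buf.data.putInt(buf.offset + 12, v); }
    float x()   { return buf.data.getFloat(buf.offset); }
    int state() { return buf.data.getInt(buf.offset + 12); }
}

final class SlidingWindowDemo {
    public static void main(String[] args) {
        WindowBuffer buf = new WindowBuffer(1024, 16);
        ParticleView p = new ParticleView(buf);

        buf.position(13);
        p.x(0.4f);
        p.state(-5);

        buf.position(14);   // same ParticleView object, different data underneath
        p.x(0.3f);
        p.state(42);

        buf.position(13);   // slide back: element 13 still holds what was written first
        System.out.println(p.x() + " " + p.state());
    }
}
```

Only one ParticleView is ever allocated, however many elements the buffer holds.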

What if we want two Particles accessing the same dataset?


StructBuffer pBuf0 = buf;
StructBuffer pBuf1 = buf.duplicate();

Particle p0 = new Particle(pBuf0);
Particle p1 = new Particle(pBuf1);

pBuf0.position(11);
p0...;

pBuf1.position(9);
p1...;

//  p1.x = p0.y
p1.x( p0.y() );
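
Plain NIO has the same sharing behaviour: ByteBuffer.duplicate() returns a second buffer over the same memory with its own position. The sketch below only demonstrates that standard behaviour with a 16-byte struct layout; that StructBuffer.duplicate() works the same way is an assumption based on its name, and DuplicateDemo is a made-up class.

```java
import java.nio.ByteBuffer;

final class DuplicateDemo {
    // Writes through one buffer and reads the result back through shared storage.
    static float writeThenReadViaDuplicate() {
        ByteBuffer first = ByteBuffer.allocate(16 * 32);   // room for 32 structs of 16 bytes
        ByteBuffer second = first.duplicate();             // same bytes, independent position

        first.position(11 * 16);    // window 0 sits on element 11
        second.position(9 * 16);    // window 1 sits on element 9

        // set y of element 11 through the first buffer (y lives at byte offset 4)
        first.putFloat(first.position() + 4, 0.5f);

        // copy it into x of element 9 through the second buffer: the p1.x( p0.y() ) step
        second.putFloat(second.position(), first.getFloat(first.position() + 4));

        // the write is visible through either buffer because the storage is shared
        return first.getFloat(9 * 16);
    }

    public static void main(String[] args) {
        System.out.println(writeThenReadViaDuplicate());
    }
}
```

One thing to watch with raw ByteBuffers: a duplicate always starts out big-endian, so a native-ordered buffer needs order(ByteOrder.nativeOrder()) applied again on the duplicate.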

Once we’re done manipulating the particle-data, we can extract the ByteBuffer from the StructBuffer like this:

ByteBuffer bb = buf.getBacking();

As shown above, this is about 3 times faster (!!) than iterating over an array of “struct-objects” (Vec3) and runs at a bit over 60% of the speed of manipulating the FloatBuffers directly (15us vs 24us). This will only get better once Sun natively implements structs in the VM. It also takes away the burden of massive gc() and object creation.

I’m finalizing the sourcecode at the moment, but performance is kinda stuck at this level (which is nothing to be ashamed about IMO :))

Is this a usable design / framework, are there suggestions how to change things? I’d like to hear your comments.

zero · March 17, 2006, 5:21pm

hey there,

first of all I want to apologize for commenting without being well informed about the whole structs discussion, though I have read some postings and the RFE…

well, one thing I’d really like to clear up is the difference between Structs and MappedObjects:
As I said, I don’t have the definitions (which may be included in some other posts) in my head, but in order to minimize misunderstandings, the following describes how I will use both terms in this post:

Structs:
An automatic mechanism to copy a class from or to a buffer (and only a buffer). Classes are still reference types and can be null, in contrast to C# structs, which are value types. (-> no default constructor is needed for java structs)

MappedObjects:
An interpretation of a contiguous segment of a buffer, with shared memory. This means any modification to the object’s (marked) attributes or properties is reflected as a change in the buffer. Further, two or more objects can be mapped to overlapping regions of the buffer.

OK, with this I can start commenting on your implementation:

Firstly, from my point of view you are talking about a mapping between a Buffer and Objects. The main difference from mapped objects, as described above, is that you usually only have a few instances: as many as there are structs that have to be accessed at once. Since a mapped object type that isn’t integrated into the VM always carries at least a single reference to a ByteBuffer, your technique should save a decent amount of memory. The other side of the coin is that IMHO the best argument for mapped objects is to avoid those fancy get and put method calls on a buffer. Your implementation, however, only moves them to a slightly higher level: from primitives to objects (which are only allowed to have primitive attributes?). I don’t like such a coding style; I find an array of classes much more natural.

Btw:
As far as I understand, your current implementation needs two compilations, right?
If so, the following may be interesting to you:
With Java 6, you can dynamically compile a dynamically written class, then load and instantiate it in the same runtime. (take a look at javax.tools)
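
A rough sketch of that javax.tools route, assuming a full JDK is running (on a plain JRE ToolProvider.getSystemJavaCompiler() returns null). RuntimeCompileDemo, compileAndCall and the tiny GeneratedParticle source are made up for illustration; the javax.tools calls date from Java 6, while the file handling uses newer java.nio.file APIs for brevity.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class RuntimeCompileDemo {
    // Compiles one generated source string, loads the class and calls a static no-arg method.
    static Object compileAndCall(String className, String source, String methodName) {
        try {
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("no system compiler available (JRE without javac?)");
            }
            // The temp directory is left behind; a real tool would clean it up.
            Path dir = Files.createTempDirectory("structgen");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));

            // run(...) returns 0 on success; null streams fall back to System.in/out/err.
            int status = compiler.run(null, null, null, file.toString());
            if (status != 0) {
                throw new IllegalStateException("compilation failed, status " + status);
            }

            // Without -d, javac writes the .class file next to the source file.
            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
                Class<?> type = loader.loadClass(className);
                return type.getMethod(methodName).invoke(null);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source =
            "public class GeneratedParticle { public static int sizeof() { return 16; } }";
        System.out.println(compileAndCall("GeneratedParticle", source, "sizeof"));
    }
}
```

In a setup like this the string returned by a source generator could be fed straight into compileAndCall, so the separate compile step would disappear.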

Well, back to topic: I personally prefer the struct way, because in my experience multiple mappings on the same data usually lead to undesirable side effects. Especially in multi-threaded applications this can be really horrible to keep safe.
What I really like about Java is that the language tries to avoid this. IMHO mapped objects are best compared to pointers, which were wisely not integrated into Java (in contrast to C#). What’ll be the next thing, an extension which allows you to take control of finalization yourself? (like delete in C++)

The only argument against structs may be that a copy operation from or to a buffer is needed, but since structs are usually small objects, that shouldn’t have a strong effect on performance. Therefore I would currently do an implementation like this:


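import java.nio.ByteBuffer;
import java.nio.FloatBuffer;

public interface Struct {
	void load(ByteBuffer src);
	void save(ByteBuffer dst);
}

public class Particle3 implements Struct {
	public float x;
	public float y;
	public float z;
	public int state;

	public void load(ByteBuffer src) {
		// the view starts at src's current position; reading it does not move src
		final FloatBuffer fb = src.asFloatBuffer();
		x = fb.get();
		y = fb.get();
		z = fb.get();
		// FloatBuffer has no asIntBuffer(), so step src past the three floats first
		src.position(src.position() + 12);
		state = src.asIntBuffer().get();
		src.position(src.position() + 4);
	}

	public void save(ByteBuffer dst) {
		dst.asFloatBuffer().put(x).put(y).put(z);
		dst.position(dst.position() + 12);
		dst.asIntBuffer().put(state);
		dst.position(dst.position() + 4);
	}
}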

In future, I hope to change this to:
(which we should be able to do ourselves with Java 6 and annotations)


@Struct //--> this will automatically add the interface and generate an implementation/override the old one
public class Particle3 {
	public float x;
	public float y;
	public float z;
	// how about: @Endian=little ? this would define how the data is saved/loaded from the ByteBuffer
	public int state;
}

finally, dealing with a buffer can be simplified to this:


public abstract class StructBuffer extends Buffer {

	private ByteBuffer data;

	public StructBuffer get(Struct struct) {
		struct.load(data);
		return this;
	}
	public StructBuffer put(Struct struct) {
		struct.save(data);
		return this;
	}

	//..
}

Anyway, I’m very curious what the implementation in Dolphin will look like. IMHO the Java team did a great job in the past, so I won’t expect less for the future

zero · March 17, 2006, 5:31pm

ah, just one thing that seems a bit problematic with your implementation:

it may be difficult to add methods to the generated Particle class, right?

Riven · March 17, 2006, 6:19pm

First of all, yes, it’s not a very clean and neat design, but we don’t have anything else as of yet.

Second, your FloatBuffer I/O is incredibly slow (relative put/get) and asIntBuffer() creates a new object.

Third, structs are not meant as copies; your StructBuffer impl. does a lot of copying, which is disastrous for performance.

Last (@zero) it’s not difficult at all to add new methods, it’s just plain generated source-code, you can do whatever you want with it.

The source-code looks like this:

public class Particle
{
   private static final int size = 16;



   public static final StructBuffer createStructBuffer(int elements)
   {
      ByteBuffer bb = ByteBuffer.allocateDirect(elements * size);
      bb.order(ByteOrder.nativeOrder());
      return new StructBuffer(Particle.class, bb, size);
   }

   private final StructBuffer buf;



   public Particle(StructBuffer buf)
   {
      // lots of stuff here
   }



   public final StructBuffer getStructBuffer()
   {
      return buf;
   }



   public final void x(float x)   {      access.putFloat((buf.offset + 0) + base, x);   }
   public final void y(float y)   {      access.putFloat((buf.offset + 4) + base, y);   }
   public final void z(float z)   {      access.putFloat((buf.offset + 8) + base, z);   }
   public final void state(int state)   {      access.putInt((buf.offset + 12) + base, state);   }

   public final float x()   {      return access.getFloat((buf.offset + 0) + base);   }
   public final float y()   {      return access.getFloat((buf.offset + 4) + base);   }
   public final float z()   {      return access.getFloat((buf.offset + 8) + base);   }
   public final int state()   {      return access.getInt((buf.offset + 12) + base);   }
}

princec · March 17, 2006, 6:39pm

One of the main ideas of structs/mapped objects/whatevers was to avoid memory copies. You are completely able to produce the effect of structs by writing bytecode rewriters and such, but at the end of the day any savings you make in typing syntactic sugar are buggered by the VM being slow at doing the memory copy thing. Just a 2p comment on the issue, not really to do with Riven’s implementation.

Cas

Riven · March 17, 2006, 6:43pm

Well, i’m glad i don’t do any memory copies

Besides that, yes, byte-code transformers (fields->methods in both the struct class and the classes using it) are much, much better. Feel like writing one? No? Me neither ;D

zero · March 18, 2006, 7:31am

Ahm, relative put/get are slower? Sorry, didn’t know that. I thought fewer arguments would make it faster, but changing the buffer’s position may have a greater impact…

Anyway, replace my code with this and copying is as fast as your access to the struct components:


 public void load(ByteBuffer src) {
    final int pos = src.position();
    x = src.getFloat(pos);
    y = src.getFloat(pos+4);
    z = src.getFloat(pos+8);
    state = src.getInt(pos+12);
    src.position(pos+16);
}

public void save(ByteBuffer dst) {
    final int pos = dst.position();
    dst.putFloat(pos, x);
    dst.putFloat(pos+4, y);
    dst.putFloat(pos+8, z);
    dst.putInt(pos+12, state);
    dst.position(pos+16);
}

Further I noticed a generic version of the StructBuffer may be needed for the marks and positions:


public abstract class StructBuffer<S extends Struct> extends Buffer {

public <T extends Struct> StructBuffer<T> asStructBuffer(Class<T> structType) {
...
}

public <T extends Struct> StructBuffer<T> allocate(int capacity, Class<T> structType) {
...
}

}

You are kidding, aren’t you?

One usually does ONE copy per update/frame, e.g. for dynamic geometry sent to the graphics card. Even with a heavy dynamic geometry load this will NEVER be a performance penalty, let alone the bottleneck of an application.
It’s just a simple, fast copy, and if the data is dynamic it will be modified before the copy. It is the performance of those modifications you should take care of.
Java is great for that, and making use of it can drastically increase performance. I’ve opened another thread (Why mapped objects (a.k.a structs) aren’t the ultimate (performance) solution) to explain a possible use, because I don’t want to hijack yours.

Btw. in C# + DirectX, geometry is always copied when sent to a VertexBuffer, since structs are always copied simply by assigning them to a variable (e.g. putting them into an array) or passing them as an argument (unless the ref keyword is used). Nobody at Microsoft complains about performance. Further, it is very convenient, since copying to a stream/buffer is automatically enabled for structs, just as I suggest.

Riven · March 18, 2006, 8:50am

My StructBuffer implementation is already generic.

Further, your load()/save() methods almost make me cry, ByteBuffer.getFloat() is even much slower than FloatBuffer.get() which is much slower than FloatBuffer.get(i) (which is much slower than Unsafe.getFloat(pointer) - which I use )

And if your benchmark tells you they have equal speed, your benchmark is flawed.

And about your copying: you’re copying a_lot_of_times_per_frame/update, basically for every struct access, not doing a ‘batch-write’ to the gpu.

It sounds to me like you’ve never worked with NIO and performance-critical code, seeing the trivial mistakes you make. (no offence intended)

zero · March 18, 2006, 9:14am

I have no benchmark, but what I can tell is that if I simply comment out the lines which fill the buffers sent to the graphics card every frame, the framerate doesn’t increase at all. So copying the data to the buffer has no influence on performance for me (and I copy about half a million vertices, positions and normals only, every frame).

You didn’t understand my version. In my code the ‘structs’ are normal Java classes. Accessing a field means accessing a field, no buffer at all. Once all modifications are done, I either:

copy the data to a buffer that I allocated once inside the VM and send it to the graphics card with glBufferSubData
or directly use the memory from OpenGL (glMapBuffer) and copy the data into the returned ByteBuffer

–> one copy per frame
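
A minimal sketch of the first variant, assuming a staging buffer allocated once and refilled every frame. Vertex and FrameUpload are hypothetical names, and the actual OpenGL call is left to the caller so the example has no GL dependency.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical plain-object vertex: ordinary Java fields, no buffer involved.
final class Vertex {
    float x, y, z;
}

final class FrameUpload {
    private final FloatBuffer staging;   // allocated once, reused every frame

    FrameUpload(int maxVertices) {
        staging = ByteBuffer.allocateDirect(maxVertices * 3 * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // The single copy per frame: pack all vertices into the staging buffer.
    FloatBuffer pack(Vertex[] vertices) {
        staging.clear();
        for (Vertex v : vertices) {
            staging.put(v.x).put(v.y).put(v.z);
        }
        staging.flip();   // position = 0, limit = number of floats written
        return staging;   // the caller would pass this on, e.g. to glBufferSubData
    }

    public static void main(String[] args) {
        Vertex v = new Vertex();
        v.x = 1f; v.y = 2f; v.z = 3f;   // game logic touches plain fields only
        FloatBuffer fb = new FrameUpload(16).pack(new Vertex[] { v });
        System.out.println(fb.remaining() + " floats ready for upload");
    }
}
```

The per-vertex put calls here are the copying cost the thread is arguing about; whether it matters depends on how much of the frame is spent elsewhere.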

Riven · March 18, 2006, 9:33am

Sounds like your geometry-related-code is not your bottleneck at all. So why bother?

object->fields->buffer->gpu is very inefficient, yet might be ‘good enough’ for you. Don’t project that to the general case.

[quote]You didn’t understand my version. In my code the ‘structs’ are normal Java classes.
[/quote]
That sounds pretty much like: you don’t understand my Ships, they are Cars.

I understand that everybody who isn’t dealing with raw performance on NIO data is disgusted by structs… that’s fine, they are not meant for those people.

Anybody any comment on the actual API, instead of the chit-chat about the necessity of structs? Thanks!

zero · March 18, 2006, 9:41am

Right, I don’t have any problems with performance, so I don’t bother with structs

No, they aren’t that inefficient. This is a copy operation, constant in time; linear for an array. My whole point is that this will (almost) never be the bottleneck of a REAL application. And if it is, Java may be the wrong language for you, sorry.
Invent a new one where all data is stored in some kind of buffer and objects are only mapped to it; that would be great for you, right? (also no offense intended :))

Riven · March 18, 2006, 9:48am

Then don’t hijack this thread talking about things that you don’t want, don’t need, and don’t understand.

I was interested in comments on the API, not about “do we need structs?” - yes, some of us need them, most of us don’t.

zero · March 18, 2006, 10:01am

hey, don’t blame me, that’s why I opened the other thread. If you like, please comment there on whether you think the kind of optimization I discussed in the 2nd half is possible with your struct implementation. That would be constructive

Further, I did comment on your API: I don’t like the style of encapsulating a single struct in a buffer class. Don’t you remember that?

Btw, you said that ByteBuffer.putFloat is slower than FloatBuffer.put, but you used exactly that (access.putFloat, …). Did I miss something?

Riven · March 18, 2006, 10:11am

Of course you don’t like the API/design when you don’t need it, because it’s kinda “hackery” and non-Java, but it is required when performance is all one cares about.

And no, i’m not encapsulating a single struct into a buffer-class (check the two-structs-example). And the StructBuffer class is not bound to the Particle-class, but can be used for any generated struct.

About access.putFloat() - My own quote:

[quote]ByteBuffer.getFloat() is even much slower than FloatBuffer.get() which is much slower than FloatBuffer.get(i) (which is much slower than Unsafe.getFloat(pointer) - which I use)
[/quote]
So I’m using the Unsafe-class (in a 100% safe way)

zero · March 18, 2006, 10:18am

sun.misc.Unsafe, which only ships with Sun’s Java, right?

mabraham · March 19, 2006, 6:55pm

Look, what point exactly are you trying to make here? Sure, using Unsafe is hackery, but didn’t Riven admit to that right from the start? I’ve been following this discussion and found it refreshing for a while, but I’m getting a little concerned now, cos I can’t help feeling you missed the entire point! You consider structs as value types that maintain their own state. Riven’s structs merely wrap existing data. You provide convenience methods to read from and write to a buffer. Riven’s structs provide a (structured, fast-path) view on bulk data. Why do you bluntly suggest using a different language when someone has just shown a way of getting really good performance out of Java, on a silver platter? This may not be your cup of tea, and to be fair, I wouldn’t use this because of the Unsafe bit. But still, I’m fascinated!

I hope I haven’t missed anything but anyway, that’s my take of this thread so far.

Cheers,
Matt.

princec · March 19, 2006, 7:10pm

Fecking structs, I wish I’d never called them that. The whole point is nothing to do with C or structs in C or “value types”. It is all about providing object-oriented access to data held in ByteBuffers in an efficient manner which is easy to implement in the current VMs, without breaking any semantics in the language or the specs.

Cas

Riven · March 20, 2006, 8:19am

@mabraham:
I can take out the unsafe-bit quite easily, reducing performance by a factor of 2 (when using fancy {…}Buffers) or a factor of 4-8 when using ByteBuffers directly

Well I know they are not structs, but it’s the closest thing to it, in Java, right? When I’d call them “sliding window objects” nobody would have a clue.

Anyway, when I code the bytecode transformer on a rainy afternoon, you’ll be the first to know.

tusaki · March 20, 2006, 8:38am

It’s raining where I am right now! and its almost afternoon ;D

Riven · March 20, 2006, 6:24pm

I’m halfway with a new design and bytecode-transformer (and they say the last 20% takes 80% of the time, arg!)

This is the input:

@MappedObject(sizeof = 12)
public class Vector
{
   @MappedField(offset = 0)
   public float x;

   @MappedField(offset = 4)
   public float y;

   @MappedField(offset = 8)
   public float z;



   public final float length()
   {
      return (float) Math.sqrt(x * x + y * y + z * z);
   }
}

This is the output: (note the class extends Vector, so all methods will still be available)

public final class MappedVector extends Vector
{
   public static final int sizeof = 12;



   public MappedVector()
   {
      this.buf = null;
   }



   public MappedVector(MappedObjectBuffer<MappedVector> buf)
   {
      if (buf.type != MappedVector.class)
         throw new IllegalArgumentException("MappedObjectBuffer type must be: MappedVector");

      this.buf = buf;
   }

   private final MappedObjectBuffer<MappedVector> buf;



   // MappedObject structure:
   // -> mapped field "x" at offset #0
   // -> mapped field "y" at offset #4
   // -> mapped field "z" at offset #8

   // setters
   public final void x(float x)   {      buf.putFloat(0L, x);   }
   public final void y(float y)   {      buf.putFloat(4L, y);   }
   public final void z(float z)   {      buf.putFloat(8L, z);   }

   // getters
   public final float x()   {      return buf.getFloat(0L);   }
   public final float y()   {      return buf.getFloat(4L);   }
   public final float z()   {      return buf.getFloat(8L);   }
}

Usage: (without bytecode transformations)

MappedObjectBuffer<MappedVector> aBuf = MappedObjectBufferFactory.create(new MappedVector(), elements);
// bBuf and cBuf are created the same way



      MappedVector aVec = aBuf.getMappedObject();
      MappedVector bVec = bBuf.getMappedObject();
      MappedVector cVec = cBuf.getMappedObject();

      aBuf.rewind();
      bBuf.rewind();
      cBuf.rewind();

      while (aBuf.hasRemaining())
      {
         cVec.x(aVec.x() + t * (bVec.x() - aVec.x()));
         cVec.y(aVec.y() + t * (bVec.y() - aVec.y()));
         cVec.z(aVec.z() + t * (bVec.z() - aVec.z()));

         aBuf.next(); // equals: aBuf.position(aBuf.position() + 1);
         bBuf.next();
         cBuf.next();
      }

Usage: (with bytecode transformations)

MappedObjectBuffer<MappedVector> aBuf = MappedObjectBufferFactory.create(new MappedVector(), elements);
// bBuf and cBuf are created the same way



      MappedVector aVec = aBuf.getMappedObject();
      MappedVector bVec = bBuf.getMappedObject();
      MappedVector cVec = cBuf.getMappedObject();

      aBuf.rewind();
      bBuf.rewind();
      cBuf.rewind();

      while (aBuf.hasRemaining())
      {
         cVec.x = aVec.x + t * (bVec.x - aVec.x); // the compiler won't complain, because (it thinks) we extend Vector and access its fields
         cVec.y = aVec.y + t * (bVec.y - aVec.y);
         cVec.z = aVec.z + t * (bVec.z - aVec.z);

         aBuf.next();
         bBuf.next();
         cBuf.next();
      }

As MappedVector is a subclass of Vector, you can simply do this:

Vector vec = aBuf.getMappedObject();

With the bytecode stuff done you can access the fields, which will be converted into method-calls at runtime.

And with the bytecode classes done, it will be trivial to remove the source-code generation / compilation process:


MappedObjectBuffer mob = MappedObjectBuffer.create(new Vector(), 10000);
Vector vec = mob.getMappedObject();

mob.position(13);
vec.x = 4;
vec.y = 5;
vec.z = 6;
vec.length();

mob.position(14);
...