Bounds checks and structs

abies · October 21, 2005, 2:52pm

PMs used to pop up here on login, didn’t they? Sorry, I didn’t notice the new one; the notification is hidden at the bottom of the main page…

As for printing the native code: download a debug build of the 6.0 JVM and run it with
-XX:+PrintOptoAssembly
on the command line.

With your current code, I get a factor of 1.1 on the first few iterations; then some compilation kicks in, and the ratio changes to 2.2 and stays there.
Try accessing more than one object in the same method (3-4 of them) and you will get a much worse ratio.

The problem is that certain kinds of operations in very simple cases seem to get optimized away completely by HotSpot (ratio of 1.1-1.3). Anything more complicated and we are back in let's-call-a-method mode, which gives a ratio of 10+.

Riven · October 21, 2005, 3:00pm

I’ll write a real-world example:

An array of 3D vectors (floats) multiplied by a 4x4 matrix…
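For reference, the plain-array version of that workload can be sketched as below. It is only an illustration, not Riven's benchmark code: it assumes the matrix is stored row-major in a `float[16]`, treats each vector as a point with w = 1 (so the last column acts as the translation), and the class name `Vec3Transform` is hypothetical.

```java
// Hypothetical helper: transforms packed (x, y, z) float triples by a 4x4 matrix.
// Assumptions: row-major float[16] matrix, points with w = 1, w of the result ignored.
final class Vec3Transform {

    private Vec3Transform() {}

    static void transform(float[] m, float[] src, float[] dst, int count) {
        for (int i = 0; i < count; i++) {
            int o = i * 3;
            // Read into locals first so src and dst may be the same array.
            float x = src[o], y = src[o + 1], z = src[o + 2];
            dst[o]     = m[0] * x + m[1] * y + m[2]  * z + m[3];
            dst[o + 1] = m[4] * x + m[5] * y + m[6]  * z + m[7];
            dst[o + 2] = m[8] * x + m[9] * y + m[10] * z + m[11];
        }
    }
}
```

With a pure translation matrix by (1, 2, 3), the vectors (1, 2, 3) and (4, 5, 6) become (2, 4, 6) and (5, 7, 9). The buffer and Unsafe variants discussed below replace the `src[o]` / `dst[o]` array accesses with buffer or raw-memory reads and writes.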

abies · October 21, 2005, 4:07pm

Here is a piece of code that computes the centers of the triangles in a big array.

Results: 4.6 s for the Buffer-based version, 2.8 s for the Unsafe-based one.

```java
import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

import sun.misc.Unsafe;

// Vertex view over a FloatBuffer: every access goes through FloatBuffer.get/put,
// which performs its own index check.
class BVertex {

  private final FloatBuffer fb;
  private int off;

  public BVertex(FloatBuffer fb) {
    this.fb = fb;
  }

  // Slides the view to vertex vIndex (3 floats per vertex).
  public void position(int vIndex) {
    off = vIndex*3;
  }

  public final float x() { return fb.get(off); }
  public final float y() { return fb.get(off + 1); }
  public final float z() { return fb.get(off + 2); }

  public final void x(float x) { fb.put(off, x); }
  public final void y(float y) { fb.put(off + 1, y); }
  public final void z(float z) { fb.put(off + 2, z);}

}

// Same view, but reading and writing raw native memory through sun.misc.Unsafe
// (no bounds checks at all).
class UVertex {
  static Unsafe unsafe;
  static Field addressHack;
  static {
    try {
      // This 1-byte buffer is only used to locate the "unsafe" field of the
      // direct-buffer implementation class via reflection.
      ByteBuffer bb = ByteBuffer.allocateDirect(1);
      Field unsafeHack = bb.getClass().getDeclaredField("unsafe");
      unsafeHack.setAccessible(true);
      unsafe = (Unsafe) unsafeHack.get(bb);

      // Buffer.address holds the native base address of a direct buffer.
      addressHack = Buffer.class.getDeclaredField("address");
      addressHack.setAccessible(true);
    } catch (Exception exc) {
      exc.printStackTrace();
    }
  }

  private long base;
  private int offset;

  public UVertex(FloatBuffer fb) {
    try {
      base = addressHack.getLong(fb);
    } catch (Exception exc) {
      exc.printStackTrace();
      throw new InternalError();
    }
  }

  // Slides the view to vertex vIndex (12 bytes per vertex).
  public void position(int vIndex) {
    offset = vIndex*12;
  }

  public final float x() { return unsafe.getFloat(base+offset); }
  public final float y() { return unsafe.getFloat(base+offset + 4); }
  public final float z() { return unsafe.getFloat(base+offset + 8); }

  public final void x(float x) { unsafe.putFloat(base+offset, x); }
  public final void y(float y) { unsafe.putFloat(base+offset+4, y); }
  public final void z(float z) { unsafe.putFloat(base+offset+8, z); }

}

public class VertexTest {

  static final int TRIANGLES_COUNT = 10000;
  // 3 vertices per triangle, 3 floats per vertex, 4 bytes per float.
  static final FloatBuffer triangles = ByteBuffer.allocateDirect(
      TRIANGLES_COUNT * 3 * 3 * 4).order(ByteOrder.nativeOrder())
      .asFloatBuffer();
  // One center (3 floats) per triangle.
  static final FloatBuffer center = ByteBuffer.allocateDirect(
      TRIANGLES_COUNT * 3 * 4).order(ByteOrder.nativeOrder()).asFloatBuffer();

  public static void main(String[] argv) {

    for (int i = 0; i < TRIANGLES_COUNT; i++) {
      triangles.put(i, i);
    }

    for (int i = 0; i < 10; i++) {
      mainX();
    }
  }

  private static void mainX() {
    long start = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
      computeBufferCenters();
    }
    System.out.println("Buffer-based " + (System.currentTimeMillis() - start)
        + "ms");

    start = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
      computeUnsafeCenters();
    }
    System.out.println("Unsafe-based " + (System.currentTimeMillis() - start)
        + "ms");
  }

  private static void computeUnsafeCenters() {
    UVertex a = new UVertex(triangles);
    UVertex b = new UVertex(triangles);
    UVertex c = new UVertex(triangles);
    UVertex d = new UVertex(center);

    for (int i = 0; i < TRIANGLES_COUNT; i++) {
      int tstart = i * 3;
      a.position(tstart);
      b.position(tstart + 1);
      c.position(tstart + 2);

      d.position(i);

      d.x((a.x() + b.x() + c.x()) / 3);
      d.y((a.y() + b.y() + c.y()) / 3);
      d.z((a.z() + b.z() + c.z()) / 3);
    }
  }

  private static void computeBufferCenters() {
    BVertex a = new BVertex(triangles);
    BVertex b = new BVertex(triangles);
    BVertex c = new BVertex(triangles);
    BVertex d = new BVertex(center);

    for (int i = 0; i < TRIANGLES_COUNT; i++) {
      int tstart = i * 3;
      a.position(tstart);
      b.position(tstart + 1);
      c.position(tstart + 2);

      d.position(i);

      d.x((a.x() + b.x() + c.x()) / 3);
      d.y((a.y() + b.y() + c.y()) / 3);
      d.z((a.z() + b.z() + c.z()) / 3);
    }
  }
}
```

Riven · October 21, 2005, 4:22pm

Whoa, you’re making a direct buffer of 1 byte, then accessing N bytes from it… ;D

Not very neat, huh? Meanwhile I’m working on the next version…

abies · October 21, 2005, 4:28pm

ByteBuffer bb = ByteBuffer.allocateDirect(1);
This is a local variable, used only to get a field reference for reflection. I could probably just use ByteBuffer.class instead of bb.getClass(), but I wanted to be sure I resolve it against the real class of a direct buffer.

The real buffers used in the test are ‘triangles’ and ‘center’.

Riven · October 21, 2005, 4:44pm

Hm… hm… hmmmmmm… :o ;D

This benchmark is comparable to a real-world situation.

The benchmark does this:

init 1024 3D vectors
init random 4x4 matrices

for(1024)
   for(field vectors)
      transform by field-matrix

for(1024)
   for(buffer vectors)
      transform by buffer-matrix

for(1024)
   for(unsafe buffer vectors)
      transform by unsafe-buffer-matrix

Performance factor field vs. buffer: 2.4653847
Performance factor field vs. unsafe: 0.60384613

Using Unsafe buffers is 67% faster than using fields…!
((1.0 / 0.6) - 1.0 = 67%)
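Reading the reported factor as Unsafe time divided by field time (that is how the 67% line uses it), the speedup works out as follows; the 67% comes from rounding the factor to 0.6, while the unrounded value gives about 66%:

$$
\frac{t_\text{field}}{t_\text{unsafe}} - 1 = \frac{1}{0.60384613} - 1 \approx 1.656 - 1 = 0.656 \approx 66\%
$$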

Now these objects can be used to speed up any math operation. We’ll need a good API for it, so that it’s guaranteed safe and anyone can use it.

Unfortunately, this only works on the Sun Server VM (it relies on sun.misc.Unsafe).

source-code: structs
source-code: bench

I’ve written code that automatically generates struct source code for:

fields
buffers
unsafe buffers

abies · October 21, 2005, 7:46pm

One note: in my previous benchmark, it was the ‘position’ method that made the major difference. Without it, the Unsafe version was 3-4 times faster!!! Unfortunately, it is needed to reuse the same structure wrapper for various positions in the same buffer. Having one object per entry in a native array is not possible (memory, GC hit).

Riven · October 21, 2005, 7:51pm

My latest implementation is:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

import sun.misc.Unsafe;

// StructUtil (getAccess(), getBase()) is a helper from the structs source; it is not shown here.
class VecC
{
   public static final int SIZEOF = 12;

   private static final Unsafe access = StructUtil.getAccess();

   // Native address of this struct, fixed at construction time.
   private final long base;

   public VecC(ByteBuffer bb)
   {
      if (bb.remaining() < SIZEOF)
         throw new IllegalStateException("Not enough bytes remaining in buffer: " + bb.remaining() + "/" + SIZEOF);
      if (bb.order() != ByteOrder.nativeOrder())
         throw new IllegalStateException("ByteBuffer must be in native order");

      int pos = bb.position();
      base = StructUtil.getBase(bb) + pos;
      // Claim SIZEOF bytes: the next struct created on bb starts after this one.
      bb.position(pos + SIZEOF);
   }

   public final void x(float x)   {      access.putFloat(base + 0L, x);   }
   public final void y(float y)   {      access.putFloat(base + 4L, y);   }
   public final void z(float z)   {      access.putFloat(base + 8L, z);   }
   public final float x()   {      return access.getFloat(base + 0L);   }
   public final float y()   {      return access.getFloat(base + 4L);   }
   public final float z()   {      return access.getFloat(base + 8L);   }
}
```

No need to use position at all outside the constructor…
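VecC depends on a `StructUtil` helper whose source is not in this thread. A hypothetical reconstruction could look like the following. It is a sketch under stated assumptions, not Riven's code: it swaps in the commonly used `theUnsafe` singleton field instead of the 1-byte-buffer trick, and reads `Buffer.address` through `objectFieldOffset` instead of `Field.setAccessible`. Both rely on OpenJDK/HotSpot internals; newer JDKs deprecate these memory-access methods and may print warnings.

```java
import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;

import sun.misc.Unsafe;

// Hypothetical stand-in for StructUtil; relies on JVM internals.
final class StructUtil {

    private static final Unsafe UNSAFE;
    private static final long ADDRESS_OFFSET;

    static {
        try {
            // Assumption: the Unsafe singleton lives in the private static field "theUnsafe".
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            // Assumption: java.nio.Buffer stores the native address in a long field "address".
            ADDRESS_OFFSET = UNSAFE.objectFieldOffset(Buffer.class.getDeclaredField("address"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private StructUtil() {}

    static Unsafe getAccess() {
        return UNSAFE;
    }

    // Native base address of a direct buffer; the buffer's position is ignored.
    static long getBase(ByteBuffer bb) {
        if (!bb.isDirect())
            throw new IllegalArgumentException("need a direct buffer");
        return UNSAFE.getLong(bb, ADDRESS_OFFSET);
    }
}
```

A float written through `getAccess().putFloat(getBase(bb) + 4, v)` should then be visible as `bb.getFloat(4)` on a native-order buffer, which is exactly the aliasing VecC relies on.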


Edit: okay, I looked through your code and noticed you didn't mean ByteBuffer.position().

You basically use that method to move along the data and 'land' wherever you like, to reuse objects... I don't know if that's such a good idea. Think about this:


```
MappedObject mo = new MappedObject(...);
doSomething(mo);

// "mo" points to other data now;
// this is not like anything in Java

float x = mo.x();


void doSomething(MappedObject obj)
{
    obj.position(...);
}
```


My implementation is 100% safe, as long as the ByteBuffer is still referenced. With MappedObject.position(...) you can wreak havoc and cause native crashes. You can't allow that to happen, ever. Checking the input there (throwing exceptions) will disable inlining, which is kinda slow compared to non-struct classes.

Riven · October 21, 2005, 8:13pm

> Having one object per entry in native array is not possible (memory, gc hit).

Eden GC is lightning fast. Since Java 1.4 / 1.5, tiny objects (a single long field) are not really a problem anymore, especially on the Server VM. (OK, >1000/s is troublesome.)

abies · October 21, 2005, 8:22pm

We are talking here about a million objects per second. I can imagine such a structure being used to fill vertex data in OpenGL buffers: thousands of dynamic triangles each frame, giving 100-1000 thousand triangles per second. You certainly don’t want to allocate anything to access a single vertex. One allocation per array of vertices is probably acceptable, and maybe even switching NIO buffers is not too far-fetched.

Riven · October 22, 2005, 12:53am

I’m working on an API that supports both my kind of structs and yours… will post tomorrow, it’s getting late :-\

Here are the javadocs

AndersDahlberg · October 22, 2005, 11:10am

http://lwjgl.org/forum/viewtopic.php?t=955

Riven · October 22, 2005, 11:19am

First of all, you’re doing this:

ByteBuffer.getFloat(), which is very, very slow
FloatBuffer.get(), which is still about 1.5-3x slower than class field access

The best performance you get with Javassist is 25-50% slower than ‘normal’ code. I have to say it’s a nice transparent architecture, but if you need raw performance it’s unacceptable, and when you see that unsafe.getFloat() is about 15-20% faster than class field access, the choice is easy. You lose the transparency and have to change your code from fields to method calls, but I think that’s worth it, for the die-hards.

Second, you’re using Lists in your benchmark, which significantly influences performance, not to mention Math.sqrt() calls in tight loops, which are very heavy, and Random.nextFloat(). Do you really want to measure those too?

princec · October 22, 2005, 12:01pm

Enough of all this crazy hackery! I don’t get why there is such opposition to the ultra-simple idea of MappedObjects:

An abstract class in java.nio containing a final reference to a ByteBuffer
Primitive fields mapped IN THE ORDER DECLARED, with the sizes specified by the Java spec; no need for annotations
Reference fields held in a heap section
The JVM is free to detect classes extending MappedObject and may either optimise them directly into machine code, or rewrite the bytecode to provide similar but less efficient access by proxy

Why MappedObjects?

Clean, clear code, with no annotations, no caveats, and no getters and setters to pollute OOP designs
High performance: the bounds check is performed only ONCE, on a setPosition() call
High performance: no need to create or destroy any objects; just use one and slide it around the buffer
High performance: no read-modify-write operations; it’s all direct memory access
Provides all the benefits of a C struct but fits seamlessly into Java’s object-oriented paradigm, and behaves just like any other reference type by virtue of being a real object on the heap
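To make the “slide one object around the buffer, check bounds once in setPosition()” idea concrete, here is a hypothetical library-level emulation. The names `MappedObject` and `Vec3` are illustrative only, and this mimics the API shape, not the performance: absolute `ByteBuffer.getFloat`/`putFloat` still check every index internally, which is exactly why the proposal asks for JVM support.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical emulation of the proposed MappedObject.
abstract class MappedObject {
    protected final ByteBuffer buffer; // final reference, as in the proposal
    private final int sizeof;          // bytes per element
    private final int count;           // number of elements that fit in the buffer
    protected int base;                // byte offset of the current element

    protected MappedObject(ByteBuffer buffer, int sizeof) {
        // duplicate() leaves the caller's position and byte order untouched.
        this.buffer = buffer.duplicate().order(ByteOrder.nativeOrder());
        this.sizeof = sizeof;
        this.count = buffer.capacity() / sizeof;
    }

    // The one explicit bounds check; accessors then stay within element 'index'.
    public final void setPosition(int index) {
        if (index < 0 || index >= count)
            throw new IndexOutOfBoundsException("index " + index + ", count " + count);
        base = index * sizeof;
    }
}

// What "class Vec3 extends MappedObject { float x, y, z; }" might map to.
final class Vec3 extends MappedObject {
    static final int SIZEOF = 12; // three floats, in declaration order

    Vec3(ByteBuffer buffer) {
        super(buffer, SIZEOF);
    }

    public float x() { return buffer.getFloat(base); }
    public float y() { return buffer.getFloat(base + 4); }
    public float z() { return buffer.getFloat(base + 8); }

    public void x(float v) { buffer.putFloat(base, v); }
    public void y(float v) { buffer.putFloat(base + 4, v); }
    public void z(float v) { buffer.putFloat(base + 8, v); }
}
```

One `Vec3` is created per buffer and moved with `setPosition(i)`; no object is allocated per element.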

Why NOT MappedObjects?

NIH
Inertia at doing anything vaguely out-of-the-box
Misunderstanding about what OOP actually is
No idea why it’s needed

Somebody give me a sound, concrete reason why MappedObjects, as I have described them here, don’t do everything we need to get clean, clear, concise, fast, object-oriented, easily implemented, side-effect-free interfacing with native data.

Cas

Riven · October 22, 2005, 12:15pm

Of course! When MappedObjects are in Java, I’d ditch my so-called hackery immediately. It has quite a few disadvantages, no doubt about it.

But… we haven’t got MappedObjects.

What’s the ETA? No one knows!

abies · October 22, 2005, 12:17pm

Are you sure you will never need an annotation? What about accessing data coming from the network, where some fields can be in a different endianness? I agree that the default behaviour should not require any annotations, but allowing them for not-so-trivial cases could simplify usage a lot.

You can get all the benefits of your MappedObject with my idea of bytecode weaving, with the single exception of having to install the transformer in a class loader or at startup. It would even allow you to use field access directly, as it would be silently changed to use accessors. And it has a major benefit: it could be used out of the box, right now, without forcing a particular construct on the rest of the world, which is more concerned about JSP v7.0 than about native access to resources.

As far as hackery is concerned… it would all be invisible from the client’s point of view; only the library implementation would have to use a few magic hacks. I vaguely recall a certain library passing native pointers as ints/longs between calls and doing direct pointer arithmetic on them.

mthornton · October 22, 2005, 12:30pm

Because it is a change to the language specification. Especially after the less-than-ecstatic reception of recent changes, I think the chances of such a change being accepted are slim.

You might still want annotations if you wanted to represent C structs that had been packed to boundaries other than 1 byte, i.e. where padding has been inserted to maintain appropriate alignment.
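The padding issue can be illustrated by computing C-style field offsets. This hypothetical `StructLayout` helper assumes each primitive is aligned to its own size and the total is rounded up to the largest alignment, which matches common 64-bit ABIs but not every compiler or `#pragma pack` setting; that mismatch is exactly what an annotation would have to describe.

```java
// Hypothetical helper: natural-alignment layout for primitive-only structs.
final class StructLayout {

    private StructLayout() {}

    // Returns the offset of each field, followed by the total (padded) struct size.
    static int[] layout(int... fieldSizes) {
        int[] result = new int[fieldSizes.length + 1];
        int offset = 0;
        int maxAlign = 1;
        for (int i = 0; i < fieldSizes.length; i++) {
            int align = fieldSizes[i];        // assumption: alignment == size
            offset = alignUp(offset, align);  // insert padding before the field
            result[i] = offset;
            offset += fieldSizes[i];
            maxAlign = Math.max(maxAlign, align);
        }
        // Trailing padding so that arrays of the struct stay aligned.
        result[fieldSizes.length] = alignUp(offset, maxAlign);
        return result;
    }

    private static int alignUp(int value, int align) {
        return (value + align - 1) / align * align;
    }
}
```

For `struct { char c; int i; short s; double d; }` (sizes 1, 4, 2, 8) this gives offsets 0, 4, 8, 16 and a size of 24 bytes, whereas packing to 1 byte would give offsets 0, 1, 5, 7 and a size of 15. A MappedObject that simply maps fields back to back in declaration order would match only the packed layout.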

Riven · October 22, 2005, 12:52pm

Do you know how the sliding-window* feature, which you thought was absolutely required, would be implemented with bytecode transformation? I’m very curious.

* using one object and moving it along the data

abies · October 22, 2005, 1:44pm

```java
public class Vertex3f extends MemoryMappedObject {
    public float x;
    public float y;
    public float z;

    public Vertex3f(ByteBuffer bb) {
        super(bb);
    }

    // … some Vertex3f specific methods
}
```

where MemoryMappedObject would implement the position/sliding method, normally visible, without any tricks.

The bytecode weaver would do the following:

If something extends MemoryMappedObject: remove the public fields, create the correct getters/setters with whatever magic is needed inside (depending on the implementation), and probably also pass SIZEOF as an extra argument to the super constructor.
If something accesses any field of a MemoryMappedObject: convert the getfield/putfield instructions into getter/setter calls.

On top of that, I could imagine a few extra properties/annotations:
a) the possibility of explicitly giving the sizeof parameter (passed to the super constructor, or in an annotation that would be woven into the constructor call), for easy alignment
b) specifying an explicit offset for a particular field
c) specifying the endianness of a particular field
d) (optional; I’m not sure about it, especially about multiple dimensions) the possibility of denoting arrays of values, like

```java
public class Matrix4f extends MemoryMappedObject {
    @Dimension({4, 4})
    public float[][] data;
}
```

with calls like matrix.data[x][y] automatically converted to matrix.getData(x*4+y).
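A hand-written approximation of what the weaver might emit for that `Matrix4f` could look like this. Everything here is hypothetical: the thread only names `getData(x*4+y)`, so the `setData` counterpart, the `position` method and the FloatBuffer backing are assumptions made for illustration.

```java
import java.nio.FloatBuffer;

// Hypothetical woven form of: class Matrix4f { @Dimension({4, 4}) float[][] data; }
final class Matrix4f {
    static final int FLOATS = 16; // 4 x 4 floats per matrix

    private final FloatBuffer fb;
    private int base; // float index of the current matrix in fb

    Matrix4f(FloatBuffer fb) {
        this.fb = fb;
    }

    // Sliding window: select matrix number 'index' in the buffer.
    void position(int index) {
        base = index * FLOATS;
    }

    // Replaces reads of data[x][y]; the weaver passes the flattened index x*4+y.
    float getData(int i) {
        return fb.get(base + i);
    }

    // Replaces writes data[x][y] = v (assumed counterpart, not named in the thread).
    void setData(int i, float v) {
        fb.put(base + i, v);
    }
}
```

Under this scheme, source code written as `m.data[1][2] = 5f;` would be rewritten to `m.setData(1 * 4 + 2, 5f);`, and a later read of `m.data[1][2]` to `m.getData(6)`.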

With enough magic, it could even work on stuff like

```java
@Alignment(128)
public class VertexData extends MemoryMappedObject {
    @Offset(16) public Color4f rgba;
    @Offset(32) public Vector3f position;
    @Offset(48) public Vector3f normal;
    public float m1, z2, f3;
}
```

with Color4f and Vector3f being other MMOs, expanded inline within VertexData. Alignment/Offset could be replaced with padding elements, so they are not required; I’m just throwing ideas around.

Linuxhippy · October 22, 2005, 2:40pm

So why do you want to do this in Java at all? Add such optimizations to Java and we are where we already were with C++.