Bounds checks and structs


import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class MyStruct {
   private ByteBuffer buffer;

   public void setBuffer(ByteBuffer b) {
      ByteBuffer buf = b.duplicate();  // isolate from position/limit changes elsewhere
      buf.clear();                     // position = 0, limit = capacity
      if (buf.limit() < 32) throw new IllegalArgumentException("too small");
      buf.order(ByteOrder.nativeOrder());
      buffer = buf;
   }

   public int getX() {return buffer.getInt(0);}
   public int getY() {return buffer.getInt(4);}
   // etc
}

All we need is a compiler clever enough to notice that the bounds checks in buffer.getInt(0) can be eliminated because the buffer limits are checked in the setBuffer method. Furthermore it ought to be possible to deduce that the buffer is in native order. On x86 this ought to be sufficient to allow replacement of the getX() method by a trivial memory load. Some other machines will require further tests for alignment.

I am quite sure it isn’t, since that would also involve flow analysis and wouldn’t be that easy to do.
I am not 100% sure and do not want to tell you anything wrong, but as far as I know no runtime that provides bounds checks is able to handle this sort of optimization (no JVM and no .NET runtime).

lg Clemens

It doesn’t seem much more complex than some of the analysis that is already being done or considered.

I bet it is. Besides, it’s not as nice a solution as mapped objects, which are dead easy to implement :wink:

Cas :slight_smile:

Well, if there is doubt, why not benchmark it? And if it’s not that important because there are no performance problems anyway, why worry? In that case it’s just wasted time.

btw. I bet HotSpot isn’t able to optimize the checks away. Sure, it can if you use it in a loop, since the method will be inlined and then it’s not much more than normal array-accessing code in a loop. But for random access I bet the bounds checks are not removed.

lg Clemens

I don’t expect the current hotspot compiler to be able to optimise this usage, but it is an alternative that may be worth considering as a solution to Cas’s needs. It also may improve performance for other cases as well.

There is still the question of whether the ByteBuffer is direct or heap-based. This code cannot guarantee either; the only chance would be to look at the execution profile, and even then there would need to be a trap for the uncommon case in addition to the simple memory load.

If you view your ByteBuffer as an IntBuffer and load the ints through the IntBuffer then the code will be optimized as you expect. JOGL uses this technique (multiple views of the same ByteBuffer depending on the type of C field being loaded) in its internal StructAccessor class for accessing C structs.
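For readers who have not seen the multiple-views technique, here is a minimal sketch of the idea. It is not JOGL's StructAccessor; the class name PointView and its three-field layout (two ints followed by a float) are made up for illustration. The size check happens once in the constructor, and each accessor goes through a typed view of the same bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;

// Hypothetical struct layout: int x (bytes 0..3), int y (4..7), float scale (8..11).
public class PointView {
    private final IntBuffer ints;     // view used for the int fields
    private final FloatBuffer floats; // view used for the float field

    public PointView(ByteBuffer buf) {
        // Work on a duplicate so the caller's position/limit are untouched.
        ByteBuffer b = buf.duplicate().order(ByteOrder.nativeOrder());
        b.clear();
        if (b.capacity() < 12) {
            throw new IllegalArgumentException("need at least 12 bytes");
        }
        // Both views share the same underlying memory as buf.
        ints = b.asIntBuffer();
        floats = b.asFloatBuffer();
    }

    // View indices are in elements, not bytes: byte offset / 4.
    public int getX() { return ints.get(0); }
    public void setX(int x) { ints.put(0, x); }
    public int getY() { return ints.get(1); }
    public void setY(int y) { ints.put(1, y); }
    public float getScale() { return floats.get(2); }
    public void setScale(float s) { floats.put(2, s); }
}
```

Whether the JIT removes the per-access checks on the views depends on the JVM and on how the accessors are used, so this only shows the shape of the code, not a performance guarantee.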

Aye, I use this very technique for accessing complex vertex data in my bumpmapped background stuff in 'Flux etc. but the thing is, it’s such horrendous syntax and requires an absolute ton of bounds checks which are basically impossible to optimise away. And that was when I alighted upon the idea of MappedObjects (“Structs”).

Cas :slight_smile:

As long as the underlying ByteBuffer is appropriately sized there should be no need for additional explicit bounds checks. The only trick is automatically figuring out the offsets for the various fields and generating accessor and setter methods. The GlueGen tool used by JOGL does this automatically for all of the C structs it encounters while parsing the input header files. However this is not necessarily a good solution for all Java programmers given that it requires learning a new tool. Peter Ahe recently discussed 4820062 with me and suggested that it could be implemented in similar fashion via annotation processing with no JVM or language changes, which sounds like a good idea to me.

Annotations will be neither sufficient nor necessary; you will need to extend an abstract JDK MappedObject class, and there will need to be a bit of jiggery-pokery in the system classloader. Would be good to talk to you both in more detail about my designs…?

Cas :slight_smile:

I disagree. Minor syntactic details aside, it is already possible to generate the appropriate Java code to access C structs and arrays of C structs via NIO. Again, JOGL’s GlueGen tool does this. Have you looked at the generated code it produces for the JAWT, XVisualInfo and other C data types? Aside from having to maintain multiple views of the underlying ByteBuffer (which is an implementation detail) the generated machine code is exactly as desired.

Similar Java code could be autogenerated using the annotation processing tool. I don’t know the syntax exactly but I recall Peter mentioning something like this:


  public @Struct class MyCStruct {
    public @int32_t int firstIntField;
    public @float secondFloatField;
    ...
  }

which when processed via APT would produce something like


  public class MyCStruct {
    public MyCStruct(ByteBuffer buf);
    public int firstIntField();
    public void firstIntField(int val);
    public float secondFloatField();
    public void secondFloatField(float val);
  }

which is similar to what GlueGen produces today. APT could similarly generate an associated class representing an array of this C data type. Again, all of this is possible without modifying the language, JVM or libraries.

If you feel this is somehow insufficient then please write up your proposal on a web page somewhere and send around a link.

Do you mean that each setXXX or getXXX method would not do any bounds checks? I believe the goal with the structs proposal is that you essentially end up with something like a “StructBuffer” analogous to an IntBuffer or FloatBuffer, where the index refers to a particular ‘struct’ in the buffer, and so once you have a struct you know that all field accesses are within bounds.

I’ll assume that any overhead of using getter/setter methods to peek into the underlying ByteBuffer will be optimized away by HotSpot.

As long as the underlying ByteBuffer is appropriately sized then yes, there is no need for additional bounds checks.

[quote]I’ll assume that any overhead of using getter/setter methods to peek into the underlying ByteBuffer will be optimized away by HotSpot.
[/quote]
As long as you maintain multiple views of the underlying ByteBuffer depending on the type of field you’re fetching out of the struct then yes, HotSpot generates optimal machine code. The getInt(), etc. methods on a ByteBuffer are typically not optimized (and usually cannot be, due to alignment issues on non-x86 platforms). I would strongly encourage you to check out the JSR-231 branch of the JOGL sources (which will soon be promoted to the main trunk) and look at the StructAccessor class in com.sun.gluegen.runtime. If you build the source tree you can see how the various autogenerated Java-side struct wrappers like JAWT_DrawingSurfaceInfo use that class.

[quote]As long as you maintain multiple views of the underlying ByteBuffer depending on the type of field you’re fetching out of the struct then yes, HotSpot generates optimal machine code. The getInt(), etc. methods on a ByteBuffer are typically not optimized (and usually can not be due to alignment issues on non-x86 platforms).
[/quote]
Well the multiple view thing seems like it could get a bit awkward, if you were to try to code this by hand. But overall this sounds quite cool. I will have to check out that gluegen stuff you speak of. Other than looking at the JOGL source, where would I go to learn about those tools?

There isn’t a lot of documentation for GlueGen at the moment. I believe some is embodied in comments in the source code and usage statements but the configuration files used to drive the glue code generation are completely undocumented right now. Documenting it is a medium-priority task but completing JSR-231 takes precedence.

Your proposal does not quite meet the requirements I have.

The precise requirement I have is to map primitive fields, in-order, directly, into a ByteBuffer. No getters or setters. And no bounds checks in the get or set methods, which you will have to have with any generated code, as Hotspot will not be able to determine whether they are within the boundary of the ByteBuffer. And no source code generation.

A MappedObject must have a final ByteBuffer in which it lives, and a current position within that bytebuffer. The ability of the JVM to read and/or write fields efficiently will depend on the byte alignment of the underlying bytebuffer and struct position and field position.

Because MappedObject.setPosition() sets the position and relative position of all the fields simultaneously, this is the only time that a bounds check must be performed on the bytebuffer, to ensure that the first and last fields are within the limits specified by the buffer.

Object reference fields are simply ignored for all purposes.

MappedObject is an abstract class that should be placed in java.nio.

That’s all it needs.

Cas :slight_smile:
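The single-check rule for setPosition() described in this post can be approximated in today's Java, though only with accessor methods, since plain Java has no way to map fields onto a buffer. The sketch below is an illustration under that limitation; the class name MappedPoint and its two-int layout are hypothetical and not part of any proposal text.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical two-field struct: int x at +0, int y at +4.
public final class MappedPoint {
    public static final int SIZE = 8;

    private final ByteBuffer buffer; // the buffer this object lives in
    private int position;            // byte offset of the current struct

    public MappedPoint(ByteBuffer buf) {
        buffer = buf.duplicate().order(ByteOrder.nativeOrder());
        buffer.clear();
        setPosition(0);
    }

    // The only place where the struct is validated against the buffer:
    // if the first and last byte fit, every field access below fits too.
    public void setPosition(int newPosition) {
        if (newPosition < 0 || newPosition > buffer.limit() - SIZE) {
            throw new IndexOutOfBoundsException("struct at " + newPosition
                    + " does not fit in " + buffer.limit() + " bytes");
        }
        position = newPosition;
    }

    // Accessors stand in for the direct field access the proposal asks for.
    public int getX() { return buffer.getInt(position); }
    public void setX(int x) { buffer.putInt(position, x); }
    public int getY() { return buffer.getInt(position + 4); }
    public void setY(int y) { buffer.putInt(position + 4, y); }
}
```

Note that the absolute getInt/putInt calls still perform their own checks inside the library; the point of the proposal is that a JVM-supported MappedObject could rely on the setPosition() check alone.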

Ken seems to be implying that HotSpot can already eliminate the bounds checks and thus achieve the performance part of your requirements.

Code is probably better than words.

What can be done with a bit of hackery


import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;

import sun.misc.Unsafe;

public class MappedObject {

  static Unsafe unsafe;
  static Field addressHack;
  static {
    try {
      ByteBuffer bb = ByteBuffer.allocateDirect(1);
      Field unsafeHack = bb.getClass().getDeclaredField("unsafe");
      unsafeHack.setAccessible(true);
      unsafe = (Unsafe) unsafeHack.get(bb);

      addressHack = Buffer.class.getDeclaredField("address");
      addressHack.setAccessible(true);
    } catch (Exception exc) {
      exc.printStackTrace();
    }

  }

  private long base;

  public final void attach(ByteBuffer bb, int offset) {
    try {
      base = addressHack.getLong(bb) + offset;
    } catch (Exception exc) {
      exc.printStackTrace();
      throw new InternalError();
    }
  }

  public int getY() {
    return unsafe.getInt(base + 8);
  }

  public void setY(int y) {
    unsafe.putInt(base + 8, y);
  }

  public int getX() {
    return unsafe.getInt(base + 16);
  }

  public void setX(int x) {
    unsafe.putInt(base + 16, x);
  }

  public static void main(String[] argv) {

    MappedObject mo = new MappedObject();
    ByteBuffer bb = ByteBuffer.allocateDirect(1000);
    mo.attach(bb, 128);

    mo.setY(1);
    for (int i = 0; i < 500; i++) {
      test(mo);
    }
    System.out.println(mo.getX());
  }

  private static void test(MappedObject mo) {
    for (int i = 0; i < 1000000; i++) {
      mo.setX(mo.getY() + mo.getX());
    }
  }

}

And this is what the inner loop of the test method looks like once compiled:


040   B4: #	B4 B5 <- B3 B4 	Loop: B4-B4 inner stride: not constant  Freq: 7.51066
040   	MOV    EDI,[EAX + #8]
043   	ADD    ECX,EDI
045   	MOV    [EAX + #16],ECX
048   	INC    EBX
049   	CMP    EBX,#1000000
04f   	Jlt,s  B4  P=1.000000 C=7.509333

When playing with this code, I once saw HotSpot unroll this inner loop by 4, but I was not able to reproduce it, for whatever reason.

I think it clearly shows that it is perfectly possible to write a well-performing struct class using today’s JVM. The only problem is that you have to use getters/setters instead of direct field access, which can be a clarity issue but doesn’t affect performance in the slightest.

The MappedObject class would have to be made abstract, with subclasses for specific structures. It also requires full rights in the JVM and currently performs no checks, but adding a check in attach would be exactly what you mentioned and should not affect performance.
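As a rough idea of what such a check in attach might look like, here is a sketch of the validation alone, kept separate from the Unsafe code so it runs anywhere. The helper class AttachCheck and the structSize parameter are assumptions for illustration; the struct above touches bytes up to base + 20, so 20 would be the size to pass for it.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: validates the arguments of attach(bb, offset)
// before any raw address arithmetic is done.
public final class AttachCheck {
    private AttachCheck() {}

    public static void check(ByteBuffer bb, int offset, int structSize) {
        // Raw addresses only make sense for direct buffers.
        if (!bb.isDirect()) {
            throw new IllegalArgumentException("direct buffer required");
        }
        // The whole struct [offset, offset + structSize) must lie in the buffer.
        if (offset < 0 || structSize < 0 || offset > bb.capacity() - structSize) {
            throw new IndexOutOfBoundsException("struct of " + structSize
                    + " bytes at offset " + offset + " exceeds capacity "
                    + bb.capacity());
        }
    }
}
```

Calling AttachCheck.check(bb, offset, 20) as the first statement of attach would cover both the heap-versus-direct question raised earlier in the thread and the bounds question, at a cost paid once per attach rather than once per field access.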

I would dearly like to be able to remove get/set methods for the purposes of clarity. Consider writing a dot product for a mapped vector3f class - yuk. Besides not all of the fields should necessarily be exposed! This is OOP remember. get/set is a crap paradigm.

Cas :slight_smile:
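To make the clarity complaint concrete, here is a sketch of a dot product on an accessor-based mapped vector. The class MappedVector3f and its FloatBuffer backing are hypothetical, written only to show how the arithmetic reads with getters; the comment inside dot() shows how the same line would read with mapped fields.

```java
import java.nio.FloatBuffer;

// Hypothetical accessor-based vector: three consecutive floats in a FloatBuffer.
public final class MappedVector3f {
    private final FloatBuffer floats;
    private final int base; // element index of x

    public MappedVector3f(FloatBuffer floats, int index) {
        if (index < 0 || index > floats.limit() - 3) {
            throw new IndexOutOfBoundsException("vector at " + index);
        }
        this.floats = floats;
        this.base = index;
    }

    public float getX() { return floats.get(base); }
    public float getY() { return floats.get(base + 1); }
    public float getZ() { return floats.get(base + 2); }

    public void set(float x, float y, float z) {
        floats.put(base, x);
        floats.put(base + 1, y);
        floats.put(base + 2, z);
    }

    public float dot(MappedVector3f o) {
        // With mapped fields this would read: x * o.x + y * o.y + z * o.z
        return getX() * o.getX() + getY() * o.getY() + getZ() * o.getZ();
    }
}
```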