Cache locality and LibStruct - fixing Java's horrible memory layout

For years now, and to this day, Escape Analysis has only worked in extremely trivial use cases, which pretty much guarantees that it won’t work where it’s needed most. Further, in real life you need to put objects in a data structure, like an array or a list. That pretty much rules out EA for the next two decades.

Anyway, LibStruct is still very immature. There are lots of optimizations left to do, bugs to fix, and a few design choices to work out (compound structs, which are currently implemented through ‘views’). The control flow analysis could also be better: it currently gets rather confused by switch statements, which is why the structs version of the demo uses a slow if/else chain in the performance-critical section, yet it is still faster.

theagentd: Could you post a high-level description of what’s actually going on under the hood on libStruct? I’m guessing you’re generating bytecode to do some of this? What does that do for portability (I’m thinking Android and libGDX/iPhone platforms)?

LibStruct is developed by me - theagentd plays around with it, yelling at me when it breaks again (and again). I’m his main cause of hair loss these days.

Portability-wise, you’re bound to the all-powerful HotSpot VM.

Ah, I didn’t realise that, sorry :S

Anyway, could you provide a bit of an overview of what it’s doing under the hood, or point me somewhere that explains it? I remember reading your posts about such hackery a while back but didn’t realise you had turned it into a proper working thing. :)

The idea behind LibStruct is to get rid of any instance-related behaviour of a type, including all non-static methods.

Original source code:


public class Vec3 {
   public float x, y, z;

   public Vec3() {
      x = y = z = 1337;
   }

   public void load(float x, float y, float z) {
      this.x = x;
      this.y = y;
      this.z = z;
   }

   public void load(Vec3 xyz) {
      this.x = xyz.x;
      this.y = xyz.y;
      this.z = xyz.z;
   }
}

public void run() {
   Vec3 pos = new Vec3();

   Vec3 vel = new Vec3();
}

Rewritten code:


public class Vec3 {

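   // Shown as Java for readability; _<init>_ stands in for the constructor and is not a legal Java identifier.
   public static void _<init>_(int _this) {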
      fput(_this, 0, 1337);
      fput(_this, 4, 1337);
      fput(_this, 8, 1337);
   }

   public static void load(int _this, float x, float y, float z) {
      fput(_this, 0, x);
      fput(_this, 4, y);
      fput(_this, 8, z);
   }

   public static void load(int _this, int _xyz) {
      fput(_this, 0, fget(_xyz, 0));
      fput(_this, 4, fget(_xyz, 4));
      fput(_this, 8, fget(_xyz, 8));
   }
}

public void run() {
   StructAllocationStack _sas = StructEnv.getFastThreadLocalStack();
   _sas.save();

   int pos = StructEnv.allocate(_sas, 12); // sizeof(Vec3) == 12 bytes
   Vec3._<init>_(pos); // run specified constructor, which is now a static method

   int vel = StructEnv.allocate(_sas, 12);
   Vec3._<init>_(vel);

   _sas.restore();
}
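
For anyone wondering what the save/allocate/restore calls in the rewritten run() might boil down to, here is a minimal sketch. It is not LibStruct's actual implementation: the class name StructStackSketch, the fixed save depth of 64, and the use of a direct ByteBuffer as backing memory are all assumptions made for illustration, and handles here are plain byte offsets rather than shifted pointers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of a struct allocation stack. Names and layout are
// assumptions for illustration, not LibStruct's real classes.
final class StructStackSketch {
    private final ByteBuffer memory;
    private final int[] savedTops = new int[64]; // marks left by nested save() calls
    private int depth = 0;
    private int top = 0; // next free byte offset

    StructStackSketch(int capacityBytes) {
        memory = ByteBuffer.allocateDirect(capacityBytes).order(ByteOrder.nativeOrder());
    }

    /** Remembers the current top so restore() can release everything allocated after it. */
    void save() {
        if (depth == savedTops.length) throw new IllegalStateException("too many nested save() calls");
        savedTops[depth++] = top;
    }

    /** Pops back to the most recent save() mark. */
    void restore() {
        if (depth == 0) throw new IllegalStateException("restore() without save()");
        top = savedTops[--depth];
    }

    /** Bump-allocates sizeof bytes, rounded up to a multiple of 4, and returns the byte offset. */
    int allocate(int sizeof) {
        int aligned = (sizeof + 3) & ~3;
        if (top + aligned > memory.capacity()) throw new IllegalStateException("struct stack overflow");
        int handle = top;
        top += aligned;
        return handle;
    }

    /** Writes a float field at handle + fieldOffset (both in bytes). */
    void fput(int handle, int fieldOffset, float value) {
        memory.putFloat(handle + fieldOffset, value);
    }

    /** Reads a float field at handle + fieldOffset (both in bytes). */
    float fget(int handle, int fieldOffset) {
        return memory.getFloat(handle + fieldOffset);
    }

    /** Number of bytes currently in use. */
    int used() {
        return top;
    }

    public static void main(String[] args) {
        StructStackSketch sas = new StructStackSketch(1024);
        sas.save();
        int pos = sas.allocate(12); // sizeof(Vec3) == 12 bytes
        sas.fput(pos, 0, 1337f);
        sas.fput(pos, 4, 1337f);
        sas.fput(pos, 8, 1337f);
        System.out.println(sas.fget(pos, 4));
        sas.restore(); // pos is dead from here on; its bytes get reused
    }
}
```

The point of the design is that allocation is just a pointer bump and freeing is a single assignment, so short-lived structs cost next to nothing and sit next to each other in memory.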

The int is a pointer with the lowest 2 bits cut off (they are known to be zero, as structs are 4-byte aligned).
This allows LibStruct to address up to 16 GB of memory (2^32 × 4 bytes), instead of the 4 GB you’d expect from 32-bit integers.
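
The shifting trick can be sketched in a few lines. The CompressedPointer class and its method names below are hypothetical and only illustrate the arithmetic; they are not LibStruct's API.

```java
// Hypothetical helper, not part of LibStruct: packs a 4-byte-aligned byte
// address into a 32-bit handle and unpacks it again.
final class CompressedPointer {
    private CompressedPointer() {}

    /** Drops the two always-zero low bits of a 4-byte-aligned address. */
    static int compress(long address) {
        if ((address & 3L) != 0) {
            throw new IllegalArgumentException("address is not 4-byte aligned: " + address);
        }
        if (address < 0 || address >= (1L << 34)) {
            throw new IllegalArgumentException("address outside the 16 GB range: " + address);
        }
        return (int) (address >>> 2);
    }

    /** Restores the byte address; the int is treated as unsigned. */
    static long decompress(int handle) {
        return (handle & 0xFFFFFFFFL) << 2;
    }

    public static void main(String[] args) {
        long address = 12L * 1024 * 1024 * 1024; // 12 GB, out of reach for a plain 32-bit byte address
        int handle = compress(address);
        System.out.println(decompress(handle) == address); // prints "true"
    }
}
```

Note that handles above 8 GB come out as negative ints, which is why decompress masks with 0xFFFFFFFFL before shifting.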

Riven, I exceeded my maximum PM limit so I can’t respond anymore. :(

You chatty bastard. Changing the PM rate limit will be quite an adventure, so why not post your rant in pastebin? :)