DirectByteBuffers slow???

Hi all,

So here’s the situation.

I have a network of objects stored as a few arrays in a C++ JNI library. I expose that network data to my Java application through DirectByteBuffers. I wrap the ByteBuffers up in a class with a flyweight pattern, like:

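import java.nio.ByteBuffer;

public class Network {
  // NODE_SIZE, X_OFFSET and Y_OFFSET are byte counts defined elsewhere.
  private ByteBuffer nodes;
  public class Node {
    private int index;
    private int offset;  // byte offset of the current node
    // Repositions the flyweight on node number `index`.
    public void index( int index ) {
      this.index = index;
      this.offset = NODE_SIZE * index;
    }
    public float x() {
      return nodes.getFloat( offset + X_OFFSET );
    }
    public float y() {
      return nodes.getFloat( offset + Y_OFFSET );
    }
  }
}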

Life was good. Or so I thought.

The problem I’m seeing is that access to the buffers seems very slow. I’m accessing, on average, about 50K nodes, and my times can run up into the seconds. Has anybody seen this before?

I’m running with JDK 1.5 Beta 1, with -server and more than enough memory allocated. I give the server plenty of time to warm up before taking measurements.

God bless,
-Toby Reyelts

And now you know why we need Structs in the language. They are entirely essential to high-performance programming :wink: (But you knew that anyway, hehe!)

Actually the problem is well known: calling getFloat()/putFloat() on a ByteBuffer is bastardly slow - but if you wrap your ByteBuffer in several sliced up FloatBuffers etc. then you get full speed optimized access. It’s so ugly and unnecessary it makes me cry. Shoot the Sun engineers for me. Still no comment about the RFE from Sun, no-one’s emailed me about it, not a squeak.
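For anyone wanting to try the FloatBuffer route, here is a minimal sketch of the flyweight rebuilt on a float view. It assumes a hypothetical layout of two floats per node (x, then y) and that the native side writes in native byte order; the class name and constants are made up for illustration and are not from the original code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Sketch only: assumes each node is exactly two floats, x followed by y.
class FloatViewNetwork {
    static final int FLOATS_PER_NODE = 2; // hypothetical layout
    static final int X_INDEX = 0;
    static final int Y_INDEX = 1;

    private final FloatBuffer nodes;

    FloatViewNetwork(ByteBuffer raw) {
        // Work on a duplicate so the caller's position, limit and order are untouched.
        ByteBuffer copy = raw.duplicate();
        copy.clear();                        // view the whole buffer from byte 0
        copy.order(ByteOrder.nativeOrder()); // the view inherits this order
        this.nodes = copy.asFloatBuffer();
    }

    // Flyweight cursor: one instance is repositioned over many nodes.
    final class Node {
        private int base; // index of the node's first float in the view

        Node index(int index) {
            this.base = index * FLOATS_PER_NODE;
            return this;
        }

        float x() { return nodes.get(base + X_INDEX); }
        float y() { return nodes.get(base + Y_INDEX); }
    }

    Node newNode() { return new Node(); }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocateDirect(16).order(ByteOrder.nativeOrder());
        raw.putFloat(0, 1.0f).putFloat(4, 2.0f);  // node 0
        raw.putFloat(8, 3.0f).putFloat(12, 4.0f); // node 1
        Node n = new FloatViewNetwork(raw).newNode();
        System.out.println(n.index(1).x() + " " + n.index(1).y());
    }
}
```

If the node records mix floats with other types, one view per field type (an IntBuffer alongside the FloatBuffer, for example) would be needed, which is where it starts to get ugly.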

Cas :slight_smile:

Also watch out for your byte ordering. ByteBuffer.duplicate and ByteBuffer.slice do not preserve the byte ordering.
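A small sketch of the pitfall and one way around it: a duplicate or slice comes back as BIG_ENDIAN regardless of the source's order, so the order has to be copied over by hand. The helper names here are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderCarryOver {
    // duplicate() resets the order to BIG_ENDIAN, so reapply the source's order.
    static ByteBuffer duplicateKeepingOrder(ByteBuffer src) {
        return src.duplicate().order(src.order());
    }

    // Same idea for slice().
    static ByteBuffer sliceKeepingOrder(ByteBuffer src) {
        return src.slice().order(src.order());
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(8).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println(buf.duplicate().order());            // BIG_ENDIAN
        System.out.println(duplicateKeepingOrder(buf).order()); // LITTLE_ENDIAN
    }
}
```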

getFloat on a ByteBuffer is probably difficult to accelerate, as it must check the byte ordering each time. The wrapped FloatBuffer, on the other hand, has the byte ordering fixed at the time the buffer is created. In addition, some architectures don’t permit native reading of floats (or other objects larger than one byte) at arbitrary offsets but require alignment. Again, a suitable FloatBuffer implementation can be created to reflect the alignment, but this may not be possible when reading floats directly from a ByteBuffer.
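To illustrate the "fixed at creation" point, here is a short sketch showing that a float view keeps the order it was created with even if the underlying ByteBuffer's order is changed afterwards. The helper name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class ViewOrderFixed {
    // Creates a float view, then flips the ByteBuffer's order to the opposite one.
    // The returned view still reports the order it was created with.
    static FloatBuffer viewThenFlip(ByteBuffer raw) {
        FloatBuffer view = raw.asFloatBuffer();
        ByteOrder opposite = raw.order() == ByteOrder.BIG_ENDIAN
                ? ByteOrder.LITTLE_ENDIAN
                : ByteOrder.BIG_ENDIAN;
        raw.order(opposite); // affects later getFloat() calls on raw, not the view
        return view;
    }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocateDirect(16).order(ByteOrder.LITTLE_ENDIAN);
        FloatBuffer view = viewThenFlip(raw);
        System.out.println(view.order()); // LITTLE_ENDIAN
        System.out.println(raw.order());  // BIG_ENDIAN
    }
}
```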

Incidentally Cas, have you considered the implications of alignment issues for accelerating your Struct proposal? While this may not matter much on Intel architecture, you can expect Sun to have a rather different view of the problem.

When all the planets and stars are in place and the tripes are heavy etc., the JVM understands when it can safely use a struct in a buffer without worrying about alignment. Failing that, it can fall back to spoofing the whole thing by autogenerating a wrapper class.

Cas :slight_smile:

[quote]Actually the problem is well known: calling getFloat()/putFloat() on a ByteBuffer is bastardly slow - but if you wrap your ByteBuffer in several sliced up FloatBuffers etc. then you get full speed optimized access.
[/quote]
Is there a bug number for this performance issue?

does not sound like a bug?

It’s not a bug, it’s just the way the world works, and you can’t expect the VM to optimize it away because it’s far too generic.

Structs! Structs! Structs! Structs! Structs!

Cas :slight_smile:

I don’t follow. Why would you expect getFloat() to be so much slower on a ByteBuffer than a FloatBuffer when ultimately the exact same work must be done? Assuming of course that the floats are aligned in memory the same in both cases…

Not exactly the same work. A given FloatBuffer will be either always aligned or always unaligned. In the case of an aligned buffer, a native fetch can be generated directly by Hotspot, requiring no code in the middle (if the bounds checks are already done). With a ByteBuffer, every getFloat call needs to check whether the base address plus the given offset gives an aligned or unaligned result. This extra branch is quite a big cost compared to a simple memory fetch.

In the worst case, all paths would require the same work. Fortunately, if Hotspot detects that a given FloatBuffer is aligned and in native order, it can optimize the access code dramatically.
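As a rough illustration of the check being described (this is not Hotspot's actual code, just the arithmetic; the base address and names are made up):

```java
class AlignmentSketch {
    // True when a value of `size` bytes (size must be a power of two)
    // starting at baseAddress + offset is naturally aligned.
    static boolean isAligned(long baseAddress, int offset, int size) {
        return ((baseAddress + offset) & (size - 1)) == 0;
    }

    public static void main(String[] args) {
        long base = 0x1000L; // hypothetical buffer base address

        // ByteBuffer.getFloat(offset): the offset is arbitrary, so the
        // answer can differ from one call to the next.
        System.out.println(isAligned(base, 8, 4)); // true
        System.out.println(isAligned(base, 6, 4)); // false

        // FloatBuffer.get(i): element i sits at base + 4 * i, so if the base
        // is aligned, every element is. One check at creation covers them all.
        System.out.println(isAligned(base, 4 * 123, 4)); // true
    }
}
```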

Which is why Structs would r0xor so much. Help the VM optimise something that it can’t figure out on its own. Save the world! Get the girl!*

Cas :slight_smile:

* and be memory efficient with it!

Structs are a year+ and counting and no response from JDK reps…

Mmm. Despite my relatively frequent direct and indirect communications with GTG members, too.

Cas :slight_smile:

[quote]Mmm. Despite my relatively frequent direct and indirect communications with GTG members, too.
[/quote]
Ahem.

What a shame Java’s not open source so it could be a community effort to add structs. ;D

Sorry. Couldn’t resist.