DirectByteBuffers slow???

rreyelts · May 18, 2004, 6:54pm

Hi all,

So here’s the situation.

I have a network of objects stored as a few arrays in a C++ JNI library. I expose that network data to my Java application through DirectByteBuffers. I wrap the ByteBuffers up in a class with a flyweight pattern, like:

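import java.nio.ByteBuffer;

public class Network {
  // Layout constants are illustrative; the real values come from the native struct.
  private static final int NODE_SIZE = 8;  // bytes per node
  private static final int X_OFFSET = 0;   // byte offset of x within a node
  private static final int Y_OFFSET = 4;   // byte offset of y within a node
  private ByteBuffer nodes;                // direct buffer handed over from JNI
  public class Node {
    private int index;
    private int offset;                    // byte offset of the current node
    public void index( int index ) {
      this.index = index;
      this.offset = NODE_SIZE * index;     // recompute whenever the flyweight moves
    }
    public float x() {
      return nodes.getFloat( offset + X_OFFSET );
    }
    public float y() {
      return nodes.getFloat( offset + Y_OFFSET );
    }
  }
}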

Life was good. Or so I thought.

The problem I’m seeing is that access to the buffers seems very slow. I’m accessing, on average, about 50K nodes, and my times can run up into the seconds. Has anybody seen this before?

I’m running with JDK 1.5 Beta 1, with -server and more than enough memory allocated. I give the server plenty of time to warm up before taking measurements.

God bless,
-Toby Reyelts

princec · May 19, 2004, 7:36am

And now you know why we need Structs in the language. They are entirely essential to high-performance programming (But you knew that anyway, hehe!)

Actually the problem is well known: calling getFloat()/putFloat() on a ByteBuffer is bastardly slow - but if you wrap your ByteBuffer in several sliced up FloatBuffers etc. then you get full speed optimized access. It’s so ugly and unnecessary it makes me cry. Shoot the Sun engineers for me. Still no comment about the RFE from Sun, no-one’s emailed me about it, not a squeak.

Cas
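
For reference, the FloatBuffer-view approach described above could look roughly like the sketch below. It assumes a node made up purely of floats (x, then y); the class name FloatViewNetwork and the FLOATS_PER_NODE, X_INDEX and Y_INDEX constants are hypothetical and not from the original post. A layout that mixes types would need one view per type, which is the "several sliced up FloatBuffers etc." part.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class FloatViewNetwork {
    // Hypothetical layout: each node is two floats, x followed by y.
    static final int FLOATS_PER_NODE = 2;
    static final int X_INDEX = 0;
    static final int Y_INDEX = 1;

    private final FloatBuffer floats;

    public FloatViewNetwork(ByteBuffer nodes) {
        // The view fixes the byte order once, at creation time,
        // so set the order on the ByteBuffer before calling this.
        this.floats = nodes.asFloatBuffer();
    }

    // Flyweight: one Node instance is re-pointed at different indices.
    public final class Node {
        private int base; // index of the node's first float in the view

        public Node index(int index) {
            this.base = index * FLOATS_PER_NODE;
            return this;
        }

        public float x() {
            return floats.get(base + X_INDEX);
        }

        public float y() {
            return floats.get(base + Y_INDEX);
        }
    }

    public Node node() {
        return new Node();
    }

    public static void main(String[] args) {
        // Stand-in for the buffer that would come from the JNI library.
        ByteBuffer raw = ByteBuffer.allocateDirect(2 * FLOATS_PER_NODE * Float.BYTES)
                                   .order(ByteOrder.nativeOrder());
        raw.putFloat(8, 1.5f);   // node 1, x
        raw.putFloat(12, 2.5f);  // node 1, y

        Node n = new FloatViewNetwork(raw).node().index(1);
        System.out.println(n.x() + ", " + n.y());
    }
}
```

Note that indices into the view count floats, not bytes, which is why the flyweight multiplies by FLOATS_PER_NODE rather than by a byte size.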

mthornton · May 19, 2004, 8:48am

Also watch out for your byte ordering. ByteBuffer.duplicate and ByteBuffer.slice do not preserve the byte ordering: the new buffer always starts out big-endian, whatever the order of the original.

getFloat on a ByteBuffer is probably difficult to accelerate as it must check the byte ordering each time. The wrapped FloatBuffer on the other hand has the byte ordering fixed at the time the buffer is created. In addition some architectures don’t permit native reading of floats (or other objects larger than one byte) at arbitrary offsets but require alignment. Again a suitable FloatBuffer implementation can be created to reflect the alignment, but this may not be possible when reading floats direct from a ByteBuffer.
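
A small sketch of the byte-order pitfall, using only standard java.nio calls. The class name ByteOrderPitfall and its helper methods are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderPitfall {
    // slice() hands back a buffer whose order is reset to BIG_ENDIAN.
    static ByteOrder orderAfterSlice(ByteBuffer source) {
        return source.slice().order();
    }

    // duplicate() resets the order in the same way.
    static ByteOrder orderAfterDuplicate(ByteBuffer source) {
        return source.duplicate().order();
    }

    // A FloatBuffer view keeps the order the ByteBuffer had when the view was made.
    static ByteOrder orderOfFloatView(ByteBuffer source) {
        return source.asFloatBuffer().order();
    }

    // Workaround: copy the order over explicitly after slicing.
    static ByteBuffer sliceKeepingOrder(ByteBuffer source) {
        return source.slice().order(source.order());
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(16).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println("slice:      " + orderAfterSlice(buf));
        System.out.println("duplicate:  " + orderAfterDuplicate(buf));
        System.out.println("float view: " + orderOfFloatView(buf));
        System.out.println("fixed:      " + sliceKeepingOrder(buf).order());
    }
}
```

So when slicing a buffer before creating typed views, the order has to be set again on each slice, otherwise native little-endian data will be read with the wrong byte order.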

mthornton · May 19, 2004, 9:02am

Incidentally Cas, have you considered the implications of alignment issues for accelerating your Struct proposal? While this may not matter much on Intel architecture, you can expect Sun to have a rather different view of the problem.

princec · May 19, 2004, 10:06am

When all the planets and stars are in place and the tripes are heavy etc. the JVM understands when it can safely use a struct in a buffer without worrying about alignments. Failing that it can fall back to spoofing the whole thing by autogenerating a wrapper class.

Cas

swpalmer · May 19, 2004, 11:51am

[quote]Actually the problem is well known: calling getFloat()/putFloat() on a ByteBuffer is bastardly slow - but if you wrap your ByteBuffer in several sliced up FloatBuffers etc. then you get full speed optimized access.
[/quote]
Is there a bug number for this performance issue?

Herkules · May 19, 2004, 1:05pm

does not sound like a bug?

princec · May 19, 2004, 2:39pm

It’s not a bug, it’s just the way the world works, and you can’t expect the VM to optimize it away because it’s far too generic.

Structs! Structs! Structs! Structs! Structs!

Cas

swpalmer · May 19, 2004, 5:08pm

I don’t follow. Why would you expect getFloat() to be so much slower on a ByteBuffer than a FloatBuffer when ultimately the exact same work must be done? Assuming of course that the floats are aligned in memory the same in both cases…

abies · May 19, 2004, 5:47pm

Not exactly the same work. A given FloatBuffer will be either always aligned or always unaligned. For an aligned buffer, Hotspot can generate a native fetch directly, with no code in between (if the bounds checks are already done). With a ByteBuffer, every getFloat call needs to check whether the base address plus the given offset is aligned or not. This extra branch is quite a big cost compared to a simple memory fetch.

In the worst case, all paths would require the same work. Fortunately, if Hotspot detects that a given FloatBuffer is aligned and in native order, it can optimize the access code dramatically.

princec · May 19, 2004, 7:01pm

Which is why Structs would r0xor so much. Help the VM optimise something that it can’t figure out on its own. Save the world! Get the girl!*

Cas

* and be memory efficient with it!

shawnkendall · May 19, 2004, 7:24pm

Structs are a year+ and counting and no response from JDK reps…

princec · May 19, 2004, 8:58pm

Mmm. Despite my relatively frequent direct and indirect communications with GTG members, too.

Cas

JasonB · May 19, 2004, 11:16pm

[quote]Mmm. Despite my relatively frequent direct and indirect communications with GTG members, too.
[/quote]
Ahem.

What a shame Java’s not open source so it could be a community effort to add structs. ;D

Sorry. Couldn’t resist.