Optimised direct Buffers

From reading the various posts in this forum it’s clear that puts and gets on some direct buffer types are converted to direct memory access instructions (at least on the server VM), and so don’t suffer any JNI overhead. Apparently IntBuffer is accelerated, but ByteBuffer isn’t - is that correct? And what about other types - in particular, is FloatBuffer accelerated?

Yes.

Cas :slight_smile:

I assume you mean yes, IntBuffer and FloatBuffer are accelerated but ByteBuffer isn’t? (I don’t really use ByteBuffer directly much anyway, so not a problem for me).

IIRC, the asIntBuffer() and asFloatBuffer() operations on ByteBuffer only create a “view” of the ByteBuffer that lets you put ints and floats into the same underlying memory?

That’s what the javadoc says anyway :slight_smile:
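For what it’s worth, the view behaviour is easy to see for yourself. A minimal sketch (class name and values are mine, purely illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class ViewDemo {
    public static void main(String[] args) {
        // A direct ByteBuffer in native byte order (native order is what
        // lets the VM use plain memory-access instructions).
        ByteBuffer bytes = ByteBuffer.allocateDirect(1024)
                                     .order(ByteOrder.nativeOrder());

        // asFloatBuffer() makes no copy: the FloatBuffer is a view over
        // the same native memory.
        FloatBuffer floats = bytes.asFloatBuffer();
        floats.put(0, 1.0f);

        // The raw bytes change underneath, confirming it is only a view.
        System.out.println(Integer.toHexString(bytes.getInt(0))); // 3f800000
    }
}
```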

From what I understand, native direct ByteBuffers should be optimised too. I believe they are the base structure for reaching native memory.

All of the direct buffer classes are optimized. However, ByteBuffer has both its heap-based and direct subclasses loaded all the time, because the core libraries use heap-based ByteBuffers. This causes calls to ByteBuffer.get() and put() to become virtual calls rather than simple machine instructions. Before Mustang, HotSpot wasn’t optimizing these virtual calls well, though Mustang should do better. You can generally work around this problem by downcasting your direct ByteBuffers to MappedByteBuffer outside your inner loops and operating only on MappedByteBuffer inside them.
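A minimal sketch of that workaround, with the caveat that it leans on an implementation detail of Sun’s VM: the direct buffer implementation class extends MappedByteBuffer, so the cast succeeds for direct buffers and throws ClassCastException for heap ones (the class name here is illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;

public class DowncastDemo {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(1 << 16)
                                  .order(ByteOrder.nativeOrder());

        // Cast once, outside the hot loop. On Sun's VM the direct
        // implementation class extends MappedByteBuffer, so this works
        // for any direct buffer (a heap buffer would throw here).
        MappedByteBuffer buf = (MappedByteBuffer) bb;

        // Calls through a MappedByteBuffer receiver only ever see direct
        // implementations, so get()/put() can inline down to plain memory
        // accesses instead of staying virtual.
        for (int i = 0; i < buf.capacity(); i += 4) {
            buf.putInt(i, i);
        }
    }
}
```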

I’m using FloatBuffers (from ByteBuffer.asFloatBuffer), so I can’t downcast to MappedByteBuffer. I’ll give Mustang a go and see how fast it is. Does this mean that non-direct FloatBuffers are faster in the meantime?

Thanks

No, direct FloatBuffers should be very fast. The VertexArrayRange JOGL demo shows that you can get faster-than-C speed out of HotSpot when using direct FloatBuffers due to HotSpot’s ability to generate SSE instructions at run time; most x86 executables still target the x87 FPU or have separate SSE and x87 code. Mixing direct and non-direct buffers in your application is discouraged.
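As a rough sketch of that pattern, assuming the usual direct-buffer setup rather than the demo’s actual code (sizes and values are placeholders):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class VertexFill {
    public static void main(String[] args) {
        // 100000 xyz vertices: direct, native-ordered, viewed as floats.
        FloatBuffer verts = ByteBuffer.allocateDirect(100000 * 3 * 4)
                                      .order(ByteOrder.nativeOrder())
                                      .asFloatBuffer();

        // A tight, absolute-indexed fill loop like this is the kind of
        // code HotSpot can compile down to SSE moves on x86.
        for (int i = 0; i < verts.capacity(); i++) {
            verts.put(i, i * 0.001f);
        }
    }
}
```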

That’s what I was hoping to hear :slight_smile:

One question though - how do these optimisations interact with debuggers/profilers when using JVMTI? I’ve wondered in the past if profiling my application skews the results…

Most profilers impose some amount of overhead, though if the profiler is reasonably modern it shouldn’t perturb the code generated by HotSpot. The -Xprof flat profiler built into HotSpot used to be a good low-impact option, but its implementation was changed in 1.5 and it is now unusably slow. I think the NetBeans profiler is currently very good. When running under a debugger, all of the code is optimized normally by HotSpot except the code in which a breakpoint has been set.

In general, or just for JNI stuff?

// Tomas

In general. This will become less of an issue going forward, as the server compiler can now perform efficient bimorphic call inlining and as we are heading toward a tiered system, but it can currently impact performance-critical code.

You can say that again :(. One reason that JGF is down so often is that I physically cannot profile it to find out where the memory leak is. Life’s kind of hard with no profilers.

grumblemumblecallthisaproductionrelease?mumblegrumble

On the contrary, there are plenty of commercially available CPU and memory profilers which work just fine. I’ve personally used JProfiler with good results, and in the past have also successfully used OptimizeIt and a couple of others. It’s only HotSpot’s built-in (and unspecified) profiler which has this drastic slowdown, though I agree this is a problem which needs to be fixed. For a free memory leak profiler I’d recommend the NetBeans profiler which I think works pretty well.

Not criticising you personally, but just to answer some of your points there, which I think are unfairly glossing over some serious issues:

  1. Java ethos, platform and tools: you can get it all for free. Saying “you have to pay for one of the tools you need to do Java development” sits uncomfortably with that. Profiling is a fundamental platform feature: it needs to be available for free.

  2. NetBeans profiling of a remote server isn’t easy, whereas with -Xprof it was identical to profiling a local app. Getting a customer to profile with -Xprof was exceptionally easy; getting them to do it with NetBeans? Ouch.

  3. Replacing a core tool with an implementation that is only available in Sun’s own IDE scares a lot of people. Right or wrong, most people don’t use that IDE; can they expect more of the core tools to migrate to this foreign IDE in the future? Will NetBeans gradually become the “only” Java IDE?

To the best of my knowledge, the NetBeans profiler doesn’t use -Xprof; it uses dynamic bytecode instrumentation to insert probe points into subsets of the application. There is also a free profiler which ships with the JDK, hprof, which works reasonably well (and better in the current version of Mustang than in previous releases).
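For reference, invoking hprof looks something like this (MyApp is a placeholder; `java -agentlib:hprof=help` prints the full option list):

```
# Heap allocation-site profile with the hprof agent that ships with the JDK
java -agentlib:hprof=heap=sites,depth=8 MyApp

# Results are written to java.hprof.txt in the working directory
```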