Examples of high JNI performance overhead?

Hello,

We have folks working on reducing JNI overhead, and they need something
to test their work on, so we’re looking for benchmarks (or, better yet, real
applications) which could show high JNI overhead.

It would be especially useful if you could quantify the amount
of overhead with some measurements.

We in 2D typically do this by rendering some tiny primitives which
take no measurable time at all by themselves, so that most of the
measured time is overhead. This covers native method calling.
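
A minimal sketch of that kind of benchmark, assuming a plain BufferedImage target (the class name, loop counts, and primitive choice are mine, and whether the calls actually reach native code depends on the rendering pipeline in use):

import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class TinyPrimitiveBench {
    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        final int CALLS = 1000000;
        // warm up so HotSpot compiles the loop before we time it
        for (int i = 0; i < 100000; i++) g.drawLine(0, 0, 0, 0);
        long start = System.nanoTime();
        for (int i = 0; i < CALLS; i++) g.drawLine(0, 0, 0, 0);  // 1x1 primitive: rendering cost is negligible
        long elapsed = System.nanoTime() - start;
        System.out.println("ns per drawLine: " + (double) elapsed / CALLS);
        g.dispose();
    }
}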

Another example is pulling data from Java objects.

We have a couple of micro-benchmarks (internal ones and one external),
but it would be useful to have some real-world applications.

Typical Swing applications have too much going on at the Java level,
so the JNI overhead (for rendering calls, for example) is almost
unnoticeable. Games, on the other hand, may be more
prone to the overhead.

Thanks,
Dmitri
Java2D Team

Because of the way we use the APIs, JNI overhead is just totally lost in the noise. Might not really be worth it for you to optimise this too much more.

Cas :slight_smile:

I remember a few months ago, when Riven and I were working on a SIMD library, we benchmarked the JNI overhead at about 1100ns on 6.0 (that was for an empty native method).
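
For reference, the Java side of that kind of measurement can be as small as the sketch below; the library name and the no-op native method are assumptions, and the C implementation would simply return immediately:

public class EmptyNativeCallBench {
    // hypothetical no-op native method; the native body does nothing
    private static native void noop();

    static {
        System.loadLibrary("emptybench");  // assumed library name
    }

    public static void main(String[] args) {
        final int CALLS = 10000000;
        for (int i = 0; i < 1000000; i++) noop();  // warm up
        long start = System.nanoTime();
        for (int i = 0; i < CALLS; i++) noop();
        System.out.println("ns per JNI call: " + (double) (System.nanoTime() - start) / CALLS);
    }
}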

Generally, if things are in the nanosecond range, I just ignore them. There’s always something taking more time…

DP

Maybe jake2 would be a good candidate? IIRC, quake2 is quite old-school in the way it uses OpenGL, so I expect lots of OpenGL calls through JNI. Not sure though, maybe the jake2 author can confirm.

Yeah, but it would be optimising really badly written code, if you see what I mean. Sorta pointless. The OpenGL API is designed to work fast even with big overheads if you simply use it right in the first place.

Cas :slight_smile:

I see your point, but quake2 isn’t necessarily badly written. It’s just old legacy code from the days when you didn’t have much choice but to write it like that in OpenGL. As a benchmark, it could still be usable as a ‘real world’ application, even if it’s old.

That said, I’d probably just create some microbenchmarks; getting data from Java objects through JNI is probably much more in need of some optimization. If you want to access Java for some AFFT (Amazingly Fast Fourier Transform) from your C program, for example :wink: ;D

In general, when I’m looking for speed involving JNI, I cross the Java/native boundary using direct ByteBuffers. When I’m looking for convenience, I just use Jace to proxy my objects back and forth. If I had any advice for speeding up JNI access, it would be to sort out all the performance issues around the use of Buffers:

  1. Fix the problems with polymorphism. I seem to remember there were significant performance issues once you used more than one type of Buffer. The advice was something along the lines of “only use one Buffer type in your app” and “cast to MappedByteBuffer”. In any case, it needs to be fixed.

  2. Fix bounds checks, or whatever it is that slows down Buffer accesses. There are several little benchmarks lying around this board which show that people can get significant speedups using sun.misc.Unsafe compared to going through the Buffer interface.

  3. Fix access to heterogeneous data. Rather than a homogeneous array of bytes, floats, or ints, my data is usually structural in nature. For example:

struct {
    byte b1;
    byte b2;
    int i1;
    int i2;
    float f1;
    float f2;
    byte[] bytes;
}

This doesn’t lend itself well to optimized access, because you typically end up flattening it onto a ByteBuffer (along the lines of the sketch below), which doesn’t perform as well as plain memory accesses should.
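
For concreteness, a minimal sketch of that kind of flattening onto a direct ByteBuffer; the offsets, the fixed 16-byte tail for the bytes field, and the class name are my own assumptions:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class RecordView {
    // per-record offsets: b1, b2, (2 bytes of padding), i1, i2, f1, f2, bytes[16]
    private static final int B1 = 0, B2 = 1, I1 = 4, I2 = 8, F1 = 12, F2 = 16, BYTES = 20;
    private static final int RECORD_SIZE = BYTES + 16;

    private final ByteBuffer buf;

    public RecordView(int records) {
        buf = ByteBuffer.allocateDirect(records * RECORD_SIZE).order(ByteOrder.nativeOrder());
    }

    public byte  getB1(int record)          { return buf.get(record * RECORD_SIZE + B1); }
    public int   getI1(int record)          { return buf.getInt(record * RECORD_SIZE + I1); }
    public float getF1(int record)          { return buf.getFloat(record * RECORD_SIZE + F1); }
    public void  setF1(int record, float v) { buf.putFloat(record * RECORD_SIZE + F1, v); }
    // ...remaining accessors follow the same pattern
}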

I fully agree with the previous post.

I will just add a note on this point:

For me there has always been a strange design choice in the Java library with regard to exception handling: external errors and internal errors are all treated as exceptions. From my point of view, external errors are exceptions (IO, database, …), while internal errors should stay at the development level and hence be assertions. ‘assert’ has been around since Java 1.4 but does not seem to be used extensively in the core library. I think it could be very useful, especially for time-critical sections like NIO buffer accesses.
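
A tiny illustration of the distinction (the class and method names are mine): external failures stay exceptions, internal invariants become asserts that cost nothing when assertions are disabled:

import java.io.FileInputStream;
import java.io.IOException;

public class ErrorKinds {
    // External error: the file may legitimately be missing at runtime,
    // so an exception is the right tool.
    static FileInputStream open(String path) throws IOException {
        return new FileInputStream(path);
    }

    // Internal error: a bad index is a programming bug. An assert documents
    // the invariant and disappears from the hot path unless you run with -ea.
    static int element(int[] data, int index) {
        assert index >= 0 && index < data.length : "index out of range: " + index;
        return data[index];
    }
}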

Vincent

Structs! Structs! Structs!

[quote]For me there has always been a strange design choice in the Java library with regard to exception handling: external errors and internal errors are all treated as exceptions. From my point of view, external errors are exceptions (IO, database, …), while internal errors should stay at the development level and hence be assertions. ‘assert’ has been around since Java 1.4 but does not seem to be used extensively in the core library. I think it could be very useful, especially for time-critical sections like NIO buffer accesses.
[/quote]
Perhaps, but you’ll never be able to fully eliminate bounds checks, for fear of all manner of buffer overrun attacks and undefined behaviour.

I didn’t want to bang on about structs (well, “mapped data”), but yeah, that’s what he’s talking about there :slight_smile:

Cas :slight_smile:

I guess the fact that JNI has always been quite slow for tight C/Java coupling is the reason why no JNI-limited real-world programs exist.
Almost everybody who uses JNI knows about its performance impact, so everybody starts his/her design with these limitations in mind.

Regards, Clemens

Btw. great that someone is working on this :slight_smile:

Does any of this have to do with “blackbird”? http://www.java-gaming.org/forums/index.php?topic=6939.0

DP

I think it’s just optimization work. I happened upon this today: http://blogs.sun.com/fatcatair/entry/micro_benchmarks_just_say_no.

Apparently Steve Goldman has found a way to reduce JNI overhead by 15%.

Keith

Yep, Steve and others on the HotSpot team are looking into improving
JNI performance. We (Java2D) have been whining about it
for a long time, so they finally have some time they can spend on this…

Dmitri

Looks like a case where JNI overhead is significant:
http://forums.java.net/jive/thread.jspa?threadID=18620&tstart=0

All of the floating point to integer conversions, and vice versa, are intrinsified by the Java HotSpot VM’s compilers, so that native code and any associated overhead are not used.

As I’m using direct ByteBuffers for my networking app, I’m wondering whether their “put” method is intrinsified. If not, why not?

As far as I remember, I’ve heard it’s intrinsified.