JNI: statically linked libraries

I’m looking to improve performance of JNI calls. I’ve written a couple of JNI libraries in the past and I always wondered why the performance is so bad compared to the actual performance of the algorithm in C.

In my experience, native code usually runs about ten to fifty times faster than the same code in Java, given a reasonably complex algorithm. But once you add JNI method calls to reach that native code from Java, performance degrades dramatically and you lose the advantage of the native library.

On the other hand, native methods provided by the JVM have significantly better performance. I do not remember the actual numbers, but when you compare some of the native methods in sun.misc.Unsafe with native methods of the same functionality provided in a JNI library, you can measure a significant difference in method call performance.
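A minimal sketch of the kind of loop behind such measurements (naive and for illustration only; a serious comparison should use JMH to guard against dead-code elimination and warm-up artifacts). `Math.sqrt` is intrinsified by HotSpot, so its per-call cost here reflects an inlined instruction rather than a JNI transition; a JNI-backed `sqrt` measured the same way would come out far slower:

```java
public class CallOverhead {
    // sum of sqrt(0..n-1); keeps every call's result live so the JIT
    // cannot eliminate the loop body
    static double checksum(int n) {
        double sink = 0;
        for (int i = 0; i < n; i++) sink += Math.sqrt(i);
        return sink;
    }

    public static void main(String[] args) {
        checksum(1_000_000); // warm-up so the JIT compiles the loop

        int n = 10_000_000;
        long t0 = System.nanoTime();
        double sink = checksum(n);
        long t1 = System.nanoTime();

        // rough per-call cost; expect low single-digit nanoseconds for an
        // intrinsified method, far more for a JNI call of the same shape
        System.out.printf("approx %.1f ns/call (sink=%f)%n",
                (t1 - t0) / (double) n, sink);
    }
}
```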

So, the JVM somehow gets better performance on its own statically linked native methods.

Should I consider using statically linked libraries (yes it’s possible, see below)? Does anybody have experience with that?


FURTHER READING

Common pitfalls using JNI
There is a good article, Best practices for using the Java Native Interface, which covers the important points to watch out for.

Statically linked JNI libraries

There are basically two ways to statically link your own library with the JVM:

  • Implement your own JVM launcher that includes your native code and uses the Invocation API to instantiate the JVM. You basically build your own “java” command binary.
  • Get the source code of the JVM and statically link it with your library. Again, you end up building your own “java” command binary.

See also Section “Static linking” in this IBM article.
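Since JDK 8 there is also first-class support for this: JEP 178 (Statically-Linked JNI Libraries) lets the launcher binary export a `JNI_OnLoad_<libname>` symbol, and `System.loadLibrary()` then resolves the library without any shared object on disk. The Java side is unchanged. A sketch (the library name `mystatic` and the native method are hypothetical), with a fallback so it also runs on a stock JVM:

```java
public class StaticJniDemo {
    // hypothetical native method, implemented in the statically linked library
    static native long nativeSum(long a, long b);

    static final boolean NATIVE_AVAILABLE = tryLoad();

    private static boolean tryLoad() {
        try {
            // If the launcher binary exports JNI_OnLoad_mystatic (JEP 178),
            // the JVM treats "mystatic" as statically linked and this succeeds.
            System.loadLibrary("mystatic");
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // stock JVM without the static library
        }
    }

    static long sum(long a, long b) {
        return NATIVE_AVAILABLE ? nativeSum(a, b) : a + b; // pure-Java fallback
    }

    public static void main(String[] args) {
        System.out.println("native=" + NATIVE_AVAILABLE + " sum=" + sum(2, 3));
    }
}
```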


Did you take a look at Project Panama? I guess it is what you want.


Even though you can potentially statically link native code into the JVM’s executable, calls to those native functions still go through JNI and are handled exactly the same way as any dynamically linked JNI function. The overhead is not due to the OS’s dynamic linking of the native functions but simply due to the steps performed before and after every JNI call: shuffling arguments into registers, reserving a call stack frame, providing the JNIEnv argument, checking for a safepoint, and so on.
All JNI calls (including those for native methods of the JRE classes) go through this procedure.
Some methods are intrinsified: in these cases the JIT either inserts the native code directly into the compiled Java method or emits a direct non-JNI call to a function provided by the JVM. The sun.misc.Unsafe methods and many java.lang.Math methods are examples of intrinsified methods for which the JVM itself (and not the JRE class library) provides the native code.
Usually the JRE class library (OpenJDK’s, for example) provides native JNI functions for its classes, and those are ordinary JNI calls as well.
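One way to watch intrinsification happen is HotSpot’s diagnostic inlining log, where intrinsified callees are marked “(intrinsic)”. A minimal sketch (class name mine); run it with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`:

```java
public class IntrinsicCheck {
    public static long work(long x) {
        // Both callees below are HotSpot intrinsics on common platforms:
        // Long.bitCount maps to POPCNT, Math.sqrt to a hardware sqrt.
        return Long.bitCount(x) + (long) Math.sqrt(x);
    }

    public static void main(String[] args) {
        long sink = 0;
        // enough iterations to force JIT compilation of work()
        for (long i = 0; i < 200_000; i++) sink += work(i);
        System.out.println(sink > 0);
    }
}
```

With the flags above, the inlining log contains lines for `Long.bitCount` and `Math.sqrt` annotated as intrinsics, i.e. no JNI transition is involved.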

So, even if you could statically link a JNI function into the JVM, it would not give you any performance benefit at all.

Also regarding Panama (or rather the foreign-linker API): Last time I checked this is just as fast/slow as JNI.


Interesting discussion.
I just read this article about the topic:


The next question is obviously: can you make your own intrinsified method in a customised rebuilt JVM?
This is a subject that @Riven and @Spasi would enjoy taking part in too.

You could, but it’s not practical when you have to support thousands of functions in a library like LWJGL. I’ve pushed for better Critical JNI support in the past, but it didn’t go anywhere. Critical JNI may actually be faster in JDK 16 after JDK-8233343 (I haven’t tested yet), but it’s probably not worth bothering with, now that Panama is getting closer to release.

The ideal scenario for zero overhead is having 1) the JIT’s register allocator be aware of foreign calls and 2) no GC interactions. AFAIK Project Panama is not there yet, and downcall performance is comparable to JNI. It’s much faster for upcalls, though (upcalls are horribly slow with JNI), and that might make using callback-heavy libraries practical (e.g. most physics libraries).
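For reference, this is what a Panama downcall looks like in the finalized FFM API (`java.lang.foreign`, JDK 22+); the incubating `jdk.incubator.foreign` API discussed in this thread differs in class and method names, but the shape is the same. The sketch binds the C library’s `strlen` and assumes a 64-bit platform (`size_t` mapped to `JAVA_LONG`):

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class StrlenDemo {
    static long strlen(String s) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // bind a MethodHandle to the C function: long strlen(const char*)
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

        try (Arena arena = Arena.ofConfined()) {
            // copy the Java string into native memory as a NUL-terminated C string
            MemorySegment cString = arena.allocateFrom(s);
            return (long) strlen.invokeExact(cString);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(strlen("hello")); // 5
    }
}
```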


Those APIs are promising but very difficult to use without jextract, and I don’t know yet whether it generates efficient code.


Nice educated feedback!

I did some research this morning to learn more about it.

Getting your own intrinsic to work requires changes to the JVM source, because each intrinsic has to be added to a switch-case statement in a function called LibraryCallKit::try_to_inline().
Besides that, you have to write the code that generates the assembly to be inlined by the JIT compiler. It can take a while to get this right, and this is probably what Spasi meant by “impractical”.

Then I wondered whether Panama will actually fix this. I found an email by Maurizio Cimadamore. It basically discusses the mediocre performance of library calls via the Project Panama APIs and a promising experimental branch called linkToNative as an improvement. Project Panama generally uses a “universal method” to prepare registers, the stack, etc., for a native function call. linkToNative instead generates a specific stub for each function once, during - let’s call it - the “JIT-compile” phase, and can thereby achieve better optimisation.

The linkToNative approach is what you would expect, given the dynamic nature of the JIT compiler. But it seems like it’s not going to happen soon, since

“the linkToNative branch is not ready from prime time (yet)”

and (at least from what I could see) the branch has not been merged into master, so far.

From what I read about Project Panama, they are more focused on API and usability than performance - at least as long as performance is comparable to JNI.


Nice work tracking down those source files and experimental branch @Homac. And thanks for the links @Spasi. Great to see your involvement and push for improvements. That’s interesting about jextract too, I’d never heard of it @gouessej.
So interesting to see the cutting edge of performance. What a pity that these issues aren’t yet getting the full attention of the (Oracle?) Project Panama developers.
I assume it’s because the HotSpot compiler code gets ever more complex, and perhaps slower for normal use cases, when these extra performance-critical JNI tweaks are added. The low-level native code changes probably also open up countless security bugs.
It’s nice that it’s all open source which allows clever enthusiasts and professionals like yourselves and Maurizio Cimadamore (and the coin miners!?) to make suggestions and even code improvements yourselves.
Would be fantastic to see a game-centred performance-tuned JVM flavour one day!

To be honest, there is still some room for performance improvement, but the foreign-linker API and the foreign-memory API are already almost good enough to replace JNI, JNA and JNR. If they were noticeably faster, it would become practical to keep the whole logic in Java. For now, I would still advise against crossing the Java/C boundary very often in performance-critical code.