JNI Overhead Question

mstevano · May 15, 2006, 6:55am

Hi,

I need your help concerning JNI performance. I am currently designing an API for my engine and I am wondering
what the best choices are for using JNI, especially for efficiency and performance reasons. I want to pass values to a
C++ function using JNI. In this example I have 4 float values.

Now the question is: what will be faster and more efficient, passing four single floats or a direct FloatBuffer of size 4? Usually
the FloatBuffer should have better performance, but I have a function call overhead of 7 or more calls.

Here is an example of the two solutions

Solution 1)

public native void setColor(float r, float g, float b, float a);

Solution 2)

public void setColor(float r, float g, float b, float a){
    floatBuffer.clear();
    floatBuffer.put(r).put(g).put(b).put(a);
    floatBuffer.flip();
    setColor(floatBuffer);
}

public native void setColor(FloatBuffer buf);

Now the question is: is solution 2 faster than solution 1? I don’t know what the performance difference is between passing 4 float values through JNI
and passing a FloatBuffer including 7 function calls. The function calls are clear(), 4x put(), flip() and the call to the native method. I don’t know if the function
call overhead in solution 2 nullifies the performance gain from using a direct FloatBuffer, or if it is still better than passing 4 single floats.

Can someone please tell me if solution 2 is still faster and more efficient than solution 1, or does it not matter which solution I use?

I would really appreciate some help,

thanks
Mike

Riven · May 15, 2006, 8:02am

This is more performant:

// ensure pos=0, lim=4 (somewhere else, and never modify it)
floatBuffer.put(0, r);
floatBuffer.put(1, g);
floatBuffer.put(2, b);
floatBuffer.put(3, a);

The FloatBuffer is only really useful when sending lots of data, as it prevents the Java<->JNI data copy.
The copy of 4 floats is probably faster than putting them into a FloatBuffer and sending that object (holding a pointer) across.

To be short:
try both, profile it
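
A minimal sketch of the absolute-put idea, using only java.nio. The class name AbsolutePutSketch and the field COLOR are made up for illustration, and there is no native call here. It only shows that put(index, value) overwrites the slots without moving position or limit, so a buffer set up once can be refilled on every call.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class AbsolutePutSketch {
    // Direct buffer with room for 4 floats (4 bytes each), created once and reused.
    // A fresh view starts with position=0 and limit=capacity=4.
    static final FloatBuffer COLOR = ByteBuffer.allocateDirect(4 * 4)
            .order(ByteOrder.nativeOrder())
            .asFloatBuffer();

    // Overwrites the four slots with absolute puts; position and limit are not touched.
    public static void fill(float r, float g, float b, float a) {
        COLOR.put(0, r);
        COLOR.put(1, g);
        COLOR.put(2, b);
        COLOR.put(3, a);
    }

    public static void main(String[] args) {
        fill(1f, 0.5f, 0.25f, 1f);
        fill(0f, 0f, 1f, 1f); // second call simply overwrites the first
        System.out.println("position=" + COLOR.position() + " limit=" + COLOR.limit());
        System.out.println("blue=" + COLOR.get(2));
    }
}
```

The invariant only holds as long as nothing else calls a relative put/get, flip() or limit() on the same buffer.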

mstevano · May 15, 2006, 9:05am

Thanks for your quick answer Riven.

And thanks for your tip with the buffers. Concerning that buffer tip:
I have a class that uses a temporary buffer for all native calls. It will be created only once and
will be reused every time.

Is it OK if I do this:

// in the constructor I create the buffer

public ColorBlending(){
    tempBuffer = BufferUtil.createFloatBuffer(4);
    tempBuffer.position(0);
    tempBuffer.limit(4);
}
…

And in my method i use your tip
…

tempBuffer.put(0, r);
tempBuffer.put(1, g);
tempBuffer.put(2, b);
tempBuffer.put(3, a);

Are there maybe more things to consider? Will the position always stay at 0 and the limit at 4?
And as I understand it, I just overwrite the values with every call.

I am going to use these kinds of methods to call OpenGL functions.

Do you know what would happen if I pass the buffer (4 elements) to glColor3fv(const GLfloat *v), which only needs 3 elements? Do I need to set the limit to 3, or would
the OpenGL call ignore the 4th element?

Because I would like to keep one temporary buffer, maybe as big as needed, that I create only once in the constructor and reuse for every call I make to OpenGL.
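
A small sketch related to the 3-of-4 question. On the C side, a function that takes a pointer and needs 3 floats reads only the first three values behind that pointer, so a trailing 4th float is not looked at. What a Java binding checks before the call is a separate matter; the requireRemaining helper below is hypothetical and only assumes that a binding might demand at least n remaining elements. The firstN helper (also made up) shows how to hand out a 3-element view without changing the limit of the shared buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class ThreeOfFourSketch {
    // Hypothetical check a binding might run before a call that reads n floats.
    public static void requireRemaining(FloatBuffer buf, int n) {
        if (buf.remaining() < n) {
            throw new IllegalArgumentException("need " + n + " floats, have " + buf.remaining());
        }
    }

    // Returns a view over the same memory limited to the first n elements.
    // The shared buffer keeps its own position and limit.
    public static FloatBuffer firstN(FloatBuffer shared, int n) {
        FloatBuffer view = shared.duplicate();
        view.position(0);
        view.limit(n);
        return view;
    }

    public static void main(String[] args) {
        FloatBuffer temp = ByteBuffer.allocateDirect(4 * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        temp.put(0, 1f).put(1, 0.5f).put(2, 0.25f).put(3, 1f);

        requireRemaining(temp, 3);          // 4 remaining, so a 3-float call passes the check
        FloatBuffer rgb = firstN(temp, 3);  // 3-element view, same backing memory
        System.out.println(rgb.remaining() + " " + temp.remaining());
    }
}
```

duplicate() creates a small new object each time, so in a hot path it may be better to create such views once and keep them next to the shared buffer.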

That’s the reason for my original question. OpenGL offers many functions with array variants, so they are perfect for Buffers. So if the Buffer call is faster and better than
a call with 4 or more floats, I will only use the buffer versions.

But I guess I need to try it out and profile it to see which one is better.
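
A rough timing harness for the "try both" part. It is only a sketch under clear assumptions: it measures the Java-side cost of filling the buffer (clear + relative puts + flip versus absolute puts) and does not include any native call, so it says nothing about the JNI crossing itself. All names are made up, and a naive System.nanoTime loop like this can be distorted by JIT warm-up and dead-code elimination, so treat the numbers as a hint at best.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class FillTimingSketch {
    static final FloatBuffer BUF = ByteBuffer.allocateDirect(4 * 4)
            .order(ByteOrder.nativeOrder())
            .asFloatBuffer();

    // Solution 2 from the first post, minus the native call: clear, four relative puts, flip.
    public static void fillRelative(float r, float g, float b, float a) {
        BUF.clear();
        BUF.put(r).put(g).put(b).put(a);
        BUF.flip();
    }

    // Absolute variant: four indexed puts, no clear or flip.
    public static void fillAbsolute(float r, float g, float b, float a) {
        BUF.put(0, r);
        BUF.put(1, g);
        BUF.put(2, b);
        BUF.put(3, a);
    }

    // Runs the task the given number of times and returns the elapsed nanoseconds.
    public static long time(int iterations, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        // Warm-up pass so the JIT has a chance to compile both paths before timing.
        time(n, () -> fillRelative(1f, 0.5f, 0.25f, 1f));
        time(n, () -> fillAbsolute(1f, 0.5f, 0.25f, 1f));

        long relative = time(n, () -> fillRelative(1f, 0.5f, 0.25f, 1f));
        long absolute = time(n, () -> fillAbsolute(1f, 0.5f, 0.25f, 1f));
        System.out.println("relative: " + relative / (double) n + " ns/call");
        System.out.println("absolute: " + absolute / (double) n + " ns/call");
    }
}
```

To compare the two original solutions properly, the same loop would have to wrap the real native methods, ideally inside the running application and with a profiler.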

thanks

princec · May 15, 2006, 9:45am

What with the various native libraries around that already exist, what is it that you are using JNI for these days in a game engine anyway?

Cas

mstevano · May 15, 2006, 10:27am

@princec
I know that there are good libraries out there, especially LWJGL. Coming from an LWJGL background, I thought
about using it. I even considered extending it for my special needs. For example, there are some extensions missing
that I need (i.e. ATI_text_fragment_shader and possibly APPLE extensions). Because I really like LWJGL, I often considered
it and changed back and forth.

But I don’t need a 100% binding to OpenGL, and I even encapsulate many OGL calls in one native method. I just need a subset of OpenGL.
That may sound like I am reinventing the wheel, and it may be, but I also learn stuff and I want to use a different approach to design my library/engine.

Using only Java 5 (enums, generics, …), the Spring framework, JAXB 2 and maybe Hibernate, I wanted to try a design approach I have in mind for the engine and the tools, like Eclipse plugins (Cg, GLSL, ARB program, …) and standalone tools. It may turn out that this approach will not work. But I want to try it once.

I thought about using LWJGL and changing the source for my needs, but the source is rather complex and I don’t want to mess with it.
But it may turn out that I am going to use LWJGL and just implement the stuff I need as an extra library.

There is one question I want to ask you. I often look at the LWJGL code you guys implemented, and I was always wondering why you always get the function pointer in Java
and pass it to the native method. Isn’t that overhead for every function call? Is there a reason behind it? Of course there must be; I may just be asking a stupid question right now.

I am just curious, because there are libraries out there like GLee that handle extension usage for Linux and Windows. On the Mac I don’t need it if I am using only Tiger… I was considering using it for those other OSes. But I would really appreciate your opinion.
Is it just stupid to reinvent the wheel (because I like LWJGL) and to have bad thoughts about possible overhead which maybe isn’t there? And like I said, the missing functionality I
could implement separately. Should I stick with LWJGL (because you are doing such a great job)? That would save me some time.

I would appreciate any suggestions on that.

thanks
Mike

Spasi · May 15, 2006, 11:01am

Hi Mike,

Could you reply with a complete list of the extensions you need? I’ll have them ready for version 1.0, but keep in mind that we generally don’t want to add OS-specific or obsolete extensions that no one will use. ATI_text_fragment_shader is one of them, but it’s not a big deal if you really need it.

mstevano · May 15, 2006, 11:45am

Spasi,

Actually, at the moment ATI_text_fragment_shader is
the only one I need, because my old iBook and Mac mini only have ARB_vertex_program but no ARB_fragment_program. Only
ATI_text_fragment_shader is available. On my Intel iMac it’s no problem (it’s a pity Apple didn’t put Nvidia cards in the new Macs). It has all the
new extensions.

Usually I want to stick with the standards on Windows and Linux, but on the Mac I don’t know if it would be better to use APPLE extensions, especially
on the older machines, to get more performance out of them. But I am not sure about them; it was just a thought.

One thing I am missing a bit is Cg support. I know that OpenGL has GLSL and the ARB vertex/fragment programs, but it would be cool to have Cg optionally available, maybe as an extra jar file or so. Then I could have different fallback scenarios in my effect files depending on which features are available. I could choose between FFP, ARB, GLSL or Cg.

When will LWJGL 1.0 be out?

Concerning my last posts, I really thought about all of it, and I really think going with LWJGL is the best option. I have followed your progress for some years now, and trying
to do something similar would just be a waste of time. I also thought that, when some of my tools, 3D format and effect libs are finished, I can give something back to the LWJGL community.

Also, if you need some Mac testers, I can help out if you want. I have a new Intel iMac and a PPC Mac mini/iBook to test LWJGL on.
I read in some previous posts that you need testers for Mac OS, so I can help if you want. Or maybe there are other ways I can help.

I will look into the extension stuff, and if I need something, I will post to the LWJGL forum about it.

thanks
Mike

Orangy_Tang · May 15, 2006, 12:26pm

I’ve only ever found JNI overhead to be noticeable when drawing lots of stuff in immediate mode (which you shouldn’t be doing for good performance anyway). For anything else you usually end up with only a handful of GL calls to draw huge chunks of geometry. Have you actually found JNI overhead to be a problem or is this just a guess?

mstevano · May 15, 2006, 12:49pm

Actually it was only a guess. ;D I know, profile first ;D

The original intent was to find out how best to work with buffers, and that sending buffers through JNI is a fast option. I was just wondering
if sending four floats as parameters or a single buffer will be faster. Because if I use a temporary buffer which is created only once and then reused,
I just need to fill it and send it through.

When I saw the LWJGL source code, I was just wondering why, before every function call, the reference to the function pointer is fetched, buffer checks are made, and
then the native function is called. I just thought that this was unnecessary overhead, because you are doing it before every call. But there is
definitely a legitimate reason behind it, just one that I don’t know of. Just a stupid question, but wouldn’t it be easier to integrate the GLee lib?

Mike

Matzon · May 15, 2006, 1:04pm

If you don’t want the buffer checks and other sanity checks, you can just generate the source code without them! Check the doc/generator.txt document.

darkprophet · May 15, 2006, 1:12pm

From personal experience, JNI overhead is 1100ns on 1.6b62 on Windows XP using a P4. So having 1000 JNI calls means roughly 1ms is spent on the JNI overhead alone, which isn’t bad at all. Your mileage may vary.
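
The arithmetic behind that estimate, as a small sketch. The 1100ns per call and the 1000 calls are the figures from this post; the 60 fps frame rate used to put the result into perspective is purely an assumption, and the class and method names are made up.

```java
class JniBudgetSketch {
    // Total overhead in milliseconds for a number of calls at a fixed per-call cost in nanoseconds.
    public static double overheadMillis(long calls, double nanosPerCall) {
        return calls * nanosPerCall / 1_000_000.0;
    }

    // Share of one frame (0..1) consumed by that overhead, for an assumed frame rate.
    public static double frameShare(long calls, double nanosPerCall, double framesPerSecond) {
        double frameMillis = 1000.0 / framesPerSecond;
        return overheadMillis(calls, nanosPerCall) / frameMillis;
    }

    public static void main(String[] args) {
        // 1000 calls * 1100 ns = 1.1 ms
        System.out.println(overheadMillis(1000, 1100) + " ms");
        // 1.1 ms out of a ~16.7 ms frame is about 6.6 percent
        System.out.println(100 * frameShare(1000, 1100, 60) + " % of a 60 fps frame");
    }
}
```

So under these numbers a thousand crossings per frame cost a few percent of the frame budget, which fits the "isn’t bad at all" conclusion but is not free either.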

DP

Linuxhippy · May 15, 2006, 1:32pm

I’ve read many articles mentioning that JNI overhead is about 320 cycles for a P4 and about 130 cycles for Athlon64 / Pentium M.
However, these benchmarked methods with no parameters, so I can’t say anything about passed arguments, but I guess they won’t make much difference, since the most expensive operations for modern CPUs are the indirect method call + method address hashtable fetch + the tons of runtime checks.

lg Clemens

mstevano · May 15, 2006, 2:59pm

@Matzon
Thanks, that’s exactly what I wanted. I just need to figure out how to do it now and have a look at the readme. That’s the first thing I am going
to try now.

@DP
With the JNI overhead I really just need to wait and see, and profile my app. But as you said, maybe it isn’t that big a deal.

@Linuxhippy
With indirect calls, do you mean things like interface calls?
What is the best practice for games anyway concerning interfaces?
Should I avoid them where I can?
Isn’t a good system design more important, or doesn’t that apply to games?

Mike

Matzon · May 15, 2006, 5:35pm

Consider joining irc://irc.freenode.net/lwjgl. If elias is around (he often is), he’s your best bet, since he implemented the generator.

Linuxhippy · May 15, 2006, 6:09pm

Not really. Compared to traditional C++ compilers, the JITs (at least the server compiler, and all JVMs once tiered-compilation builds are distributed) are very advanced at optimizing away OOP overhead, and therefore calls to interfaces won’t hurt (much).
Sure, it depends, and if it’s really a very hot spot it may be worth looking at, but in general I would recommend just forgetting about virtual calls.

If you’re interested in an old benchmark applet which gives the JVM the opportunity to remove redundant stuff, here it is: http://www.javaworld.com/javaworld/jw-09-1998/speed/Applet/index.html

This is quite different when it comes down to JNI. JNI code can’t be inlined and everything happening inside of JNI can be called more or less “evil”, therefore the call overhead is a lot larger.
I have to admit that I have only read about points 1 and 2 and therefore can’t prove them. It would be really great if one of the JVM devs could comment on it (since I am also really interested in this).
1.) The address of the function in the native library is contained in a hashtable, and is resolved at call time
2.) The CPU has to jump to this address, which will cause pipeline stalls and maybe memory reads (don’t forget a stall costs 30 cycles on a P4 Prescott)
3.) The arguments are copied
4.) I don’t know what else, but for sure a lot

lg Clemens

mstevano · May 16, 2006, 6:23am

@matzon
Thanks for the tip. It may sound stupid, but I have never used IRC before. What do I need to do?

@Linuxhippy
That’s what I thought too. The VM handles that kind of stuff differently and better, and you can’t really compare it with C++.
I think that programming games in Java is different from C++ anyway, and that many problems C++ programmers
face just don’t apply to Java gaming. And in Java we have so many possibilities to program better software and use different
methods than in C++.

Mike

elias · May 16, 2006, 11:47am

You can’t disable all checks in LWJGL at the moment (you can disable the multi-context/multi-thread safe code which will probably speed it up slightly). It won’t be hard to implement a no-checking option, but I humbly suggest to focus on the engine instead of JNI overhead. The overhead won’t be a problem for most cases, and even if it shows up as a problem you’re not stuck at all:

You can take any LWJGL program and “convert” it to use a specialized renderer that replaces chunks of OGL calls with larger JNI calls. This is your initial idea, but applied much later in the process.
Get some concrete profiling data showing a problem and we’ll happily help you disable checks and whatever else causes a hot spot in the profile.
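
A sketch of what "replacing chunks of calls with larger JNI calls" could look like on the Java side. This is not LWJGL code: the class BatchSketch and its Sink interface are hypothetical, and Sink.submit merely stands in for a single native method so the example runs without a native library. Colors are recorded into one direct buffer and handed over in one call per batch instead of one call per color.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class BatchSketch {
    // Stand-in for a single native method; every submit would be one JNI crossing.
    interface Sink {
        void submit(FloatBuffer data);
    }

    private final FloatBuffer queue;
    private final Sink sink;

    BatchSketch(int maxColors, Sink sink) {
        // 4 floats per color, 4 bytes per float
        this.queue = ByteBuffer.allocateDirect(maxColors * 4 * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        this.sink = sink;
    }

    // Records one RGBA color on the Java side; flushes first if the queue is full.
    public void color(float r, float g, float b, float a) {
        if (queue.remaining() < 4) {
            flush();
        }
        queue.put(r).put(g).put(b).put(a);
    }

    // Hands everything recorded so far to the sink in one call, then resets the queue.
    public void flush() {
        if (queue.position() == 0) {
            return; // nothing recorded
        }
        queue.flip();
        sink.submit(queue);
        queue.clear();
    }

    public static void main(String[] args) {
        int[] crossings = {0};
        BatchSketch batch = new BatchSketch(256, data -> crossings[0]++);
        for (int i = 0; i < 1000; i++) {
            batch.color(1f, 1f, 1f, 1f);
        }
        batch.flush();
        // 1000 colors with room for 256 per batch: 3 full batches plus 1 partial one
        System.out.println("crossings=" + crossings[0]); // prints crossings=4
    }
}
```

Whether this pays off depends on what the native side does with the batch, which is why it makes sense only after a profile shows the per-call overhead to be a real hot spot.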

elias

mstevano · May 16, 2006, 12:16pm

Elias,

thanks for the tips.

Yeah, you are right. I am really going to focus on implementing the functionality my engine needs right now. And if I see someday that I can convert some chunks of OGL,
I can still do it in a much later phase. It is really more important that I focus on things like implementing my model, texture and effect formats, or getting all the specific engine details done so that I can create some technical demos first.

The reason I asked about the checks was just that I was wondering what was going on in the source code when I looked at it, because the LWJGL source has changed a lot from previous versions, and I had never seen these checks before. Now I know that I can disable them someday if there is a need. If I need help with that later on, I may come back to it. I am going to stick with version 0.99 for now.

I tried to build the OS X version from the trunk yesterday, because Spasi included the ATI_text_fragment_shader extension, but I had some problems building it. The Makefile was somehow missing, which is why the main build failed. Instead, a build.xml is in the Mac folder?! OK, it was really late yesterday and I may have missed something, but can you confirm that the build works?

thanks
Mike

elias · May 16, 2006, 7:22pm

The build.xml replaces the Makefile. Typing ‘ant’ in the macosx directory should give you a liblwjgl.jnilib, providing you have gcc 4 or better installed.

elias