Catch 22 for jogl

nsigma · March 7, 2010, 1:11pm

I agree with you that rewriting JOGL with JNA is probably a bad idea, but only because it seems like unnecessarily reinventing the wheel. In general, I think JNA and JNAerator are great projects. While there’s obviously some extra overhead compared to JNI, I’d say it’s pretty negligible when you factor in everything else your code is doing, and worth it for the ease of development and deployment. I wrote a wrapper for Jack Audio Connection Kit using JNA a while back, and even without switching to the higher performing direct mapping mode this is perfectly capable of doing realtime, low latency audio without any skipped frames.

I have to say I haven’t used JNA on Windows yet, but I’m under the impression that it uses the exact same library (libffi) to do its stuff, so I don’t know why the performance should be that much different. I may be wrong, but I think JNA is an equivalent of Platform Invoke, not that it uses the same mechanism.

And I doubt you’ll convince the author of JavaCL to switch over - he’s got rather a lot of time invested in JNA!

Neil

AI_Guy · March 7, 2010, 7:04pm

I was just putting the option out. I have no direct interest in OpenGL, and no axe to grind. It would entail some re-write, but Gluegen is aging / self maintained and a resource guzzler by comparison.

OpenCL is a different story. I have a huge interest. What happened in the past with OpenGL may not repeat itself with OpenCL. The primary reason is OpenGL is just an API (ignoring GLSL) and OpenCL is a full C99 language. There are API calls to manage it (compiling, exec, read/write) from what is called the host language, but the big action is what is running on the GPU. They are called kernels. This is attracting a constituency from everywhere, like Research-Engineering-Finance. Being open has probably caused it to already be bigger than CUDA & CAL. It seems very little is right now involved with vision.

Do not get me wrong, I am very impressed with Michael, & JOCL as a university thesis. JOCL could put him in a good position for the future. OpenCL might change rapidly though, and I am not sure competing for resources with JOGL for updates is an arrangement that is best for me.

I would also point out that the Particles Demo from the above site hands off to JOGL too, when it is done. Performance seemed impressive to me anyway. At my max resolution, 2560 x 1600, & old 8800 GTX, it does not really slow down that much with 1M particles. It also seemed mildly entertaining. It makes me wonder if something similar to an OpenGL pipeline, or Java3D, might be implementable as an OpenCL kernel pipeline. Not suggesting this is any direct path forward, but it might be a good proof of concept for someone else’s studies.

bienator · March 7, 2010, 8:40pm

maintaining “my” JOCL binding (my, since there is another one on jocl.org) is actually trivial. The work is basically done and everything is automated. Headers are automatically downloaded, preprocessed, parsed and the binding is generated etc. I refactored the gluegen generator, fixed the pseudo preprocessor and cleaned up several places, improved its performance, added unit tests, made it ANTLR 3 compatible, macro support, jdk5 features etc. There are a few places which could be made more DRY and a few ideas from Ken (one of the original authors of gluegen) but this is not really priority for me for now. (compiler setup is currently still a PITA)

As a result the ANT script of JOCL is around 250 lines long (the autogenerated netbeans stuff not counted).

It is IMO highly improbable that Khronos would (or could) introduce API changes which would break the build in a way which couldn’t be fixed in a reasonable timeframe. In the worst case it’s always possible to implement the function by hand (JOCL has around 150 lines of handwritten C code, partly to simplify a few more complex OpenCL calls or to do multiple API calls with one method call).

well you can end up doing a lot of api calls in OpenCL too. I have some tests using one Java-Thread per CLCommandQueue per GPU (to test the MultiQueueBarrier). Do this on the server and you produce more contention than in a fixed-function OpenGL 1.1 game.

btw my thesis isn’t about “how to write a CL binding using gluegen”, it’s rather “how to have fun with CL”

gouessej · March 8, 2010, 9:57am

nsigma:

[quote]I agree with you that rewriting JOGL with JNA is probably a bad idea, but only because it seems like unnecessarily reinventing the wheel. In general, I think JNA and JNAerator are great projects. While there’s obviously some extra overhead compared to JNI, I’d say it’s pretty negligible when you factor in everything else your code is doing, and worth it for the ease of development and deployment. I wrote a wrapper for Jack Audio Connection Kit using JNA a while back, and even without switching to the higher performing direct mapping mode this is perfectly capable of doing realtime, low latency audio without any skipped frames.

I have to say I haven’t used JNA on Windows yet, but I’m under the impression that it uses the exact same library (libffi) to do its stuff, so I don’t know why the performance should be that much different. I may be wrong, but I think JNA is an equivalent of Platform Invoke, not that it uses the same mechanism.

And I doubt you’ll convince the author of JavaCL to switch over - he’s got rather a lot of time invested in JNA!

Neil
[/quote]

I understand your position, but the performance difference is far from negligible in my view, as I use OpenGL on very poor machines. For example, if you use OpenTK, the C# OpenGL binding that relies on Microsoft Platform Invoke, you will notice that it is at least 70% slower than JOGL; this is absolutely not acceptable. You can read this about JNA:

[quote]While some attention is paid to performance, correctness and ease of use take priority
[/quote]
I have tried to check if libffi uses Platform Invoke, I have no answer yet.

I understand the author of JavaCL has invested a lot of time in JNA but I don’t see any interest of maintaining 3 very similar OpenCL bindings (OpenCL4Java, JavaCL and JOCL).

Go on Michael

nsigma · March 8, 2010, 12:55pm

No offence, but I think you’re comparing apples and oranges - C# and Platform Invoke is not Java and JNA! And even then do you mean that the whole application performs 70% slower or just the call across the VM<>Native divide? If the latter, then I’d still say that this is usually going to be negligible and swamped by other factors.

Obviously, it’s not possible to compare OpenGL experiences, but my experience of JNA with JNAJack and Gstreamer-Java has also been with low powered machines (a 6 year old Pentium M and a ULV 800MHz Celeron), and despite neither binding yet using JNA’s direct mapping, both have solid performance.

To quote another section of the JNA site -

[quote]JNA direct mapping can provide performance near that of custom JNI
[/quote]
Personally, I don’t think that either of our quotes truly reflect the reality of using JNA, but I would stand by my assertion that in the vast majority of cases where you need to call native code, JNA offers huge development and deployment benefits for a very minor performance hit. You definitely won’t get a 70% slowdown on your whole application using it, though I’m not sure if that’s just me misreading what you wrote?
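
The distinction between "the calls are slower" and "the whole application is slower" can be put into numbers. The following is only a rough sketch with made-up inputs: the class and method names are hypothetical, and the call shares (1%, 5%, 20%) and the 10x factor are illustrative assumptions, not measurements of JNA, JNI or P/Invoke.

```java
import java.util.Locale;

public class CallOverheadShare {

    /**
     * Relative run time of a frame when only the native-call part becomes
     * `factor` times slower. `callShare` is the fraction of the original
     * frame time spent crossing the Java/native boundary (0.0 to 1.0).
     */
    public static double wholeAppSlowdown(double callShare, double factor) {
        return (1.0 - callShare) + callShare * factor;
    }

    public static void main(String[] args) {
        double factor = 10.0;                 // assumed per-call slowdown
        double[] shares = {0.01, 0.05, 0.20}; // assumed call shares
        for (double share : shares) {
            System.out.printf(Locale.ROOT,
                    "call share %.0f%% -> frame takes %.2fx as long%n",
                    share * 100.0, wholeAppSlowdown(share, factor));
        }
    }
}
```

With these assumed inputs, the same 10x per-call penalty is barely visible when calls make up 1% of the frame and dominates when they make up 20%, which is why the two positions in this thread can both be right for different workloads.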

[quote]I understand the author of JavaCL has invested a lot of time in JNA but I don’t see any interest of maintaining 3 very similar OpenCL bindings (OpenCL4Java, JavaCL and JOCL).
[/quote]
Well, 2! OpenCL4Java and JavaCL seem to be two sides of the same coin.

Best wishes, Neil

gouessej · March 8, 2010, 1:53pm

It is not negligible in projects like Jake2, as both guys succeeded in going faster than the C/C++ version of Quake 2. The problem is that some years ago I found a document explaining how JNA works under Windows; it did not mention libffi, but it explicitly mentioned the use of Microsoft P/Invoke. I have looked for this document for hours without success. Therefore, I thought I was not comparing apples and oranges. I will look for more information…

However, I have just found something very interesting :

[quote]If you need a lot of memory copying. For example, you call one method which returns you a large byte buffer, you change something in it, then you need to call another method which uses this byte buffer. This would require you to copy this buffer from c to java, then copy it back from java to c. In this case jni will win in performance because you can keep and modify this buffer in c, without copying.
[/quote]
This is very important for an OpenGL binding.

https://jna.dev.java.net/#performance

[quote]How does JNA performance compare to custom JNI?

JNA direct mapping can provide performance near that of custom JNI. Nearly all the type mapping features of interface mapping are available, although automatic type conversion will likely incur some overhead.

The calling overhead for a single native call using JNA interface mapping can be an order of magnitude (~10X) greater time than equivalent custom JNI (whether it actually does in the context of your application is a different question). In raw terms, the calling overhead is on the order of hundreds of microseconds instead of tens of microseconds. Note that that’s the call overhead, not the total call time. This magnitude is typical of the difference between systems using dynamically-maintained type information and systems where type information is statically compiled. JNI hard-codes type information in the method invocation, where JNA interface mapping dynamically determines type information at runtime.

You might expect a speedup of about an order of magnitude moving to JNA direct mapping, and a factor of two or three moving from there to custom JNI. The actual difference will vary depending on usage and function signatures. As with any optimization process, you should determine first where you need a speed increase, and then see how much difference there is by performing targeted optimizations. The ease of programming everything in Java usually outweighs small performance gains when using custom JNI.
[/quote]
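
The quoted contrast between "dynamically-maintained type information" and "statically compiled" type information can be illustrated with the JDK alone. The sketch below is an analogy, not JNA or JNI code: `MathLib`, `StaticMathLib` and `dynamicMathLib` are hypothetical names, and no native library is involved. One implementation is bound at compile time; the other goes through `java.lang.reflect.Proxy`, where every call boxes its arguments and looks at the method at runtime.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class DispatchSketch {

    /** Hypothetical stand-in for a mapped library interface. */
    public interface MathLib {
        int add(int a, int b);
    }

    /** Types fixed at compile time, comparable in spirit to a generated stub. */
    public static final class StaticMathLib implements MathLib {
        @Override
        public int add(int a, int b) {
            return a + b;
        }
    }

    /** Every call is routed through a handler that inspects it at runtime. */
    public static MathLib dynamicMathLib() {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!method.getName().equals("add")) {
                throw new UnsupportedOperationException(method.getName());
            }
            int a = (Integer) args[0]; // arguments arrive boxed in an Object[]
            int b = (Integer) args[1];
            return a + b;
        };
        return (MathLib) Proxy.newProxyInstance(
                MathLib.class.getClassLoader(),
                new Class<?>[] {MathLib.class},
                handler);
    }

    /** Crude timing loop; a real benchmark would use a harness such as JMH. */
    static long nanosFor(MathLib lib, int calls) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < calls; i++) {
            sink += lib.add(i, 1);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // keeps the loop result observable
        }
        return elapsed;
    }

    public static void main(String[] args) {
        MathLib fixed = new StaticMathLib();
        MathLib dynamic = dynamicMathLib();
        int calls = 1_000_000;
        nanosFor(fixed, calls);   // warm-up
        nanosFor(dynamic, calls); // warm-up
        System.out.println("static  ns/call: " + (double) nanosFor(fixed, calls) / calls);
        System.out.println("dynamic ns/call: " + (double) nanosFor(dynamic, calls) / calls);
    }
}
```

The absolute numbers depend heavily on the JVM and machine, so none are given here. The point is only that both objects satisfy the same interface while doing very different amounts of work per call.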

Solid performance? You’re not precise enough in my view. I’m sure it is enough for many applications but not for OpenGL (as I explained above) and it might become a problem for OpenCL.

It was only a rumor some months ago… Ok, but I don’t yet see the interest of having 2 separate, very similar OpenCL bindings for Java (JavaCL and JOCL).

princec · March 8, 2010, 1:58pm

The buffer copying problem you allude to here was solved with DirectByteBuffers - this is why they are exclusively used instead of arrays or non-direct ByteBuffers in LWJGL. Does JNA not allow for this?

Cas
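
For readers unfamiliar with the technique mentioned above, here is a minimal sketch of a reused direct buffer using only `java.nio`. The class name `DirectBufferSketch` and the `fill` method are hypothetical; no OpenGL, LWJGL or JNA call is made. The buffer is allocated once outside the Java heap, so native code that receives it can read the memory in place instead of copying an array.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class DirectBufferSketch {

    // Allocated once and reused for every frame.
    private final FloatBuffer vertices;

    public DirectBufferSketch(int maxFloats) {
        vertices = ByteBuffer.allocateDirect(maxFloats * Float.BYTES)
                .order(ByteOrder.nativeOrder()) // match the platform's byte order
                .asFloatBuffer();
    }

    /**
     * Refills the same buffer without allocating. Throws
     * BufferOverflowException if `data` is larger than the capacity.
     */
    public FloatBuffer fill(float[] data) {
        vertices.clear();
        vertices.put(data);
        vertices.flip(); // limit = number of floats written, position = 0
        return vertices;
    }

    public static void main(String[] args) {
        DirectBufferSketch sketch = new DirectBufferSketch(1024);
        FloatBuffer frame = sketch.fill(new float[] {0f, 1f, 0.5f});
        System.out.println("direct: " + frame.isDirect()
                + ", floats ready: " + frame.remaining());
    }
}
```

A binding would pass `frame` to the native side at this point. Copying the `float[]` into the buffer still costs something, so code that cares about this usually writes into the direct buffer from the start.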

gouessej · March 8, 2010, 2:41pm

JOGL doesn’t accept indirect NIO buffers since 2007. JNA allows types derived from Buffer as Structure fields since June 2009.

princec · March 8, 2010, 3:02pm

So… there wouldn’t necessarily be any overhead in sending/receiving data over JNA then, if it uses reused direct ByteBuffers?

Cas

nsigma · March 8, 2010, 3:31pm

Exactly! And it’s been able to do this since before June 2009 - see for example https://jna.dev.java.net/javadoc/com/sun/jna/Pointer.html, which you can view as a direct bytebuffer or read and write from directly.

The quote from Stack Overflow is just plain wrong!

[quote]The calling overhead for a single native call using JNA interface mapping can be an order of magnitude (~10X) greater time than equivalent custom JNI
[/quote]
Why is this in bold? This is talking about interface mapping as opposed to direct mapping. See the following paragraph.

[quote]Solid performance? You’re not precise enough in my view.
[/quote]
I agree, but I don’t exactly have anything to benchmark it against! ;D I’m not aware of another binding for GStreamer. There is a binding for Jack using JNI, but its API is quite different to mine: because JNAJack tries to stay as true to the Jack API as possible, it has to make more calls across the java<>native boundary per buffer. Even using the interface mapping you quoted above, rather than being optimised to direct mapping, it’s measuring a 1-2% extra CPU load (on that 800MHz Celeron!). I would suggest that low latency audio offers similar performance concerns to OpenGL/CL.

It would be interesting to see some comparisons of performance between JavaCL and JOCL.

Neil

gouessej · March 8, 2010, 4:22pm

You are comparing apples and carrots.

It will only show that GlueGen gives better performance than JNA.

bienator · March 8, 2010, 4:29pm

well no, it’s not. Interface mapping performs far worse than JNA’s direct mapping. Direct mapping for functions without params is almost exactly 10x slower than JNI, and even worse if you add params. The average OpenCL function has around 50x less call overhead via JNI than via direct-mapped JNA. JavaCL has some optimizations in it, but you simply can’t match the performance of a compile-time JNI binding.

+1 (in general, not necessarily related to the JNA vs JNI topic)

princec · March 8, 2010, 4:45pm

Anyway, who cares if there’s 2 bindings to it? Competition is good

BTW, my current game is drawing about 4000 sprites per frame on average, which it does by issuing approximately 1,000 JNI calls or so. If, as I suspect, the overhead of JNI is about 100-300 nanoseconds depending on CPU, that’d be 100-300 microseconds of wasted CPU time for the entire scene. With JNA, if bienator is right, we’d be looking at 1-3 milliseconds, which is slightly more significant given that I’ve only got 17 milliseconds to play with in total.

Cas
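
The arithmetic in the post above can be checked with a few lines of Java. This is only a sketch of that estimate: the 1,000 calls, the 100-300 ns guess and the 10x factor come from the thread, while the 60 frames per second used for the budget is an assumption (it matches the "17 milliseconds" mentioned). `FrameBudget` and `overheadMicros` are hypothetical names.

```java
import java.util.Locale;

public class FrameBudget {

    /** Total call overhead per frame, in microseconds. */
    public static double overheadMicros(int callsPerFrame, double nanosPerCall) {
        return callsPerFrame * nanosPerCall / 1_000.0;
    }

    public static void main(String[] args) {
        int calls = 1_000;                       // calls per frame, from the post
        double frameMicros = 1_000_000.0 / 60.0; // assumed 60 fps budget
        double jnaFactor = 10.0;                 // assumed per-call penalty
        double[] guesses = {100.0, 300.0};       // guessed ns per JNI call
        for (double nanos : guesses) {
            double jni = overheadMicros(calls, nanos);
            double jna = overheadMicros(calls, nanos * jnaFactor);
            System.out.printf(Locale.ROOT,
                    "%.0f ns/call: JNI %.0f us (%.1f%% of frame), 10x slower %.0f us (%.1f%% of frame)%n",
                    nanos, jni, 100.0 * jni / frameMicros,
                    jna, 100.0 * jna / frameMicros);
        }
    }
}
```

Under these assumptions the slower path eats roughly 6% to 18% of the frame, compared with 0.6% to 1.8% for the faster one.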

nsigma · March 8, 2010, 4:51pm

Probably a fair cop. Hoist by my own petard! ;D

The JNA site talks of a 2x-3x overhead for direct mapping. I’m not disputing your figures, but interested to know where they come from.

bienator · March 8, 2010, 4:52pm

wrong, there are 3 bindings, but +1 for the competition part; having the freedom to choose is a good thing

gouessej · March 9, 2010, 2:51pm

Competition? I mainly see a waste of time in developing extremely similar APIs whereas it would be more efficient to concentrate development and maintenance efforts on a single one in my humble opinion.

AI_Guy · March 9, 2010, 4:25pm

Being in development of a commercial product using OpenCL, this is about as far away from a waste of time as I can think of. Guess it boils down to your perspective.

I would point out from an OpenCL perspective, OpenCL IS NOT an API to be interfaced. It is “C for the GPU”. Any host language is just OpenCL’s bitch. C is far and away the bitch of choice. It has a natural advantage which will not be overcome easily. Even C++, whose bindings are going to be officially part of the next release, is destined for the land of misfit hosts with Java, C#, Delphi, Python, and hell, maybe even Cobol.

What is interesting is that within just 6 months of release, there is already a lunatic fringe out there doing experiments writing hybrid implementations of languages. By that I mean that while technically there is a host language & OpenCL, the programmer cannot really tell. There is a Scala implementation written on top of OpenCL4Java (sorry about that lunatic thing Olivier), as just one example. That is why I encouraged experimenting with some OpenGL-like thing, actually in OpenCL.

Any OpenCL system written with a monster # of host calls is probably in trouble anyway. That’s why that little contest is not about OpenCL. Just using it. Any visibility that gives to Java as a viable conventional host language is just icing on the cake.

Concentrating on OpenGL, while just keeping an eye on OpenCL, is probably the best option.

aNt · March 30, 2010, 8:51am

so we cleared up the jogl-lwjgl love then?
oh and opengl4 is out, could be open_es will be history soon.

princec · March 30, 2010, 10:45am

ES has a lot of legs in it yet…

Cas

aNt · March 30, 2010, 6:25pm

legs are good ;D