LWJGL/JavaFX Integration

Minimum requirements: Java 7 with JavaFX 2.2+, OpenGL 1.5+ with support for pixel buffer objects and framebuffer objects.

Edit: It currently does not work on Mac OS X.

What’s going on in the demo?

  1. An LWJGL 3D scene is rendered to an offscreen framebuffer and then displayed inside a JavaFX node. The integration is lightweight and you can have any other JavaFX node on top of the 3D scene.

  2. A JavaFX node is rendered to an offscreen image, copied to a GL texture and then displayed inside the 3D scene. In this demo, the node happens to be a WebView that renders java-gaming.org. You can interact with the WebView and the texture will update accordingly.
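Any implementation of step 1 has to deal with one detail: `glReadPixels` returns rows starting at the lower-left corner, while a JavaFX `WritableImage` is addressed from the top-left. The post doesn't say how the demo handles this (flipping via texture coordinates or the projection works just as well), so the following is only a hypothetical CPU-side sketch. It flips a packed ARGB `int[]` frame in place, row by row. `RowFlip` and `flipVertically` are made-up names.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: OpenGL readbacks are bottom-up (origin at the
 * lower-left corner), JavaFX images are top-down. This swaps rows in place.
 */
public final class RowFlip {
    private RowFlip() {}

    /** Flips a width x height image stored row-major in pixels, in place. */
    public static void flipVertically(int[] pixels, int width, int height) {
        if (pixels.length < width * height) {
            throw new IllegalArgumentException("buffer too small");
        }
        int[] tmp = new int[width]; // one scratch row (reuse it in a real render loop)
        for (int top = 0, bottom = height - 1; top < bottom; top++, bottom--) {
            int t = top * width;
            int b = bottom * width;
            System.arraycopy(pixels, t, tmp, 0, width);    // save the top row
            System.arraycopy(pixels, b, pixels, t, width); // bottom row -> top
            System.arraycopy(tmp, 0, pixels, b, width);    // saved top row -> bottom
        }
    }

    public static void main(String[] args) {
        int[] img = {1, 2, 3, 4, 5, 6}; // 2 pixels wide, 3 rows tall
        flipVertically(img, 2, 3);
        System.out.println(Arrays.toString(img)); // [5, 6, 3, 4, 1, 2]
    }
}
```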

Please report any problems (won’t start, crashes, bad performance) and feel free to check out the source code. I’ve tried to make it general enough so that it can be used in windowing toolkits other than JavaFX (e.g. with AWT).

If you try the Windows installer and run into problems, try running the executable from the command line with /Debug.

It runs (Windows 8 on a Samsung Series 5 laptop), but it took a while to start up and only runs at 20-30 FPS. Is this on purpose?

Performance depends on how efficient the GPU and driver are at asynchronous PBO transfers. There are also several ways to implement the transfers and each GPU vendor favors a different one. The current implementation has only been optimized for AMD GPUs, so I’m actually hoping that someone else will contribute an optimized path for NV and/or Intel.
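A common way to keep PBO readbacks asynchronous is a ring of N buffers. Each frame issues `glReadPixels` into one PBO (bound to `GL_PIXEL_PACK_BUFFER`) and maps a different PBO that was filled N-1 frames earlier, so the map rarely has to wait for the GPU. The index bookkeeping is sketched below in plain Java. `PboRing` is a hypothetical helper, the GL calls are only described in comments, and this is not necessarily how the demo schedules its buffers.

```java
/**
 * Hypothetical helper for a ring of N pixel buffer objects (PBOs):
 * which PBO receives this frame's asynchronous readback, and which
 * older PBO is old enough to be mapped without stalling.
 */
public final class PboRing {
    private final int count;
    private long frame; // frames submitted so far

    public PboRing(int count) {
        if (count < 2) throw new IllegalArgumentException("need at least 2 PBOs");
        this.count = count;
    }

    /** PBO to bind as GL_PIXEL_PACK_BUFFER before this frame's glReadPixels. */
    public int writeIndex() {
        return (int) (frame % count);
    }

    /**
     * PBO to map (e.g. glMapBuffer with GL_READ_ONLY) this frame, or -1
     * while the ring is still filling. It was written count - 1 frames ago
     * and always differs from writeIndex().
     */
    public int readIndex() {
        long source = frame - (count - 1);
        return source < 0 ? -1 : (int) (source % count);
    }

    /** Moves on to the next frame. */
    public void advance() {
        frame++;
    }

    public static void main(String[] args) {
        PboRing ring = new PboRing(3);
        for (int i = 0; i < 5; i++) {
            System.out.println("frame " + i + ": read into PBO " + ring.writeIndex()
                    + ", map PBO " + ring.readIndex());
            ring.advance();
        }
    }
}
```

The trade-off is latency: with N buffers, the pixels shown on screen are N-1 frames old, in exchange for giving each transfer more time to complete in the background.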

What’s the GPU in that laptop?

I believe it’s an integrated Intel card.

Something seems borked on the host of the applet/JWS page; I can’t download the JNLP or run the applet.

It did run earlier, though, when I was testing on an old computer. However, it failed to run there because the PBO support was broken (even though the driver claimed to support it). Just an old ATI card with old, broken drivers.

Nice, but really slow here. I only get 9 fps on my laptop running Fedora. I have an Nvidia 550, but I think it’s deactivated and the laptop is probably using the Intel one.

Hi

Nice job, but do you think we could find a solution that works without PBOs? I’ve already investigated a lot, but I found no safe and satisfying long-term solution.

For JavaFX in particular, I would wait for it to be open-sourced first. Last I heard, the target for the open-source release is February. Then we can properly explore how we could hack into their GL pipeline, or even the DX one with WGL_NV_DX_interop.

This solution is meant to be a lightweight alternative if everything else fails, assuming it can run fast enough, of course. I’ll test on Intel later today.

Just read about a handy GL extension that could speed things up quite dramatically on Intel cards:
INTEL_map_texture

Spasi, please have my babies. This is awesome!

Runs great at 235 FPS with triple buffering and 32x MSAA :)

Also the installer works beautifully! I was so surprised by how seamless and easy it was to install and uninstall.

This is all on Windows 8 x64 with a GTX 580 :)

Update:

  • Organized the source so that it’s clear what is library and what is demo code.
  • Added choice boxes with alternative implementations.
  • Replaced CountDownLatches with Semaphores (reduces garbage).
  • Misc optimizations. The default implementation now runs at ~160 fps on a desktop Sandy Bridge IGP, ~1550 fps on my Radeon.
  • Added an implementation that utilizes the INTEL_map_texture extension (thanks Danny02!). Performance goes up to ~275 fps on my i5.
  • Added an untested implementation for Nvidia GPUs. It calls ReadPixels into a device-allocated buffer, then copies its contents to a host-allocated buffer using the ARB_copy_buffer extension. This is supposed to double the readback bandwidth on consumer NV cards.
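The switch from CountDownLatches to Semaphores can be illustrated with a fixed pool of frame buffers shared by the LWJGL render thread (producer) and the JavaFX side (consumer). Two semaphores count empty and filled buffers. Because the buffers and semaphores are allocated once and reused, nothing is allocated per frame, whereas a `CountDownLatch` cannot be reset and has to be recreated for every handoff. This is a hypothetical sketch of the idea (`FrameHandoff` and its methods are made-up names), not the demo's actual code: the `int[]` buffers stand in for mapped PBO memory and the consumer loop stands in for `PixelWriter.setPixels`.

```java
import java.util.Arrays;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical single-producer/single-consumer handoff of reusable frame
 * buffers. All buffers and semaphores are allocated once up front.
 */
public final class FrameHandoff {
    private final int[][] buffers;
    private final Semaphore free;                     // empty buffers the producer may fill
    private final Semaphore ready = new Semaphore(0); // filled buffers awaiting the consumer
    private int produceIndex; // only touched by the producer thread
    private int consumeIndex; // only touched by the consumer thread

    public FrameHandoff(int bufferCount, int pixelsPerFrame) {
        buffers = new int[bufferCount][pixelsPerFrame];
        free = new Semaphore(bufferCount);
    }

    /** Producer: waits for an empty buffer and returns it for filling. */
    public int[] beginProduce() {
        free.acquireUninterruptibly();
        return buffers[produceIndex];
    }

    /** Producer: publishes the buffer obtained from beginProduce(). */
    public void endProduce() {
        produceIndex = (produceIndex + 1) % buffers.length;
        ready.release(); // release() happens-before the consumer's acquire()
    }

    /** Consumer: waits for a filled buffer and returns it. */
    public int[] beginConsume() {
        ready.acquireUninterruptibly();
        return buffers[consumeIndex];
    }

    /** Consumer: hands the buffer from beginConsume() back to the producer. */
    public void endConsume() {
        consumeIndex = (consumeIndex + 1) % buffers.length;
        free.release();
    }

    public static void main(String[] args) throws InterruptedException {
        FrameHandoff handoff = new FrameHandoff(2, 4);
        Thread producer = new Thread(() -> { // stands in for the LWJGL render thread
            for (int frame = 0; frame < 10; frame++) {
                int[] buf = handoff.beginProduce();
                Arrays.fill(buf, frame); // stand-in for a PBO readback
                handoff.endProduce();
            }
        });
        producer.start();
        long sum = 0;
        for (int frame = 0; frame < 10; frame++) { // stands in for the JavaFX side
            int[] buf = handoff.beginConsume();
            sum += buf[0]; // stand-in for PixelWriter.setPixels
            handoff.endConsume();
        }
        producer.join();
        System.out.println("sum of frame ids = " + sum); // 45
    }
}
```

The sketch uses `acquireUninterruptibly()` to stay short; a real integration would also need a shutdown path so the render thread doesn't block forever once the JavaFX side stops consuming.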

Details:

Intel. Well, the drivers are a pain. There’s no way to perform an asynchronous read-back to a PBO, everything blocks. INTEL_map_texture allows for CPU-cached textures with linear layout, but performance is horrible if they’re accessed by the GPU. After a lot of pain, I ended up using an extra buffer for GPU access and BlitFramebuffer for asynchronously copying from the tiled texture to the linear one. Anything else and it would block.

NV. I could only test on an old 7900 GS and performance is horrible, slower than the i5 IGP. The ARB_copy_buffer implementation is slower than the default too. I would really appreciate it if someone could test and report numbers on a more recent model.

Java 8. The setPixels/writePixels calls perform worse (around 50%) on the latest JavaFX builds, so you should test on Java 7. I also had to update the installer above, but it’s buggy on Java 7: you’ll have to edit the shortcut and add a /Debug argument.

Performance. Well, it’s almost usable. It’s a horrible, horrible hack, but with buffering and multi-threading (LWJGL render thread, JFX application thread, JFX render thread) there’s enough overlap to hide the readback/upload overhead and get something playable. Keep in mind, though, that without Display.sync throttling you can swamp the JFX threads with image update calls and cause UI freezes. On my setup it happens if I maximize the window to 1080p and disable syncing.
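The throttling mentioned above can be approximated with a simple deadline-based frame limiter on the LWJGL render thread. This is an illustrative sketch in the spirit of LWJGL 2's `Display.sync(fps)`, not its actual implementation, and `FrameLimiter` is a hypothetical class. When the thread falls more than a frame behind, it resynchronizes instead of rendering a burst of catch-up frames, since such a burst is exactly what could flood the JavaFX threads with image updates.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical deadline-based frame limiter for the render thread,
 * loosely modeled on the idea behind LWJGL 2's Display.sync(fps).
 */
public final class FrameLimiter {
    private final long frameNanos; // target duration of one frame
    private long nextDeadline;     // earliest start time of the next frame
    private boolean started;

    public FrameLimiter(int fps) {
        if (fps <= 0) throw new IllegalArgumentException("fps must be positive");
        frameNanos = TimeUnit.SECONDS.toNanos(1) / fps;
    }

    /**
     * Returns how many nanoseconds to wait at time {@code now}
     * (System.nanoTime() units) before the next frame, and advances the deadline.
     */
    public long nanosToWait(long now) {
        if (!started || now - nextDeadline > frameNanos) {
            // First frame, or more than one frame behind: resynchronize
            // instead of rendering a burst of catch-up frames.
            started = true;
            nextDeadline = now + frameNanos;
            return 0;
        }
        long wait = Math.max(0, nextDeadline - now);
        nextDeadline += frameNanos;
        return wait;
    }

    /** Sleeps the calling (render) thread until the next frame slot. */
    public void sync() throws InterruptedException {
        TimeUnit.NANOSECONDS.sleep(nanosToWait(System.nanoTime())); // no-op for 0
    }

    public static void main(String[] args) throws InterruptedException {
        FrameLimiter limiter = new FrameLimiter(60);
        long start = System.nanoTime();
        for (int frame = 0; frame < 30; frame++) {
            // render the LWJGL frame and hand the pixels to JavaFX here
            limiter.sync();
        }
        long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("30 frames at a 60 fps cap took about " + ms + " ms");
    }
}
```

Sleep granularity can be coarse on some platforms, so a production limiter typically sleeps for most of the interval and yields or spins for the remainder.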

Thanks. This is what we have been looking for all along. JavaFX is a great GUI tool. Will be trying this integration very soon and migrate some of our existing programs to it.

I can test on my Quadro NVS 285 but I’m not sure it will be faster.

Can confirm the bugginess on Java 7 in the new build. I couldn’t run it directly anymore, I had to use /Debug.

However, with no sync it still runs at ~230 FPS.

AWESOME!

(Unfortunately useless to me because I don’t know how to use/don’t use JavaFX)

Replying to gouessej:
I wouldn’t be surprised if it is. One of the advantages of Quadro over the consumer line is greatly improved readback performance.

Replying to ra4king:
Could you please test again with no multisampling? I’m interested in the raw readback/upload performance without anything heavy skewing the fps. Also, make sure you don’t resize the window or the panels, for better comparison with my numbers. Does the ARB_copy_buffer implementation perform better or worse?

I just tested with my Nvidia GPU (GTX 560M), running in the applet at a screen resolution of 1920x1080 with no antialiasing.
I get 270 FPS with asynchronous PBO and 450 FPS with ARB_copy_buffer, so ARB_copy_buffer definitely performs better on my PC :)

I have an old notebook with dual GeForce 9400s. Windows launcher, default size: ~60 Hz. 8-tap AA: ~30-40 Hz. (Vsync was enabled for both... I guess I should have disabled it. Bad me. Original drop.)

Framerate is (obviously) heavily impacted by the number of buffers, as the uploads are async. I see linear growth in fps with increasing the number of buffers.

I tested again with the Windows installer, keeping the default window size:
I got 450 FPS with asynchronous PBO and 710 FPS with ARB_copy_buffer.