Java2D slow with integrated chips (Windows)

I think that Java2D becomes severely fill-rate limited when using the D3D pipeline on integrated chips.

For example: my laptop has Optimus and never seems to use its dedicated GPU. When I run my particle editor, it renders 1k+ particles fine as long as they are small (1-5 pixels). The moment I try rendering an image above 300 pixels, it grinds to a halt. With the OpenGL flag, it performs as it should and does not seem to be fill-rate limited. I do not know what Java uses on Macs, but I think there might be an issue there too.

It would be great if others could test this. By test I mean: try rendering multiple images with and without the global OpenGL flag set and see whether performance differs.
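For anyone wanting to run that comparison, here is a minimal timing sketch. The class name `FrameTimer` and the placeholder workload are made up for illustration; the `Runnable` should be replaced with whatever draws one frame in your own on-screen render loop, since a dummy workload does not exercise the D3D or OpenGL pipeline at all. It only reads the `sun.java2d.opengl` and `sun.java2d.d3d` system properties so you can confirm which flags the run was started with.

```java
public class FrameTimer {

    /** Runs renderFrame the given number of times and returns the average milliseconds per frame. */
    static double averageFrameMillis(Runnable renderFrame, int frames) {
        long start = System.nanoTime();
        for (int i = 0; i < frames; i++) {
            renderFrame.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / frames;
    }

    public static void main(String[] args) {
        // Report which pipeline flags were passed on the command line (null means not set).
        System.out.println("sun.java2d.opengl = " + System.getProperty("sun.java2d.opengl"));
        System.out.println("sun.java2d.d3d    = " + System.getProperty("sun.java2d.d3d"));

        // Placeholder workload (an assumption): swap in the code that draws one frame on screen.
        long[] sink = new long[1];
        Runnable renderFrame = () -> {
            for (int i = 0; i < 100_000; i++) {
                sink[0] += i;
            }
        };

        double ms = averageFrameMillis(renderFrame, 600);
        System.out.printf("%.3f ms/frame over 600 frames%n", ms);
    }
}
```

Run it once as `java FrameTimer` and once as `java -Dsun.java2d.opengl=true FrameTimer`, then compare the two averages on the same machine.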

Please don’t just scream, “Java2D sucks buttox! Use a REAL graphics library like, idk, OpenGL.”

Here’s my own benchmark.

iMac 10.6.6 with an Intel Core 2 Duo (3.06 GHz), OpenGL 2.1 and NVIDIA GeForce 9400
Tests run in 800x600 windowed mode

Java2D @ 60 FPS
32x32 RGBA ~ 1,300 sprites
32x32 RGBA scaled to 256x256 ~ 38 sprites (uses getScaledInstance and SCALE_FAST)
256x256 RGB ~ 135 sprites
512x512 RGBA ~ 4 sprites

Java2D @ 60 FPS with “-Dsun.java2d.opengl=true”

  • No change

For comparison, Slick2D @ 60 FPS
32x32 RGBA ~ 18,500 sprites
32x32 RGBA scaled to 256x256 ~ 400 (uses getScaledCopy and FILTER_NEAREST)
256x256 RGB ~ 300
512x512 RGBA ~ 53

The Slick test was arguably easier to write, and certainly more compact. Will provide source shortly…

Java2D reliability and performance suck, plain and simple.

EDIT: Source. Feel free to nitpick it and optimize it.
Java2D
Slick2D

Java2D is OK to use; that flag obviously is not. On my computer it results in empty black windows when enabled, no matter which Java application.
So be careful when shipping anything with the flag enabled.

My work computer (using davedes’s java 2d code) gives:

Java2D @ 60 FPS
32x32 RGB ~17000 sprites
256x256 RGB ~865 sprites (uses getScaledInstance and SCALE_FAST)

32x32 RGBA ~15500 sprites
256x256 RGBA ~1600 sprites (uses getScaledInstance and SCALE_FAST)

The graphics card is not integrated, but it is not a fast 3D accelerator card either… so take these figures how you will :stuck_out_tongue:


System Information

Time of this report: 7/19/2012, 16:07:57
Machine name: XP060705
Operating System: Windows XP Professional (5.1, Build 2600) Service Pack 3 (2600.xpsp_sp3_gdr.120504-1619)
Language: English (Regional Setting: English)
System Manufacturer: Dell Inc.
System Model: Precision WorkStation T3500
BIOS: Phoenix ROM BIOS PLUS Version 1.10 A08
Processor: Intel® Xeon® CPU W3530 @ 2.80GHz (8 CPUs)
Memory: 3582MB RAM
Page File: 2211MB used, 3249MB available
Windows Dir: C:\WINDOWS
DirectX Version: DirectX 9.0c (4.09.0000.0904)
DX Setup Parameters: Not found
DxDiag Version: 5.03.2600.5512 32bit Unicode


Display Devices

    Card name: NVIDIA Quadro FX 1800
 Manufacturer: NVIDIA
    Chip type: Quadro FX 1800
     DAC type: Integrated RAMDAC
   Device Key: Enum\PCI\VEN_10DE&DEV_0638&SUBSYS_062C10DE&REV_A1

Display Memory: 768.0 MB
Current Mode: 1280 x 1024 (32 bit) (60Hz)

And yet it must be said that this is precisely why LWJGL was developed in the first place…

Cas :slight_smile:

Ehh, I guess Java2D just sucks. :stuck_out_tongue:

Do you mean that there are 17000 non-scaled sprites and 865 scaled sprites at once, or one test with 17000 and one with 865?

I still think Java2D is not crap. It is good for small games/applications as long as you keep things simple. But yes, LWJGL is the way to go to get good performance.

So I played with your test code a bit and was very surprised. Most of what you did was already the fastest way of doing things, so there was nothing to complain about, but the interesting thing I found out was how fill-rate limited Java2D really is.

For example: 2k sprites at 256x256 run at 30 fps. Drop them to 128x128 and I get 200+ fps. Also, scaling at draw time, when the balls are rendered, instead of beforehand did not change the performance. Everyone says scaling like that causes a big fps drop. I see no difference whatsoever.

g.drawImage(sprite, (int)balls[i].x, (int)balls[i].y,128,128,null); //<---should be slower

than

sprite = sprite.getScaledInstance(128, 128, Image.SCALE_FAST); //<--- doing it before hand
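A rough way to compare the two approaches in isolation is sketched below. Note the assumptions: the class and method names are invented for this example, the pre-scaling is done by drawing into a new `BufferedImage` rather than with `getScaledInstance`, and everything is drawn to an offscreen `BufferedImage`, so this times Java2D's software loops only. To measure the accelerated path you would draw to the `Graphics2D` of your actual window or buffer strategy instead.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class ScaleComparison {

    /** Scales src once, up front, into a new w x h image using nearest-neighbor sampling. */
    static BufferedImage preScale(BufferedImage src, int w, int h) {
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    /** Draws the sprite `draws` times into an 800x600 offscreen target; returns elapsed nanoseconds. */
    static long timeDraws(BufferedImage sprite, int w, int h, boolean scaleAtDraw, int draws) {
        BufferedImage target = new BufferedImage(800, 600, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = target.createGraphics();
        long start = System.nanoTime();
        for (int i = 0; i < draws; i++) {
            int x = (i * 37) % (800 - w);
            int y = (i * 53) % (600 - h);
            if (scaleAtDraw) {
                g.drawImage(sprite, x, y, w, h, null); // scale on every draw call
            } else {
                g.drawImage(sprite, x, y, null);       // sprite is already the final size
            }
        }
        long elapsed = System.nanoTime() - start;
        g.dispose();
        return elapsed;
    }

    public static void main(String[] args) {
        BufferedImage small = new BufferedImage(32, 32, BufferedImage.TYPE_INT_ARGB);
        BufferedImage big = preScale(small, 128, 128);
        System.out.println("scale at draw: " + timeDraws(small, 128, 128, true, 2000) / 1_000_000 + " ms");
        System.out.println("pre-scaled:    " + timeDraws(big, 128, 128, false, 2000) / 1_000_000 + " ms");
    }
}
```

Both paths fill the same number of destination pixels, so similar timings would be consistent with the draw being limited by fill rate rather than by the scaling itself.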

Another thing I found odd, and maybe someone with more experience can explain it to me: why does using transparency like this


Graphics2D g2d = (Graphics2D) g.create();
g2d.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, .1f));

make things faster, not slower? Is it because the lower the alpha, the fewer pixels have to be drawn? The lower I make it, the faster things run. At .1 I get 2k 256*256 sprites at 90 fps. I am really lost now, as I thought alpha was a performance killer.

When you start having to rotate things in Java2D, you’ll definitely want to find something else.

256x256 x 1600 = 104.8576 million pixels.
At 60 FPS: 6.291456 gigapixels per second.

My GTX 460M has a THEORETICAL fillrate of 5.4 Gpixel/s.

I have a fully GPU accelerated particle engine which renders particles as transparent untextured square points. The particles bounce on the screen edges (it’s in 2D), but they are treated as 1 pixel points, meaning that they’ll only bounce when half the particle is already outside the screen. This “increases” fillrate since that part is clipped, and is especially noticeable with larger particles. That combined with the increased number of vertices to process reduces performance as the particles become smaller.

Gpixel/s = numParticles * particleWidth * particleHeight * 60 / (10^9)

64x64 particles: 20 000 particles = 4.9152 Gpixel/s
32x32 particles: 70 000 particles = 4.3008 Gpixel/s
16x16 particles: 230 000 particles = 3.5328 Gpixel/s
8x8 particles: 750 000 particles = 2.8800 Gpixel/s
4x4 particles: 1 750 000 particles = 1.6800 Gpixel/s
2x2 particles: 2 400 000 particles = 0.5760 Gpixel/s
1x1 particles: 2 750 000 particles = 0.1650 Gpixel/s
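The fill-rate formula above is easy to plug your own numbers into. A small sketch (the class and method names are just for illustration) that reproduces a few of the figures from this thread:

```java
import java.util.Locale;

public class FillRate {

    /** Gpixel/s = count * width * height * fps / 10^9, ignoring clipping and overdraw limits. */
    static double gpixelsPerSecond(long count, int width, int height, int fps) {
        return (double) count * width * height * fps / 1e9;
    }

    public static void main(String[] args) {
        // 1600 sprites of 256x256 at 60 FPS (the Java2D result discussed above)
        System.out.printf(Locale.ROOT, "%.6f Gpixel/s%n", gpixelsPerSecond(1_600, 256, 256, 60));
        // 20 000 particles of 64x64 at 60 FPS (first row of the table)
        System.out.printf(Locale.ROOT, "%.4f Gpixel/s%n", gpixelsPerSecond(20_000, 64, 64, 60));
        // 2 750 000 particles of 1x1 at 60 FPS (last row of the table)
        System.out.printf(Locale.ROOT, "%.4f Gpixel/s%n", gpixelsPerSecond(2_750_000, 1, 1, 60));
    }
}
```

Keep in mind this is an upper bound on the pixels actually written: anything clipped at the screen edges is counted here but never reaches the framebuffer.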

If the above (256x256) result is with Java2D I’ll be surprised. Either:

  1. a big number of the particles are outside the screen and get clipped,
  2. you have a pretty powerful graphics card and Java2D has perfect OpenGL acceleration

It’s definitely not impossible though. With only 1600 particles the CPU cost of even glBegin/glEnd might be low enough to allow 60 FPS, so Java2D might be able to handle it even if it’s not very CPU efficient. The hardware can handle it, just look at a GTX 680:

http://www.xbitlabs.com/images/graphics/nvidia-geforce-gtx-680/31_gtx68_gpu-z.png

32.2 Gpixel/s… drool

@StumpyStrust
Scaling images (textures) is hardware accelerated. Up-scaling it even with bilinear filtering should be free and faster than upscaling it and storing it in memory, since your GPU needs to read less texture memory. Transparency is also often free on newer GPUs since the blending is also handled by dedicated hardware. I have no idea why it gets faster with lower transparency. Your GPU should do the exact same work no matter what alpha value you have.

[quote]Scaling images (textures) is hardware accelerated. Up-scaling it even with bilinear filtering should be free and faster than upscaling it and storing it in memory, since your GPU needs to read less texture memory. Transparency is also often free on newer GPUs since the blending is also handled by dedicated hardware. I have no idea why it gets faster with lower transparency. Your GPU should do the exact same work no matter what alpha value you have.
[/quote]
This is exactly what I thought, but everyone always said otherwise. >:(

Here is a video showing what I mean by faster when more transparent.

(how do you embed videos?)

The more transparent you make the draw, the more pixels will be completely transparent at the edges, and they are therefore not processed at all. Seems to imply software rendering though.

Cas :slight_smile:

Wait, does SRC_OVER simply subtract (or add) that amount of alpha from/to each pixel? If so, it does imply software rendering like Cas said, since invisible and fully opaque pixels don’t require any blending.

The source is composited over the destination (Porter-Duff Source Over Destination rule).
Fs = 1 and Fd = (1-As), thus:

    Ar = As + Ad*(1-As)
    Cr = Cs + Cd*(1-As)

From the Java docs.
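To make the quoted rule concrete, and to answer the subtract-or-add question: per the `AlphaComposite` documentation, the extra alpha passed to `getInstance` is multiplied into the source alpha and color before the SRC_OVER equations are applied, so it is neither added nor subtracted. Here is a small arithmetic sketch for a single color channel with premultiplied values in [0, 1]. The class and method names are invented for illustration, and this is just the math, not how Java2D's rendering loops are implemented.

```java
public class SrcOverDemo {

    /**
     * Porter-Duff SRC_OVER for one premultiplied color channel.
     * cs/as are the source color and alpha, cd/ad the destination color and alpha,
     * extraAlpha is the constant given to AlphaComposite.getInstance.
     * Returns {Ar, Cr}.
     */
    static double[] srcOver(double cs, double as, double cd, double ad, double extraAlpha) {
        double asr = as * extraAlpha;       // extra alpha scales the source alpha
        double csr = cs * extraAlpha;       // ...and the premultiplied source color
        double ar = asr + ad * (1.0 - asr); // Ar = As + Ad*(1-As)
        double cr = csr + cd * (1.0 - asr); // Cr = Cs + Cd*(1-As)
        return new double[] { ar, cr };
    }

    public static void main(String[] args) {
        // Opaque white source over opaque mid-gray destination, extra alpha 0.1
        double[] r = srcOver(1.0, 1.0, 0.5, 1.0, 0.1);
        System.out.println("Ar = " + r[0] + ", Cr = " + r[1]);

        // A fully transparent source pixel leaves the destination unchanged
        double[] t = srcOver(0.0, 0.0, 0.5, 1.0, 0.1);
        System.out.println("Ar = " + t[0] + ", Cr = " + t[1]);
    }
}
```

Since a source pixel with zero alpha leaves the destination untouched, a software loop could skip such pixels entirely, which fits the explanation Cas gave above.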

If it is just software rendering, isn’t it too fast?