raytracing improvement with LWJGL?

markuskidd · May 30, 2003, 12:28am

http://remon.mojomedia.at/
http://remon.mojomedia.at/projects/raytracer.html

could this be done better with LWJGL? since a lot of raytracing seems to be CPU rather than GPU intensive, would there be an improvement? the author mentions

markuskidd · May 30, 2003, 12:33am

followup:

author says that his main frustrations are "lack of pointers, very expensive casts and slow math functions… " maybe this shouldn’t have been posted here, but his demo attracted some attention and it seemed a shame to have to read the overtones of dissatisfaction with java performance in the writeup :-/

William · May 30, 2003, 5:56am

OpenGL and hardware acceleration only support local illumination (basically raytracing where each ray stops when it first hits something, instead of being reflected further), so they can’t really be used in his application. He mentions algorithmic differences between the applications and those could easily account for a 1:5 performance difference.

It’s really odd that he complains about casts, seems like he is using an old VM and even then he should be able to get rid of most casts by writing his own primitive collection classes. I’ve never had casts impact performance with any of the 1.4 VMs. Maybe he is really confusing a cast with wrapping/unwrapping primitive types into objects when using the standard collections.

Btw, I don’t doubt that the math functions are slow. Here is a benchmark that mentions performance problems with the math libraries of recent JDKs:
http://www.coyotegulch.com/reviews/almabench.html

I also vaguely remember someone from Sun coming here and asking whether we were affected by slowness in the current sin and cos implementations and mentioning that there may be a faster way to do them.

thughes · May 30, 2003, 9:05am

[quote]It’s really odd that he complains about casts, seems like he is using an old VM and even then he should be able to get rid of most casts by writing his own primitive collection classes. I’ve never had casts impact performance with any of the 1.4 VMs. Maybe he is really confusing a cast with wrapping/unwrapping primitive types into objects when using the standard collections.
[/quote]
BTW I’ve seen multiple implementations of primitive collections, and you may find using them somewhat quicker. You can find one such example here: http://pcj.sourceforge.net/
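To make the boxing vs. primitive collection point concrete, here is a minimal sketch. `IntList` is a hypothetical class written for this example and is not the PCJ API. It is written in current Java syntax; on a 1.4 VM the boxed version would need an explicit `new Integer(i)` on the way in and a cast plus `intValue()` on the way out, which may be the "cast" cost being discussed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal growable list of ints (not the PCJ API). */
final class IntList {
    private int[] data = new int[16];
    private int size;

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow by doubling
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index]; // plain array read: no cast, no unwrapping
    }

    int size() {
        return size;
    }
}

final class PrimitiveListDemo {
    /** Sums 0..n-1 through a standard collection of Integer objects. */
    static long sumBoxed(int n) {
        List<Integer> boxed = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxed.add(i); // each int is wrapped in an Integer
        }
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += boxed.get(i); // each Integer is unwrapped again
        }
        return sum;
    }

    /** Same sum through the primitive list: no wrapper objects at all. */
    static long sumPrimitive(int n) {
        IntList ints = new IntList();
        for (int i = 0; i < n; i++) {
            ints.add(i);
        }
        long sum = 0;
        for (int i = 0; i < ints.size(); i++) {
            sum += ints.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000));
        System.out.println(sumPrimitive(1000));
    }
}
```

Both methods return the same value; whether the primitive version is actually faster on a given VM is something to measure rather than assume.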

bloid · May 30, 2003, 11:14am

Hiya

Just to let you know that this is being discussed over at cfxweb too:

http://www.cfxweb.net/modules.php?name=Forums&file=viewtopic&t=731

I think the casting problem comes when trying to get the image (constructed out of floats) back into the BufferedImage…

JDK 1.4 did introduce DataBufferFloat, but unfortunately, FloatDoubleColorModel has remained in the JAI library

I think that’s the problem anyway… Hehe, I guess Remon will pop in here when he sees the link at CFXWeb, and explain better than I can. I have to say, I am impressed with the speed he is getting at the moment… Can’t wait to see the new version of his applet with the interpolation switched on.

bloid

[edit] Also, a float version of the Math library would be cool

bloid · May 30, 2003, 1:11pm

Ok, remon posted a response on the CFXWeb forum… I’ll copy it here for people to see

-----COPIED TEXT BEGINS-----
…
Now, i’ll reply here to the discussion on that board because i dont feel like registering for yet another one :

[quote]It’s really odd that he complains about casts, seems like he is using an old VM and even then he should be able to get rid of most casts by writing his own primitive collection classes. I’ve never had casts impact performance with any of the 1.4 VMs. Maybe he is really confusing a cast with wrapping/unwrapping primitive types into objects when using the standard collections.
[/quote]
I’m complaining about float <-> int casts mainly, the thing is running in 1.4.1. And obviously, the other BIG benefit a C(++) based engine has is pointers, i cant do certain tricks that could speed up my inner interpolation render loop by at least 100%

[quote]I also vaguely remember someone from Sun coming here and asking whether we were affected by slowness in the current sin and cos implementations and mentioning that there may be a faster way to do them.
[/quote]
The trig methods arent really used during raytracing so not really an issue for me personally. I use vector dotproducts to get cos.N, no need for trig methods. The main one i have a serious problem with is Math.sqrt(d), which, as you can imagine, is called a lot, since it’s used to get vector lengths amongst other things, and i cant always leave the length squared…

[quote]He mentions algorithmic differences between the applications and those could easily account for a 1:5 performance difference.
[/quote]
I’m afraid this isnt the case, algorithmically, the following things are different between RealStorm and my engine at this point :

RealStorm has spatial subdivision
RealStorm uses spatial subdivision (octrees i think, possibly KD trees) for large scenes, doesnt have any impact on small scenes since our test scene doesnt qualify for a single spatial subdivision. So effect on performance in our test scene is 0%
RealStorm has viewport scene subdivision
This is a big optimization, but again, our test scene will only benefit from this slightly, because objects are reasonably big compared to the total viewport (walls, cylinders). I’d say that once i implement this the test scene’s FPS will increase about 50-100%
RealStorm has a different lighting model
This is a large difference, however, both systems use lookup tables during tracing, so performance wise this is no difference. It’s just the way the models are calculated that is different. 0% improvement
RealStorm has a different reflection mechanism
Again, different, but not slower. 0% improvement
RealStorm has frustum culling
All objects in the scene are visible, so this would actually only add overhead, 0% improvement
RealStorm has primary ray optimization
This technique precalcs stuff that stays constant throughout a frame for entire objects, a big optimization i havent done yet. 30-80% increase i think.

So, basically, i think i can get the engine in total about 100% faster than it is now at a reasonably steady framerate. But the 1:5 factor will still be in effect, i just got a screenshot from RS running the same scene at the same resolution at 50FPS on an AMD 3000XP, if i can get within 10 on my AMD 1800XP i’ll be happy, maybe 15FPS on the 3000XP is possible, we’ll see. Lack of pointers, slow array access, slow field access, slow primitive type casts…it’s all not helping

Someone can post this on the other board if needed…

Remon

-----COPIED TEXT ENDS-----

Full thing is here: http://www.cfxweb.net/modules.php?name=Forums&file=viewtopic&p=5769#5769
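For anyone unfamiliar with the two tricks Remon mentions (getting a cosine from a dot product, and leaving lengths squared where possible), here is a rough sketch. This is not his code; the class and method names are made up for illustration and vectors are plain `double[3]` arrays.

```java
/** Hypothetical helper for illustration; vectors are double[3]. */
final class Vec3Demo {
    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Squared length: a dot product with itself, no sqrt needed. */
    static double lengthSquared(double[] v) {
        return dot(v, v);
    }

    /** For two UNIT vectors the dot product already is the cosine of the angle. */
    static double cosBetweenUnit(double[] n, double[] l) {
        return dot(n, l);
    }

    /** Comparing lengths works on the squares, so sqrt can be skipped here. */
    static boolean isShorter(double[] a, double[] b) {
        return lengthSquared(a) < lengthSquared(b);
    }

    /** Normalizing is one of the cases where the sqrt cannot be avoided. */
    static double[] normalize(double[] v) {
        double inv = 1.0 / Math.sqrt(lengthSquared(v));
        return new double[] { v[0] * inv, v[1] * inv, v[2] * inv };
    }

    public static void main(String[] args) {
        double[] normal = { 0, 0, 1 };
        double[] toLight = normalize(new double[] { 0, 3, 4 });
        System.out.println(cosBetweenUnit(normal, toLight));
    }
}
```

The sqrt only disappears where the length is used for a comparison; anything that needs a unit vector or a real distance still pays for it, which matches his complaint.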

jbanes · May 30, 2003, 2:06pm

Heh. And right after the thread on doing real-time RayTracing in the GPU. It sounds like the biggest problem here is that he’s working from C/C++ assumptions and concepts instead of figuring out how to get Java’s OO model to help instead of hinder. Not that I’m saying this is an easy task, but he should be able to get what he needs that way. Also, there are a ton of optimizations that work just as effectively in Java as they do in C. Lookup tables for Sin, Cos, and Tan for example. You’d probably introduce slight errors if the tables aren’t fine enough, but for a small trade off in memory, you should be able to develop perfectly acceptable tables. Anyone know if he’s willing to release his source?
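A lookup table along those lines could look roughly like this. It is only a sketch: the table size of 4096 and the nearest-entry lookup (no interpolation) are arbitrary choices for the example, and the worst-case error is about half a table step (pi/4096, a bit under 0.001 here).

```java
/** Hypothetical sine/cosine lookup table; size and rounding scheme are assumptions. */
final class SinTable {
    private static final int SIZE = 4096;            // power of two, so wrapping is a bit mask
    private static final int MASK = SIZE - 1;
    private static final double STEPS_PER_RADIAN = SIZE / (2.0 * Math.PI);
    private static final float[] TABLE = new float[SIZE];

    static {
        // Fill one full period once, up front.
        for (int i = 0; i < SIZE; i++) {
            TABLE[i] = (float) Math.sin(i / STEPS_PER_RADIAN);
        }
    }

    /** Nearest table slot for an angle; the mask also wraps negative angles. */
    private static int indexOf(double radians) {
        return ((int) Math.round(radians * STEPS_PER_RADIAN)) & MASK;
    }

    static float sin(double radians) {
        return TABLE[indexOf(radians)];
    }

    /** cos(x) = sin(x + pi/2), i.e. a quarter of the table further on. */
    static float cos(double radians) {
        return TABLE[(indexOf(radians) + SIZE / 4) & MASK];
    }

    public static void main(String[] args) {
        System.out.println(sin(1.0) + " vs " + Math.sin(1.0));
        System.out.println(cos(1.0) + " vs " + Math.cos(1.0));
    }
}
```

A bigger table or linear interpolation between neighbouring entries reduces the error at the cost of memory or a few extra operations per lookup.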

swpalmer · May 30, 2003, 3:21pm

The big question I have is has he actually profiled the code?

I don’t see how having pointers is a BIG help… at best you save a single add operation for every access by not having to offset from the array base every time.
The only thing you can do with pointers other than access data through them is pointer arithmetic… which you should be able to translate fairly directly to array index calculations… then tag on one add instruction to actually get at the data.
Shoot - you could implement a heap as one large array… then the array index IS a pointer. I’m sure there is something I’m overlooking… but I’ve never found a need for traditional pointers in any of my Java work.
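A small sketch of the "one large array as a heap" idea, assuming 3-component float vectors. The class is hypothetical and only meant to show an int index standing in for a pointer, with pointer arithmetic becoming index arithmetic.

```java
/** Hypothetical flat "heap" of 3-float vectors; an int offset plays the role of a pointer. */
final class FlatVectorHeap {
    private static final int STRIDE = 3; // floats per vector
    private final float[] heap;
    private int top;                     // next free slot (simple bump allocation)

    FlatVectorHeap(int capacityInVectors) {
        heap = new float[capacityInVectors * STRIDE];
    }

    /** Stores a vector and returns its "pointer" (the index of its x component). */
    int alloc(float x, float y, float z) {
        int ptr = top;
        heap[ptr] = x;
        heap[ptr + 1] = y;
        heap[ptr + 2] = z;
        top += STRIDE;
        return ptr;
    }

    /** Dereferencing is an array read at pointer + field offset. */
    float dot(int a, int b) {
        return heap[a] * heap[b] + heap[a + 1] * heap[b + 1] + heap[a + 2] * heap[b + 2];
    }

    /** Walking the heap: the equivalent of p += 3 on a float* in C. */
    float sumOfX() {
        float sum = 0f;
        for (int p = 0; p < top; p += STRIDE) {
            sum += heap[p];
        }
        return sum;
    }

    public static void main(String[] args) {
        FlatVectorHeap vectors = new FlatVectorHeap(4);
        int a = vectors.alloc(1f, 2f, 3f);
        int b = vectors.alloc(4f, 5f, 6f);
        System.out.println(vectors.dot(a, b));
    }
}
```

Every access still goes through the VM's array bounds check unless the JIT can prove it away, so this shows that the pointer tricks are expressible, not that they cost the same as in C.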

float<->int casts… I wonder if this is better in the 1.4.2 server VM? My simple tests using floats to convert RGB to YUV ran faster than code produced by MSVC++… The MS compiler was actually quite bad in that area, although I think recently they’ve improved. (It was something to do with setting the rounding mode EVERY time they converted float to int as I recall.)
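For reference, the kind of float-to-int conversion under discussion looks something like the sketch below. It is not the test code mentioned above: it only computes the Y (luma) channel, using the common BT.601 weights as an assumption, and the class name is made up.

```java
/** Hypothetical luma conversion showing one float-to-int cast per pixel. */
final class LumaDemo {
    /** 8-bit R, G, B in; 8-bit luma out (BT.601 weights, assumed for this example). */
    static int luma(int r, int g, int b) {
        float y = 0.299f * r + 0.587f * g + 0.114f * b; // int-to-float conversions happen here
        return (int) (y + 0.5f); // the cast truncates toward zero, so add 0.5 to round
    }

    /** Converts packed 0xRRGGBB pixels to luma values. */
    static int[] lumaOf(int[] rgbPixels) {
        int[] out = new int[rgbPixels.length];
        for (int i = 0; i < rgbPixels.length; i++) {
            int p = rgbPixels[i];
            out[i] = luma((p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] result = lumaOf(new int[] { 0xFFFFFF, 0xFF0000, 0x000000 });
        for (int value : result) {
            System.out.println(value);
        }
    }
}
```

A loop like this is a reasonable thing to time on the client and server VMs if you want to see how much the casts really cost.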

markuskidd · May 30, 2003, 8:12pm

the inner loop in question has been posted to the CFXweb thread linked above, fyi

princec · May 30, 2003, 8:44pm

Well, let’s face it, this is Sun’s fault for not getting that 2-stage JIT compiler working yet. There’s still so much to lose and so little to gain by separating client and server Hotspots. This being another small battle lost.

Cas