Some people have tested the real-time ray tracer I am making on different computers with different numbers of CPUs and cores per CPU, and it is apparent that the more CPUs/cores a computer has, the larger the share of cycles that is wasted (on an 8-core machine, utilisation averages 50%).
I am quite inexperienced with multi-CPU/core programming and am wondering whether there is something fundamental I am doing incorrectly which would account for this waste, and whether there are steps I can take to reduce it?
At a high level, the ray tracer logic flows in the following steps:
1. Obtain the number of CPUs/cores.
2. Create a render thread for each core.
3. Nominate one thread to be the master thread.
4. Split the screen into tiles and give each thread an equal portion of the tiles.
5. All render threads apart from the master thread wait by calling await() on the “state” CyclicBarrier.
6. The master thread updates the world state and calls await() on the “state” CyclicBarrier, waking all threads.
7. All threads render their tiles and then wait by calling await() on the “render” CyclicBarrier.
8. Once all threads have called await() on the “render” CyclicBarrier, all threads wake and the non-master threads go to step 5.
9. The master thread displays the rendered image and then goes to step 6.
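To make the steps above concrete, here is a minimal sketch of the same structure. It is not the actual ray tracer code: the class name `BarrierRenderLoop`, the `TILE_COUNT` constant, the interleaved tile assignment, and the counter that stands in for the real tracing work are all assumptions made for illustration.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierRenderLoop {
    // Hypothetical number of screen tiles; the real value depends on the tile size.
    static final int TILE_COUNT = 64;

    /** Runs the frame loop for a fixed number of frames and returns the total tiles "rendered". */
    static int run(int threadCount, int frames) throws InterruptedException {
        CyclicBarrier stateBarrier = new CyclicBarrier(threadCount);   // the "state" barrier
        CyclicBarrier renderBarrier = new CyclicBarrier(threadCount);  // the "render" barrier
        AtomicInteger tilesRendered = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            final boolean master = (t == 0); // step 3: thread 0 is nominated as master
            threads[t] = new Thread(() -> {
                try {
                    for (int frame = 0; frame < frames; frame++) {
                        if (master) {
                            // step 6: update world state here (placeholder)
                        }
                        // steps 5 and 6: everyone meets at the "state" barrier
                        stateBarrier.await();

                        // step 7: render this thread's fixed share of the tiles
                        // (assumed split: tiles id, id + threadCount, id + 2 * threadCount, ...)
                        for (int tile = id; tile < TILE_COUNT; tile += threadCount) {
                            tilesRendered.incrementAndGet(); // stands in for tracing one tile
                        }

                        // steps 7 and 8: everyone meets at the "render" barrier
                        renderBarrier.await();

                        if (master) {
                            // step 9: display the rendered image here (placeholder)
                        }
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        return tilesRendered.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors(); // step 1
        System.out.println("tiles rendered: " + run(cores, 10));
    }
}
```

In this structure every thread waits at the “render” barrier until the slowest thread has finished its tiles, and the non-master threads wait at the “state” barrier while the master thread displays the image and updates the world state.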
An applet of a recent version of the raytracer can be found here:
http://javaunlimited.net/hosted/moogie/testApplet.html
The above version has some banding on the specular reflection, which is fixed in a later version. To see the FPS, use the executable JAR version, which can be found here:
http://javaunlimited.net/hosted/moogie/jrtrt_specular_fixed.jar