The Catch 22 thread is not the place to discuss JOCL, given that thread's size and scope. I did not actually reply there with JOCL in mind. No great loss. Just some unrelated comments to start:
I have neither observed nor tried to measure contention, which I assume is done by comparing multi-threaded throughput against single-threaded. A single thread for all GPUs using synchronized blocking assumes homogeneous GPUs and almost identical work. One person tried to do load balancing with a single thread and no blocking, but I think he gave up due to all the problems. My kernels do not do exactly the same amount of work every time either, more of a close range, so sync could hold one GPU or the other up on each iteration. More GPUs make this worse. With OpenGL, sync is of overriding importance, as displaying frames out of order or irregularly is not likely to be acceptable.
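For comparison with the single synchronized thread, here is a minimal sketch of the one-thread-per-GPU arrangement, so a slow kernel on one device does not hold up the others. It does not touch OpenCL at all: `PerDeviceWorkers`, `runAll` and the `work` callback are hypothetical names for illustration, and `work` stands in for "enqueue the kernels for this device, then block on its queue".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

public class PerDeviceWorkers {
    /**
     * Runs `iterations` calls of `work` for each device on its own thread.
     * `work` is a hypothetical stand-in for "enqueue, then block" on one device.
     * Returns the number of iterations each device completed.
     */
    public static int[] runAll(int deviceCount, int iterations, IntConsumer work) {
        ExecutorService pool = Executors.newFixedThreadPool(deviceCount);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int d = 0; d < deviceCount; d++) {
                final int device = d;
                results.add(pool.submit(() -> {
                    int done = 0;
                    for (int i = 0; i < iterations; i++) {
                        work.accept(device); // blocks only this device's thread
                        done++;
                    }
                    return done;
                }));
            }
            int[] counts = new int[deviceCount];
            for (int d = 0; d < deviceCount; d++) {
                counts[d] = results.get(d).get(); // wait for each worker to finish
            }
            return counts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a device worker failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each worker loops at its own pace, so uneven kernel times only delay the device they run on. The cost is one host thread per GPU, which matters if blocking spins the CPU.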
I have never had fewer CPUs than GPUs, though, which is where contention would really kick in. From NetBeans profiling data of Nvidia's implementation, it is clear their blocking is very CPU-intensive. By this I mean nearly all the CPU time is spent spinning while waiting for kernels to finish; enqueueing and passing args cost next to nothing. Getting the minimum wait time during a worksize calibration operation, which is wise to do anyway, could be very useful. You could then put a Thread.sleep() of about 85% of that minimum after the last enqueue in the set, and only then block. I am doing this as part of a redesign, but have not fired up the full system to test it.
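A rough sketch of that sleep-then-block idea follows. It is untested against a real OpenCL queue: `SleepThenBlock` and its methods are hypothetical names, and the two `Runnable` arguments stand in for "enqueue the set of kernels" and "the blocking call" (clFinish or similar) in whatever binding is used.

```java
import java.util.concurrent.TimeUnit;

public class SleepThenBlock {
    private long minWaitNanos = Long.MAX_VALUE; // shortest blocking wait seen so far
    private final double fraction;              // e.g. 0.85 of the minimum wait

    public SleepThenBlock(double fraction) {
        if (fraction < 0.0 || fraction >= 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1)");
        }
        this.fraction = fraction;
    }

    /** Calibration pass: time the blocking call and remember the shortest wait. */
    public void calibrate(Runnable enqueue, Runnable blockingWait) {
        enqueue.run();
        long start = System.nanoTime();
        blockingWait.run();
        recordWait(System.nanoTime() - start);
    }

    /** Records one observed wait time; only the minimum is kept. */
    public void recordWait(long nanos) {
        if (nanos >= 0) {
            minWaitNanos = Math.min(minWaitNanos, nanos);
        }
    }

    /** How long to sleep before blocking; 0 until something has been calibrated. */
    public long sleepNanos() {
        return minWaitNanos == Long.MAX_VALUE ? 0L : Math.round(minWaitNanos * fraction);
    }

    /** Normal pass: enqueue, sleep through most of the expected wait, then block. */
    public void runIteration(Runnable enqueue, Runnable blockingWait) throws InterruptedException {
        enqueue.run();
        long nanos = sleepNanos();
        if (nanos > 0) {
            TimeUnit.NANOSECONDS.sleep(nanos); // the thread yields the CPU here instead of spinning
        }
        blockingWait.run(); // only the remaining part of the wait is spent in the blocking call
    }
}
```

Using the minimum rather than the average keeps the sleep from overshooting on the fast iterations. Sleep granularity depends on the OS timer, so for very short kernels the sleep may not be worth it.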
You described low-level (gluegen-built) and high-level (OO) bindings before. Are you still doing, or going to do, the high level? I ask because that part cannot be generated by computer. The high-level bindings that Olivier continues to improve, called JavaCL (the low-level, JNAerator-built layer is OpenCL4Java), look outstanding. My own OO level is not nearly as comprehensive, and I am going to drop it when I get the chance. I only built my own because there weren't any when I started. The high level is where enhancements in the API will not naturally be realized, since it has to be written by hand.
That is it for now. Good luck on your project! Maybe JOCL will get its own section someday, but no hurry. Actually, there are already a lot of OpenCL forums, probably too many right now.