The primary reasons for doing this was either having a very weak GPU where culling a single triangle no matter the CPU cost was a win (Playstation 3) or attempting to avoid a number of expensive single-threaded draw calls with threadable occlusion testing while avoiding expensive synchronization by reading back the depth buffer. Hopefully with Vulkan-based engines this won’t actually be needed any more, as draw calls can be threaded and there’s no high driver overhead of doing GPU occlusion culling.
This is a very specific case. I meant that writing an entire API like OpenGL purely in Java has no real world applications.