I was recently promoting NIO to someone who thought it wasn’t worth the effort, and got into a little difficulty. I left the conversation with a few doubts, mainly because of the following points he made:
-
Why isn’t NIO properly documented yet? Lots of critically important parts of the API (e.g. how you detect a disconnect at each and every stage) are not covered in the API docs; you have to read FAQs and tutorials to find them out instead. (Note: this was true of the 1.4.0 release, but perhaps 1.4.1 or .2 has corrected this?)
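For what it’s worth, the convention I ended up piecing together from tutorials rather than from the API docs is roughly the following - just a sketch, assuming a non-blocking SocketChannel registered with a Selector, and the class/method names are mine:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// Sketch: how a disconnect typically shows up on a non-blocking SocketChannel.
// (Pieced together from tutorials/FAQs, not something the API docs spell out.)
public class DisconnectSketch {

    static void handleReadable(SelectionKey key, ByteBuffer buf) {
        SocketChannel channel = (SocketChannel) key.channel();
        try {
            int n = channel.read(buf);
            if (n == -1) {
                // Orderly shutdown by the peer: read() returns -1 (end of stream).
                closeQuietly(key, channel);
            }
            // n == 0 just means no data right now; n > 0 means data was read.
        } catch (IOException e) {
            // An abortive disconnect (connection reset etc.) surfaces as an
            // IOException from read() or write(); there is no separate
            // "disconnected" event delivered by the Selector.
            closeQuietly(key, channel);
        }
    }

    static void closeQuietly(SelectionKey key, SocketChannel channel) {
        key.cancel(); // deregister from the Selector
        try {
            channel.close();
        } catch (IOException ignored) {
        }
    }
}
```

In other words, the only signals you get are -1 from read() and IOExceptions; nothing in the javadoc pulls that together in one place.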
-
Can we trust something that contained several show-stoppers in the first post-beta release? Is this actually being tested properly by Sun? Several of the bugs are basic problems that should have been uncovered by unit testing, which makes it worrying that they apparently weren’t found. (I’m not intimately familiar with the bugs, but I remember spotting some stuff that was fixed in 1.4.1 that surprised me; IIRC there was one bug where the NIO API wasn’t actually implemented properly on Windows: it used completion ports and was constrained by several fairly low fundamental limits inherited from them. There were also various problems with bits of network NIO just not working at all on some platforms, IIRC.)
-
How should one use network NIO for high-performance servers? There seem to be few patterns and documents anywhere on the web describing how to use NIO in a typical tens-of-thousands-of-clients-per-server system - i.e. with at least 20 threads and all the associated problems (I found one eventually, although it’s part of a research project at Stanford and somewhat off the beaten track). The APIs do some pretty mean and nasty things when you start multi-threading: lots of obvious ways of using the API break because of, e.g., places where threads are mutexed in surprising ways. I have in the past had problems with the fact that SelectableChannel.register() blocks if the Selector you’re registering with is actually in use; this makes it impossible to use register() in the obvious way in an MT environment - instead you have to write rather atrocious logic (from an OO point of view) in your select() loop which, immediately after select() unblocks, registers a queue of awaiting requests (see the sketch below). This is a stupid API design, and makes every user jump through the same hoop in every app that uses MT’d NIO networking.
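This is the kind of queue-and-wakeup workaround I mean - a rough sketch only, with names of my own invention, not any official pattern: worker threads enqueue a registration request and call wakeup(), and the selecting thread drains the queue right after select() returns, so register() is only ever called on the thread that owns the select loop.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Sketch of the "pending registrations" workaround: register() is only ever
// called on the selecting thread, so it cannot block against select().
public class SelectLoop implements Runnable {

    private static class Registration {
        final SelectableChannel channel;
        final int ops;
        final Object attachment;
        Registration(SelectableChannel channel, int ops, Object attachment) {
            this.channel = channel;
            this.ops = ops;
            this.attachment = attachment;
        }
    }

    private final Selector selector;
    private final List pending = new LinkedList();

    public SelectLoop() throws IOException {
        selector = Selector.open();
    }

    // Called from any worker thread: queue the request and wake the selector
    // so the selecting thread can perform the actual register() call.
    public void requestRegistration(SelectableChannel channel, int ops, Object attachment) {
        synchronized (pending) {
            pending.add(new Registration(channel, ops, attachment));
        }
        selector.wakeup();
    }

    public void run() {
        while (true) {
            try {
                selector.select();

                // Drain the queue immediately after select() unblocks, so that
                // register() never races against a thread blocked in select().
                synchronized (pending) {
                    for (Iterator it = pending.iterator(); it.hasNext();) {
                        Registration r = (Registration) it.next();
                        r.channel.register(selector, r.ops, r.attachment);
                        it.remove();
                    }
                }

                for (Iterator it = selector.selectedKeys().iterator(); it.hasNext();) {
                    SelectionKey key = (SelectionKey) it.next();
                    it.remove();
                    // ... hand the ready key off to a worker thread here ...
                }
            } catch (IOException e) {
                // log and decide whether to keep going or shut down
            }
        }
    }
}
```

Every MT’d NIO server I’ve seen ends up containing some variant of this boilerplate, which is precisely the complaint.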
-
How do you have 40 threads sharing the load of answering requests, working off five Selectors, and using several pre-allocated ByteBuffers in the native IO subsystem? (My suggested answer to this is that if you create a mem-mapped/direct buffer and then make views on it, one for each thread, hopefully this gives the same effective performance as if you had just one thread and one mem-mapped buffer with no views.) The API is not entirely clear on this, saying only that mem-mapped buffers “behave no differently to direct” BBs, and that views of direct BBs are also direct; it doesn’t make explicit what the memory requirements are for making views on direct BBs (saying only that you, as the programmer, won’t be able to measure it because the memory sits outside the normal Java heap). Who really knows what the overhead is per view? (A sketch of what I had in mind is below.)
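Concretely, this is the kind of carve-up I was suggesting - one pre-allocated direct buffer, sliced into independent per-thread views. Whether each slice really costs nothing beyond the shared native memory is exactly the part the docs leave open; the sizes and thread count here are arbitrary examples of my own.

```java
import java.nio.ByteBuffer;

// Sketch: one pre-allocated direct buffer, sliced into independent per-thread
// views. Each view has its own position/limit but shares the same native memory.
public class BufferViews {

    public static ByteBuffer[] carveUp(int threads, int bytesPerThread) {
        ByteBuffer master = ByteBuffer.allocateDirect(threads * bytesPerThread);
        ByteBuffer[] views = new ByteBuffer[threads];
        for (int i = 0; i < threads; i++) {
            master.position(i * bytesPerThread);
            master.limit((i + 1) * bytesPerThread);
            // slice() gives a view of just this region; the docs say a view of a
            // direct buffer is itself direct, but say nothing about per-view overhead.
            views[i] = master.slice();
        }
        return views;
    }

    public static void main(String[] args) {
        ByteBuffer[] perThread = carveUp(40, 64 * 1024);
        System.out.println("views: " + perThread.length
                + ", each " + perThread[0].capacity() + " bytes, direct="
                + perThread[0].isDirect());
    }
}
```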
…these were his major points, and I could only suggest ways in which they MIGHT be solved, but we couldn’t find details in the API docs to answer all of these questions. Does anyone know the answers to the above questions? Do you think it’s worth worrying about these problems (they mostly seem to be issues of poor documentation, which you eventually learn yourself the hard way anyway, but the possible lack of testing by Sun had me worried)?