Hi, I finally got around to playing with some NIO code again, but of course things had to stop working before I could get anywhere :-/ To make it easier to follow, I’ve put together some code consisting of only the basic stuff, but where the problem still exists.
If I connect to my code using e.g. telnet, everything works fine. If I then try to make a connection from Macromedia Flash and go on to start up the Flash script (which means closing the connection and opening a new one quickly), my selector suddenly starts to return immediately on select, returning 0 selection keys. Sometimes this condition is even created the first time I connect from Flash. My original code connected to another server (and went bananas when the connections were dropped quickly), but the behavior has been similar with this test code.
Once the selector has started returning immediately, it will continue to do so until a new connection is made, after which it goes back to normal blocking operation. Another thing worth noting is that if I don’t write anything out to the socket, then it’s not possible to create this behavior.
Hope someone here can help me figure it out, because I’m pretty much out of ideas atm =) I’ll post the code in separate posts to make it easier to follow.
Firstly, are you using 1.4.2? If not, no-one is likely to care. 1.4.0 and 1.4.1 do not work with NIO: they have too many major bugs, many with no workaround. They are also very platform-dependent, so only someone with your exact OS can help you.
Assuming you’re using 1.4.2 or above, I’m afraid that’s far too much code to wade through without any comments. The execution path is definitely non-obvious, so working out what you’re doing, and when, is a difficult task for anyone other than you to do quickly. This is why I spent 30 seconds looking at your code and moved on (having learnt nothing at all in 30 seconds), assuming someone else with more time would have a look for you.
Since no-one else has replied… If you can put together all the lines - in sequence - that handle or interact with your Selector, I’ll have another look. This should be about 20-40 lines of code. No methods, please, and comments every few lines to say what you’re about to do in the next X lines (e.g. every 2-5 lines is usually about right for selector interaction) would help immensely. Often it’s possible to spot the problem just by reading the comments.
The basic principle is that only the selector thread does selector-related operations. New connections are added to a queue, and the selector thread then registers them as part of the select loop:
//Executed for all new connections
//ConnectionTenant is a buffer holding object
sc.configureBlocking(false);
ConnectionTenant ct = new ConnectionTenant(this);
SelectionKey sk = sc.register(selector, SelectionKey.OP_READ, ct);
ct.setSelectionKey(sk);
Similarly, if the connection tenant wants to write, it adds a request to a queue; as part of the select loop, the thread then modifies interestOps with this:
//Executed for each connection that wants to write
sk.interestOps(sk.interestOps() | SelectionKey.OP_WRITE);
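For readers following along, the queue-based pattern described above could be sketched roughly as follows. This is only an illustration under assumptions: the class PendingOps and its method names are hypothetical and are not the poster's actual code (ConnectionTenant is left out entirely). Other threads only enqueue work and call wakeup(); the selector thread applies the queued registrations and interestOps changes just before it calls select().

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper; names are illustrative and not from the original code.
final class PendingOps {
    private final Selector selector;
    private final Queue<SelectableChannel> newChannels = new ConcurrentLinkedQueue<>();
    private final Queue<SelectionKey> writeRequests = new ConcurrentLinkedQueue<>();

    PendingOps(Selector selector) {
        this.selector = selector;
    }

    // Called from any non-selector thread: queue the channel, then nudge the selector.
    void submitChannel(SelectableChannel channel) {
        newChannels.add(channel);
        selector.wakeup();
    }

    // Called from any non-selector thread: queue a request to add OP_WRITE interest.
    void requestWrite(SelectionKey key) {
        writeRequests.add(key);
        selector.wakeup();
    }

    // Called only by the selector thread, just before select().
    // Returns the number of queued requests that were applied.
    int drain() throws IOException {
        int applied = 0;
        SelectableChannel channel;
        while ((channel = newChannels.poll()) != null) {
            channel.configureBlocking(false);
            channel.register(selector, SelectionKey.OP_READ);
            applied++;
        }
        SelectionKey key;
        while ((key = writeRequests.poll()) != null) {
            if (key.isValid()) { // skip keys cancelled in the meantime
                key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
                applied++;
            }
        }
        return applied;
    }
}
```

The select loop would then call drain() at the top of each iteration, so that register() and interestOps() are only ever invoked on the selector thread.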
The SelectionKeys returned by selector.select() are checked for valid operations by this code:
if (sk.isReadable()){
if (!processRead(sk)) return;
}
if (sk.isWritable()){
processWrite(sk);
}
If it’s readable, the reading is done with this:
//Using a shared direct bytebuffer for reading in the available data. In this example the data is never passed on/used
readBuffer.clear();
ReadableByteChannel rbc = (ReadableByteChannel)sk.channel();
int numBytesRead = 0;
And if it’s writable:
//ConnectionTenant holds a writebuffer for the connection. Get the buffer and write from it. ConnectionTenant takes care of preparing buffer to be written before returning it.
ConnectionTenant ct = (ConnectionTenant)sk.attachment();
if(ct.hasData()){
WritableByteChannel wbc = (WritableByteChannel)sk.channel();
ByteBuffer writeBuffer = ct.getOutputBuffer();
wbc.write(writeBuffer);
ct.onWritePerformed();
}
The initial handling of the Set returned by the select operation is done by this code:
//If the selector returned any keys iterate through the key set and toss each of them off to processing
if (nSelKeys == 0) continue;
Set keys = selector.selectedKeys();
Iterator i = keys.iterator();
while (i.hasNext()){
SelectionKey sk = (SelectionKey)i.next();
i.remove();
processKey(sk);
}
[quote]my selector suddenly starts to return immediately on select, returning 0 selection keys. Sometimes this condition is even created the first time I connect from Flash. My original code connected to another server (and went bananas when the connections were dropped quickly), but the behavior has been similar with this test code.
[/quote]
The only way it will return 0 selectionkeys is if you called wakeup() (you’re not using interrupt() AFAICS).
Try checking your code that calls wakeup() and see how it could end up being invoked more often than you expect…?
EDIT: Here’s a guess: you’re calling wakeup() once too many times for each time you intend to call it. If you check the API docs you’ll see that they “stack up” - so that if you call it twice when a select is blocking, the first one will wake up the selector, and the second one will be queued so that the next “select” returns as soon as it’s called.
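To make the wakeup() behaviour concrete, here is a small standalone sketch (WakeupDemo and timedSelect are made-up names for illustration). As far as the Selector.wakeup() Javadoc goes, a wakeup issued while no select is in progress is remembered and makes the next select return immediately, while repeated calls between two selects have the same effect as a single call. So on a conforming implementation, the first select below should return at once with 0 keys and the second should block for its full timeout.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Illustrative demo class; not part of the original poster's code.
final class WakeupDemo {
    // Runs one select(timeout) and returns { number of keys, elapsed milliseconds }.
    static long[] timedSelect(Selector selector, long timeoutMillis) throws IOException {
        long start = System.nanoTime();
        int keys = selector.select(timeoutMillis);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] { keys, elapsedMillis };
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            // Two wakeups while no select is in progress.
            selector.wakeup();
            selector.wakeup();

            long[] first = timedSelect(selector, 2000);  // should return immediately
            long[] second = timedSelect(selector, 300);  // should block for the timeout

            System.out.println("first:  keys=" + first[0] + " after ~" + first[1] + " ms");
            System.out.println("second: keys=" + second[0] + " after ~" + second[1] + " ms");
        }
    }
}
```

If the second select also returned immediately with no keys and no further wakeup() call, that would point at something other than queued wakeups.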
I’ve tried a System.out.println("…") before both wakeups, and it’s never called. The selector just returns immediately on .select() until a new connection is added.
It keeps returning on select (it produces a 4 MB logfile in 10-15 seconds), so it’s not because wakeups are queued.
Question: if you put non-blocking channels in a selector, will they automatically appear ‘ready’ when you do a select? I thought the point of selectors was you register blocking channels and when one of them ‘unblocks’ it will notify the selector? Or something like that…
I’m referring to this block of code:
//Executed for all new connections
//ConnectionTenant is a buffer holding object
sc.configureBlocking(false);
ConnectionTenant ct = new ConnectionTenant(this);
SelectionKey sk = sc.register(selector, SelectionKey.OP_READ, ct);
ct.setSelectionKey(sk);
[quote]I’ve tried a System.out.println("…") before both wakeups, and it’s never called. The selector just returns immediately on .select() until a new connection is added.
[/quote]
Assuming there are no keys, and if you can comment out the wakeups, and there are no other wakeups or interrupts in another part of your code (do a search/replace for them), AND if you aren’t silently handling an exception, then you probably have a bug. C.f. the API docs for info on what select does, and assuming the above it looks to me (I just re-read it to be sure, but maybe I’ve missed something so check yourself) like the contract is being broken.
Cut out as much code as possible whilst preserving the problem, and get ready to file a bug report…but paste here if you can get it down real small (inline most methods, and aim for under 60 LOC if you can) and maybe some subtle mistake will become obvious :).
[quote]Question: if you put non-blocking channels in a selector, will they automatically appear ‘ready’ when you do a select? I thought the point of selectors was you register blocking channels and when one of them ‘unblocks’ it will notify the selector? Or something like that…
[/quote]
They automatically appear as “ready” for whichever of the operations you said you wanted to be notified of IFF that operation has data (or something; nb there is the undocumented notify-of-disconnect that counts as “data” here) ready.
But they don’t disappear unless you manually remove them.
But he says nothing is appearing as ready - there are no selection keys in the selector’s set.
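The point about selected keys not disappearing by themselves can be checked with a small experiment. The sketch below is an assumption-laden illustration (SelectedKeysDemo is a made-up name, and it uses a java.nio.channels.Pipe instead of a socket to stay self-contained): it records the size of the selected-key set after the first select, after the data has been consumed but the key left in place, and after the key has been removed by hand.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Illustrative demo class; not part of the original poster's code.
final class SelectedKeysDemo {
    // Returns the size of the selected-key set after each of the three steps.
    static int[] run() throws IOException {
        int[] sizes = new int[3];
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open()) {
            pipe.source().configureBlocking(false);
            SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
            pipe.sink().write(ByteBuffer.wrap(new byte[] { 42 }));

            // Step 1: data is waiting, so the key is added to the selected-key set.
            selector.select(1000);
            sizes[0] = selector.selectedKeys().size();

            // Step 2: consume the data but do NOT remove the key from the set.
            pipe.source().read(ByteBuffer.allocate(16));
            selector.selectNow();
            sizes[1] = selector.selectedKeys().size();

            // Step 3: remove the key by hand; nothing is ready any more.
            selector.selectedKeys().remove(key);
            selector.selectNow();
            sizes[2] = selector.selectedKeys().size();
        } finally {
            pipe.sink().close();
            pipe.source().close();
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = run();
        System.out.println(sizes[0] + ", " + sizes[1] + ", " + sizes[2]);
    }
}
```

The key stays in the set at step 2 even though nothing is readable any more, which is why the processing loop has to call i.remove() (or clear the set) itself.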
I’ve tried inlining as much as possible, and made a little discovery. If I register the connection for OP_WRITE right from the beginning, then I can’t create the problem, so it seems it could be related to the sk.interestOps(sk.interestOps() | SelectionKey.OP_WRITE) call. I’ll get some more tests done on this before posting more code; I can’t really get it down to 60 LOC so far.
Another little note is that it seems much easier to create this condition when I run the Flash client on another computer, so there might be a timing factor involved.
Ok, well, even if that is the case, I’m not sure if you want to set these sockets up as non-blocking because that’s what the selector is for: to block only when all channels in the selector would block. So change the line of code to this:
//ConnectionTenant is a buffer holding object
sc.configureBlocking(true);
and see if that gives you the desired result. If it fixes the problem but what blahablhbalhahalahaha says is correct, then that’s probably an issue that needs to be reported.
Okay I’ve figured out what causes the problem, dunno if you would classify it as a bug.
In my original test code I made a ConnectionTenant object for each connection, which would hold an internal write buffer. The tenant would prepare some welcome/acknowledge data in a ByteBuffer, and on setSelectionKey(SelectionKey sk) it would call back to add itself to a “wantToWrite” queue. The effective result of this was that the selector’s own thread called back into a method that called selector.wakeup().
In the current version of my code I don’t even write (and don’t have to read either; I just kept the code for it to handle disconnects):
//Register the new connections in queue
for (Object o = newConnections.getObjectNow(); o != null; o = newConnections.getObjectNow()){
SocketChannel sc = (SocketChannel)o;
if(!sc.isConnected()) return;
try{
System.out.println("[ConnectionWorker] Registering new connection");
sc.configureBlocking(false);
SelectionKey sk = sc.register(selector, SelectionKey.OP_READ);
System.out.println("INNER WAKEUP");
//selector.wakeup();
}catch (Throwable tr){
tr.printStackTrace();
closeChannel(sc);
}
}
The code is placed before selector.select() in the select loop. If I comment out selector.wakeup(), everything runs as normal, but if it’s left in, the selector will start to return immediately on select. The wakeup in the code snippet is only called once for each new connection, so it’s not as if a new “wakeup request” is queued up on each loop. I’ve made a test where wakeup() in the select loop is called just once, but the selector keeps doing non-blocking selects until a new connection is made.
An important note is that if I connect to the server with telnet from my workstation (where I also run the server), I can’t create the bug. When I connect with telnet from my “test machine” (it stands right next to the workstation, both connected to the same switch), it will start its non-blocking select, most of the time on the first connection, though it sometimes takes 2-3 connections. I don’t know enough about low-level IO to say anything based on this, but it obviously makes a difference that it’s a “real” network connection.
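Given the cause described above (the selector’s own thread ending up in a method that calls wakeup()), one possible way to sidestep it is to skip the wakeup when the caller already is the selector thread, since that thread is not blocked in select() at that moment anyway. The following is only a sketch of that idea with hypothetical names (GuardedWakeup, wakeupIfForeign); it works around the symptom and does not explain or fix the underlying selector behaviour.

```java
import java.nio.channels.Selector;

// Hypothetical helper; names are illustrative and not from the original code.
final class GuardedWakeup {
    private final Selector selector;
    private final Thread selectorThread;

    GuardedWakeup(Selector selector, Thread selectorThread) {
        this.selector = selector;
        this.selectorThread = selectorThread;
    }

    // Issues wakeup() only when called from a thread other than the selector thread.
    // Returns true if wakeup() was actually called.
    boolean wakeupIfForeign() {
        if (Thread.currentThread() == selectorThread) {
            // The selector thread is running this code, so it is not inside select()
            // right now; it will see any queued work on its next loop iteration.
            return false;
        }
        selector.wakeup();
        return true;
    }
}
```

Queue-adding methods such as the “wantToWrite” callback would then call wakeupIfForeign() instead of selector.wakeup() directly.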
[quote]Ok, well, even if that is the case, I’m not sure if you want to set these sockets up as non-blocking because that’s what the selector is for: to block only when all channels in the selector would block. So change the line of code to this:
[/quote]
!!! What do you see as the point of the method call configureBlocking() then, if it’s not to enable non-blocking mode?
[quote]Okay I’ve figured out what causes the problem, dunno if you would classify it as a bug.
…Effective result of this was that the selectors own thread called back into a method that called selector.wakeup.
[/quote]
So my guess was pretty close?
If you look at the way the API is designed this problem makes sense somewhat… The design (although Sun doesn’t really explain this in the docs, you can find treatises on the different ways OS’s implement asynch elsewhere) is that once something happens that the Selector notices, it assumes that thing is always happening, and doesn’t listen out for it to stop.
Hence the warnings to remember to remove keys from the set, else your channels will always appear to be readable/writable/etc from as soon as they first become so.
If the select exited with a key state-change, you can reset it by removing the keys. If it exited with a wakeup with no keys, you can’t do anything to change its internal state, so it carries on in the same state forever, returning an empty key set.
I suspect there is a naive FSM inside which saves its previous state and monitors if the state has been changed. If this is even close to true, it’s very sad because it means the authors didn’t have a good set of unit tests (this happens very occasionally with particular parts of the standard libs, where a group of bugs appear that show the author of a particular class was not doing much unit testing).
I believe this is definitely a bug, because I’m pretty sure that this wasn’t the intended behaviour. I suggest you log a bug-report, and put in the suggested action/workaround fields something like:
1. You could change it to automatically reset its status if it leaves a select with no keys, so that it won’t immediately do the same thing on the next select call (i.e. fix the bug) - this is the preferred option
2. You could add a method .reset() to Selector which does the same thing manually, so that if someone calls wakeup and gets back an empty set they can at least force it to go back to blocking - this is in case there is some reason why number 1 is undesirable. It also has the benefit of being backwards compatible.
If there’s a reason why this isn’t a bug, they’ll probably tell you!
(don’t forget to include your handy shortened code; with the code snippet they’re much more likely to accept the bug, assuming you give them enough info to reproduce it!)