Asynchronous network library

stevetaylor · February 3, 2006, 12:30pm

Hi.

I’ve just finished coding and testing an asynchronous (non-blocking) network library, built around java.nio. This was never my intention; I switched from the thread-per-client architecture to java.nio because a number of people recommended it. Apparently it’s far more scalable for servers (which is good for MMORPGs). One thing led to another and the next thing I knew I was coding a library.

Well, it’s a library in the sense that it performs a set of tasks in a particular problem domain. But it’s really not big at all - at least on the outside. Users of this library see only two classes and two interfaces.

More information here.

Riven · February 3, 2006, 12:37pm

Hm… hm hm hm…

You built a wrapper around NIO and one can only read and write byte-arrays? What’s the point then, only async IO? Too bad.

Further, methods like “HostEvent.close()” have to be renamed. You aren’t closing the Event.

stevetaylor · February 3, 2006, 2:38pm

[quote]You built a wrapper around NIO and one can only read and write byte-arrays? What’s the point then, only async IO? Too bad.
[/quote]
Yes, async IO is exactly the one and only point. Converting from bytes to ints, shorts, floats, chars, strings, etc. was outside the problem domain at the time. It’s a problem that can be solved either by existing JDK classes, 3rd party packages or a roll-your-own solution. As long as you’ve got the raw bytes, there’s enough to encode/decode anything. However, I will look into other data types if I see a good efficiency-related case for adding them.

In building this framework, the purpose was to use it in an MMORPG game server. I will implement a byte-based command protocol. Byte-sized granularity seems to be the way it’s done as this is the most bandwidth-efficient.

[quote]Further, methods like “HostEvent.close()” have to be renamed. You aren’t closing the Event.
[/quote]
Fair point. Perhaps something like closeBuffer() or closeDataStream() would be nicer. Do you have any better suggestions? The better the names, the more user-friendly it will be.

Thanks for your comments.

Riven · February 3, 2006, 2:46pm

I didn’t mean to suggest you needed to add support for short/int/long/float/double…

Keep in mind that transferring byte-arrays is done through the CPU cache, which is needlessly inefficient because it causes a lot of cache-misses. So my main point was that the library should allow for direct (Byte)Buffer-level access. Especially for MMORPGs.

stevetaylor · February 3, 2006, 4:38pm

Yeah, I thought I’d made things more efficient by allowing a transfer into a specific location in an array, so that extra copies wouldn’t be needed. The problem is that there is only one ByteBuffer for all hosts to read into. This solves the problem of either allocating/deallocating ByteBuffers on connection/disconnection or pooling them and risking running out. It also solves the problem of the selector thrashing when there isn’t enough space in the buffer to read incoming data. It also makes SocketManager less likely to crash when a ByteBuffer is in the wrong state.

Do you think it is best to have one ByteBuffer per connection, drawn from a finite-size pool? Then I guess the program using SocketManager could do what it wants. The only problem I see is that the SocketManager and main program have to play nice with each other. That means that the ByteBuffer should be processed only in the hostDataAvailable callback.

What do you think?

To be honest, I can’t see myself processing commands directly out of a ByteBuffer. But stranger things have happened I guess.

As for the output, I originally tried queueing persistent ByteBuffers, but I had all sorts of concurrency issues, so I went with array copying.

Riven · February 3, 2006, 5:02pm

I think you should keep it realistic

Anybody writing an MMORPG will roll their own lib, to get the best performance. The more you’ll change your lib for best performance, the more you’ll find yourself removing code, until you’re almost back at the pure NIO classes.

Your lib can be handy for small projects, where I/O isn’t that dramatic. You don’t have to aim much higher than that.

stevetaylor · February 4, 2006, 11:09am

I was going to rework the library yet again, to have a ByteBuffer per connection. Then I realised this is a bad idea and that what I have now may actually perform better. From reading various performance tuning stuff (here and elsewhere), it seems that the more I call get and put methods on a direct buffer, the more performance I lose.

Consider the case of parsing direct ByteBuffers - this involves many calls to the get methods to extract data as it’s needed. On the other hand, consider what I’m doing now - calling a bulk transfer method once only for each message sent from a channel. This most likely involves some sort of DMA transfer that completely bypasses the cache. If I understand correctly, the destination array will only be reloaded into the cache on an as-needed basis.
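To make the "one bulk transfer per message" idea concrete, here is a minimal sketch. The class and method names (BulkCopy, drainInto) are made up for illustration and are not part of the library; the only real API used is ByteBuffer.get(byte[], int, int). Whether this actually beats parsing straight out of the direct buffer is something only a benchmark would show.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: copies everything readable in the shared buffer
// into a caller-supplied array with a single bulk get().
final class BulkCopy {
    /**
     * Copies as many readable bytes as fit from 'shared' into 'dest',
     * starting at 'offset', and returns the number of bytes copied.
     * 'shared' is assumed to be flipped (ready for reading) already.
     */
    static int drainInto(ByteBuffer shared, byte[] dest, int offset) {
        int n = Math.min(shared.remaining(), dest.length - offset);
        shared.get(dest, offset, n); // one bulk call instead of n single-byte gets
        return n;
    }
}
```

After the call, all further parsing works on the plain array and the direct buffer is free to be cleared and reused for the next channel.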

The more I look into it, the more it becomes clear that direct buffers are meant only as an interface between the JVM and native I/O. That implies their scope should be constrained to bulk data transfers.

A lot of this is speculation and vague estimation, derived from bits and pieces I’ve read here and there. The best thing I can do now is to resist spending more time optimizing the framework based on speculation and vague estimation. They say premature optimization is the root of all evil. It can be argued that committing two weeks to convert to nio instead of making my game with thread-per-client was a premature optimization. However, this optimization is certainly backed in fact and experience - definitely worth the two weeks. If it turns out my nio framework is causing excess cache loading, then I’ll look at drastically changing it. Until then, it’s bugfixes only.

Riven · February 4, 2006, 11:35am

[quote=“stevetaylor,post:7,topic:26088”]
That’s not true at all. If you know what you are doing, you’ll get superior performance from using NIO.*

* when running the server VM, which is obvious when running any large networked game…

On a side-note, I’ve seen quite a few bugs and weird design in your sourcecode. The SocketManager constructor ignores all the parameters, and uses the DEFAULT_x_y_z values. Further you’re detaching Listeners on disconnect, only because you don’t want the hostDisconnected to be called more than once. That tells me there is a lot wrong with the underlying handling of NIO, as it’s relatively easy to handle disconnections properly, without going through the “disconnection-handling” code more than once.

It works like this:
Once there is a disconnection, you either get an Exception on writing to the channel, OR:
the channel suddenly becomes Readable, and returns -1 read bytes.
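As a rough illustration of the read side of that rule, the sketch below treats both outcomes (a read that returns -1, or an IOException while reading) as "the peer is gone". The names DisconnectCheck and peerGone are invented for this example; the caller would run its disconnection handling exactly once when the method returns true.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper, not taken from the library under discussion.
final class DisconnectCheck {
    /**
     * Reads once from the channel into 'buffer'.
     * Returns true if the connection should be considered closed:
     * either read() reported end-of-stream (-1) or it threw.
     */
    static boolean peerGone(ReadableByteChannel channel, ByteBuffer buffer) {
        try {
            return channel.read(buffer) == -1; // -1 means the peer closed cleanly
        } catch (IOException e) {
            return true; // e.g. connection reset; same handling as a clean close
        }
    }
}
```

In a selector loop this would be called when a key is readable; on true, cancel the key, close the channel and fire the disconnect callback from that one place.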

stevetaylor · February 4, 2006, 2:00pm

[quote]On a side-note, I’ve seen quite a few bugs and weird design in your sourcecode. The SocketManager constructor ignores all the parameters, and uses the DEFAULT_x_y_z values. Further you’re detaching Listeners on disconnect, only because you don’t want the hostDisconnected to be called more than once. That tells me there is a lot wrong with the underlying handling of NIO, as it’s relatively easy to handle disconnections properly, without going through the “disconnection-handling” code more than once.
[/quote]
I found the offending SocketManager constructor and fixed it. What a silly bug. I’ve now switched on a warning in eclipse (unread parameters) that would have caught this. Thanks for spotting it. The change has been uploaded.

As for detaching the listener, this is done on a just-in-case basis in keeping with the spirit of the closeHostConnection() method. The reason I suspect duplicate calling could occur is simply because the terminate() method closes all hosts in one thread while the selector is running in another. This could cause IOExceptions to be thrown in the selector loop which would also cause closeHostConnection to be called. Perhaps I could move the terminate code to the selector thread instead, then this detaching wouldn’t be necessary. But that would just be a waste of time for the sake of slightly beautifying the code - the net outcome would be the same.

Actually, I’m wrong… but I’ve deliberately left the above paragraph to show what I was thinking. I’ve just realised that calling closeHostConnection() from terminate() is a bad idea as it’s crossing a boundary that shouldn’t be crossed. Everything in InputThreadRunner should occur only in that thread. closeHostConnection() should be private (as it used to be). terminate() should just set the terminateThreads flag and then this should be handled in the selector loop. terminate() should block, waiting until all hosts are disconnected and the I/O threads are shut down.
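A minimal sketch of that shutdown scheme, assuming a single selector thread. SelectorLoop, finished and isOpen() are names invented for this example; terminateThreads and closeHostConnection() follow the names used above, but the bodies are guesses and not the library's actual code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the selector thread's Runnable.
final class SelectorLoop implements Runnable {
    private final Selector selector;
    private volatile boolean terminateThreads;           // set by terminate(), read by the loop
    private final CountDownLatch finished = new CountDownLatch(1);

    SelectorLoop() {
        try {
            selector = Selector.open();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void run() {
        try {
            while (!terminateThreads) {
                selector.select();                       // blocks until an event or wakeup()
                selector.selectedKeys().clear();         // real code would dispatch events here
            }
            for (SelectionKey key : selector.keys()) {   // all closing happens on this thread
                closeHostConnection(key);
            }
            selector.close();
        } catch (IOException e) {
            // a sketch: real code would log or report this
        } finally {
            finished.countDown();                        // release anyone blocked in terminate()
        }
    }

    private void closeHostConnection(SelectionKey key) {
        key.cancel();
        try {
            key.channel().close();
        } catch (IOException ignored) {
            // nothing useful to do while shutting down
        }
    }

    /** Called from any other thread; returns once the loop has shut down. */
    void terminate() {
        terminateThreads = true;
        selector.wakeup();                               // break out of a blocking select()
        try {
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isOpen() {
        return selector.isOpen();
    }
}
```

Because only the selector thread ever touches the keys and channels, the disconnect path cannot be entered from two threads at once.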

I’ll get onto this. Thanks for pointing it out.

stevetaylor · February 4, 2006, 4:14pm

Termination system fixed

I also added IP address lookup.

Thanks for helping again.

stevetaylor · February 5, 2006, 5:33pm

Riven, I was thinking a lot about the stuff you said about poor cache hits, etc. Also what you said about how the server jvm eliminates the array-buffer bottleneck got me thinking. For a while, the only way I could see direct ByteBuffer access working is if every connection had its own ByteBuffer. Of course, in a highly concurrent situation, that would cause even worse cache performance than my array-copy system.

Here’s an example of why I thought of having one ByteBuffer per client… Suppose a client sends a few integers and the server is expecting integers. What if a packet’s payload isn’t a multiple of 4? That means there will be up to 3 bytes remaining that the server program can’t really use yet. So you’d think that to keep track of this, on the server each connection would need its own buffer.

I was thinking for ages about it, trying to discover a solution that involved storing the left-over data somewhere, then retrieving it and appending the new ByteBuffer data onto that. Then I realised that instead of complicating SocketManager even more, I could pass this responsibility off to the implementor of HostListener. I did this by removing hostDataAvailable() and adding hostDataFound() and hostDataBuffered(). hostDataFound() is called after an OP_READ event is triggered and inputBuffer (a ByteBuffer) is cleared. It can add any left over data from the previous invocation of hostDataBuffered() (such as those pesky 3 bytes when ints are wanted). Once this method returns, the selector loop will read into inputBuffer from the channel then call the listener’s hostDataBuffered() method if zero or more bytes are read (otherwise disconnect the host).

I think a very simple way of parsing in hostDataBuffered would just be to put everything inside a big try block, catching a BufferUnderflowException. The catch clause would then get the remaining bytes out of the buffer and shove them back into the buffer on the next hostDataFound().

Hopefully this is the optimal solution.
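A rough sketch of what such a listener could look like for a stream of ints. The two method names come from the description above; their signatures, the IntParser class and the assumption that inputBuffer is flipped (ready for reading) before hostDataBuffered() is called are all guesses made for the sake of the example.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-connection parser; not the library's actual HostListener.
final class IntParser {
    private final byte[] leftover = new byte[3];  // an incomplete int is at most 3 bytes
    private int leftoverLength;
    final List<Integer> values = new ArrayList<>();

    /** Called after inputBuffer is cleared and before the channel is read. */
    void hostDataFound(ByteBuffer inputBuffer) {
        inputBuffer.put(leftover, 0, leftoverLength); // new data gets appended after this
        leftoverLength = 0;
    }

    /** Called with the buffer flipped, holding the leftovers plus the new data. */
    void hostDataBuffered(ByteBuffer inputBuffer) {
        try {
            while (true) {
                values.add(inputBuffer.getInt());     // throws when fewer than 4 bytes remain
            }
        } catch (BufferUnderflowException e) {
            // a failed getInt() leaves the position untouched, so the
            // unread tail is exactly what remains in the buffer
            leftoverLength = inputBuffer.remaining();
            inputBuffer.get(leftover, 0, leftoverLength);
        }
    }
}
```

For messages made of several fields, calling mark() before each message and reset() in the catch block would be needed so that a half-parsed message is saved whole.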

I haven’t yet modified the output subsystem to take ByteBuffers instead of arrays. I don’t yet know how feasible this would be. When I come to a decision and make the necessary modifications (if any), I’ll post the updated package and documentation.

Riven · February 5, 2006, 9:29pm

I don’t see why you would have to split that method in two…

dataAvailable basically means “there are bytes for you”, not that they were just received from the channel. So it can be remaining bytes from a previous time when data was read from the channel, but not (completely) read yet.

ByteBuffer bb = …;

while(true)
{
   OP_READ:
      channel.read(bb); // write from current pos, can be anything
      bb.flip();
      fireDataAvailableEvent(bb); // bb contains old and new bytes
      int rem = bb.remaining(); // check how many bytes were not 'consumed'
      bb.clear();
      // tricky part: write remaining bytes to beginning of 'bb'
      // if rem > bb.position, then you need a 2nd bb to copy, or not copy at all…
      // now we’re ready to read bytes from the channel again, by just appending them
}
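For what it's worth, ByteBuffer.compact() does the "tricky part" in one call: it moves the unread bytes to the start of the buffer and leaves the position just after them, ready for the next channel.read(). The sketch below assumes one buffer per connection; CompactingReader, DataListener and readAndDispatch are names made up for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper showing the read / flip / dispatch / compact cycle.
final class CompactingReader {
    interface DataListener {
        void dataAvailable(ByteBuffer bb); // may consume any prefix of the readable bytes
    }

    /** Returns the number of bytes read, or -1 if the connection is gone. */
    static int readAndDispatch(ReadableByteChannel channel, ByteBuffer bb, DataListener listener) {
        int n;
        try {
            n = channel.read(bb);          // appends after any bytes kept from last time
        } catch (IOException e) {
            return -1;                     // treat a failed read like a disconnect
        }
        bb.flip();                         // expose old and new bytes for reading
        listener.dataAvailable(bb);
        bb.compact();                      // keep unconsumed bytes at the front
        return n;
    }
}
```

With a single buffer shared by all connections this does not work as-is, since the leftovers of one channel would still be in the buffer when the next channel is read.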

stevetaylor · February 6, 2006, 7:59am

Yeah, the problem with that is that if there are no more bytes available from the channel, then we have these leftover bytes to store somewhere while we read from other channels. I think that’s what you mean by // if rem > bb.position, then you need a 2nd bb to copy, or not copy at all…, but I’m not sure… I don’t want to track this internally with extra ByteBuffers because that would be another management thing I’d have to worry about. The event handler can take care of that. Also if I have extra ByteBuffers for leftovers, then it starts to get back to the ByteBuffer-per-channel problem. Not a good idea, considering the unit size of ByteBuffers (4KB on Windows for example). If leftovers are handled by the handlers, then it’s likely that only just enough memory will be used to store them. Typically I will do that by having a scratch area (either a non-direct ByteBuffer or a byte array) in each client object’s parser instance.

Also, I don’t want this thing to keep firing events until the buffer is empty. The parser in the event handler may not necessarily be interested in individual bytes - it may be looking for ints, floats, short, chars, Strings, or any combination of these depending on the stage it’s at.

CommanderKeith · March 24, 2006, 5:29am

Hi Steve,
I like your networking library; apart from the issue below, it’s the easiest to use of all. You should post it on the networking forum.

My problem was that when I send 20,000 bytes from one SocketManager to another, the receiving one fires a hostDataAvailable event even though only about 13,000 bytes have been received into its byte array. If the event was fired only when the whole lot was ready, it would be the simplest library to use of all.

To get my HostListener to batch up the separated byte arrays, I was thinking of doing this:

write the number of bytes being sent on the server-side
and then on the client read that byte array length number and batch up the byte arrays until they sum to give the right length.
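The length-prefix idea could look roughly like this. It is only a sketch: LengthPrefixedAssembler, frame() and accept() are invented names, the 4-byte big-endian length header is an assumption, and a real version should reject negative or absurdly large lengths before allocating.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-connection helper that batches chunks into whole messages.
final class LengthPrefixedAssembler {
    private final ByteBuffer header = ByteBuffer.allocate(4); // collects the length prefix
    private byte[] body;                                      // message being assembled, or null
    private int filled;                                       // bytes of 'body' received so far

    /** Sender side: prepend the payload length as a 4-byte int. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Receiver side: feed each received chunk, get back any completed messages. */
    List<byte[]> accept(byte[] chunk, int offset, int length) {
        List<byte[]> complete = new ArrayList<>();
        int end = offset + length;
        while (offset < end) {
            if (body == null) {
                int n = Math.min(header.remaining(), end - offset);
                header.put(chunk, offset, n);
                offset += n;
                if (header.hasRemaining()) {
                    break;                        // length prefix itself is still incomplete
                }
                header.flip();
                body = new byte[header.getInt()]; // unchecked here; validate in real code
                header.clear();
                filled = 0;
            }
            int n = Math.min(body.length - filled, end - offset);
            System.arraycopy(chunk, offset, body, filled, n);
            filled += n;
            offset += n;
            if (filled == body.length) {
                complete.add(body);               // whole message has arrived
                body = null;
            }
        }
        return complete;
    }
}
```

The listener would call accept() from each data event and only act on the messages it returns, so a 20,000-byte send that arrives as 13,000 bytes plus the rest surfaces as one message after the second event.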

Is that the workaround you are using?

Thanks,
Keith