[SOLVED] Bytes And Java Strings

Hi.

I’ve recently been working a lot with java.nio, mainly Channels, SelectionKeys and ByteBuffers.

I’m having difficulties with reading data through channels. (Everything sent and received through Channels is in plain bytes.) I can send and receive just fine, and my echo server works without problems. But like I said, I can’t interpret the data that is being sent in a useful way.

What I’ve been doing is running test servers on localhost and connecting to them through telnet. The data I receive from telnet comes up as either gibberish or what seems to be nothing at all. For debugging I’ve been trying to echo the received data to the console with System.out.println(), but nothing sensible comes up in the slightest. I’ve tried appending strings, reading asCharBuffer(), etc.; the only thing remotely close to an alphabetical letter I’ve gotten is the ‘?’ character, and those only appear after CR.

Plain text is typed into telnet and nothing shows up on the console no matter what - why is this??

Hard to say anything without code samples.
Especially because NIO is involved. ;D

I think the problem is incorrect encoding/decoding of strings.
So, this possibly can help you: http://stackoverflow.com/questions/1252468/java-converting-string-to-and-from-bytebuffer-and-associated-problems.
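
In short: pick one charset (UTF-8 is the safe choice) and use it explicitly on both ends, instead of relying on defaults. A minimal sketch of the round trip (names are mine):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class EncodeDecodeExample {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");

        // encode: String -> ByteBuffer, ready to write to a channel
        ByteBuffer bytes = utf8.encode("hello");

        // decode: ByteBuffer -> String, after reading from a channel
        CharBuffer chars = utf8.decode(bytes);
        System.out.println(chars.toString()); // prints "hello"
    }
}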

I see, thanks for highlighting character encoding. The SO page you linked sent me to an interesting article by Joel Spolsky (co-founder of SO) called “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)”

I was in the middle of reading Java NIO by Ron Hitchens and stopped just after covering channels, keys and buffers to try out the stuff I had read.

This was all fine except for the character encoding issues I had. I see now that character sets are covered in the book in the following chapter. :L Which I’m now going to read in mild embarrassment. I’ve been quite ignorant of the importance of how bytes are interpreted, and of the decades-long struggle and ingenuity that has gone into figuring out the whole mess.

The article really opened my eyes quite a bit, so thanks again :P

If you’re only using the characters 0-9, a-z, A-Z, it should (almost) always work, since those map to the same byte values in ASCII, UTF-8, Latin-1 and most other common encodings. If you still get odd characters, you simply have a bug in your networking code (byte transfer). No encoding will save you from that :slight_smile:

btw. A CharBuffer always uses 16-bit UTF-16 encoding (Java’s internal char representation).
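
That’s also why asCharBuffer() shows gibberish: it doesn’t decode anything, it just reinterprets pairs of raw bytes as UTF-16 chars. A quick illustration (my own example):

import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class AsCharBufferPitfall {
    public static void main(String[] args) {
        // "hi" as two ASCII/UTF-8 bytes
        ByteBuffer buf = ByteBuffer.wrap(new byte[] { 0x68, 0x69 });

        // reinterprets the two bytes as ONE 16-bit char (0x6869),
        // which is a CJK character - not "hi"
        System.out.println(buf.asCharBuffer());

        // actually decodes the bytes to text: prints "hi"
        System.out.println(Charset.forName("UTF-8").decode(buf));
    }
}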

I fixed it by simply decoding the received data as UTF-8:

import java.nio.CharBuffer;
import java.nio.charset.Charset;

protected static Charset cs = Charset.forName("UTF-8");

// ...

// decode the bytes read from the channel into characters
CharBuffer buf = cs.decode(buffer);
message += buf;

I checked my system’s default charset with Charset.defaultCharset().name() and it spat out “windows-1252”.

I’m thinking that might’ve been the issue.
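
Note for anyone else hitting this: any String/byte conversion that doesn’t name a charset silently uses the platform default, so the same code can behave differently on different machines. A small sketch of the pitfall (the string is just an example):

import java.nio.charset.Charset;

public class DefaultCharsetPitfall {
    public static void main(String[] args) throws Exception {
        // what the JVM falls back to when no charset is given
        System.out.println(Charset.defaultCharset().name()); // e.g. "windows-1252"

        byte[] utf8Bytes = "héllo".getBytes("UTF-8");

        // decodes with the platform default: mangles 'é' unless the default is UTF-8
        System.out.println(new String(utf8Bytes));

        // naming the charset explicitly round-trips correctly everywhere
        System.out.println(new String(utf8Bytes, "UTF-8"));
    }
}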

I recommend you just stick with UTF-8:


// write the String: length prefix first, then the UTF-8 bytes
byte[] b = myString.getBytes("UTF-8"); // may throw the checked UnsupportedEncodingException
byteBuffer.putInt(b.length);
byteBuffer.put(b);


// read the String
int len = byteBuffer.getInt();

// make sure len is not a ridiculous number that could crash your application ;)

byte[] b = new byte[len];
byteBuffer.get(b);
String s = new String(b, "UTF-8");

EDIT: heh, didn’t see that last message. It’s best to avoid Charset since it’s slow. See code above.

I think the core issue was that channels are inconsistent in how many bytes a single read() actually returns, and that the decoder just wraps the bytes it is fed and keeps track of them for you.
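
One caveat I ran into while reading up on this: the one-shot cs.decode(buffer) does not carry decoder state between calls, so a multi-byte character split across two reads can still get mangled. The usual pattern is a reusable CharsetDecoder plus compact(), roughly like this (channel is assumed to be a connected, blocking SocketChannel; buffer sizes are arbitrary):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
ByteBuffer in = ByteBuffer.allocate(1024);
CharBuffer out = CharBuffer.allocate(1024);
StringBuilder message = new StringBuilder();

while (channel.read(in) > 0) {
    in.flip();
    // endOfInput = false: an incomplete multi-byte sequence at the
    // end of 'in' is left in the buffer instead of becoming '?'
    decoder.decode(in, out, false);
    out.flip();
    message.append(out);
    out.clear();
    in.compact(); // keep the leftover bytes for the next read
}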

I’ll post my simple echo server in java.nio here for those interested to look it over.

http://www.java-gaming.org/?action=pastebin&id=62

What if the non-blocking channel.read() didn’t deliver all the bytes that you’re attempting to read with len?

And in your reading code, where exactly do you read the data into the buffer? o.O
Am I blind, or do I only see you making a String out of an empty, freshly allocated byte array of length len?

Oh whoops, I missed that line. I added it in the original post :stuck_out_tongue:

And that’s where you make sure you read all the bytes first. In my networking “library”, I send an integer containing the length of the ByteBuffer that was sent. In my read method, I make sure I read at least that many bytes before returning the data.
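
The read side looks something like this (a sketch assuming a blocking channel; readFully is just a name I made up for the helper):

import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// keep reading until the buffer is full; a single read() may return fewer bytes
static void readFully(ReadableByteChannel channel, ByteBuffer buf) throws IOException {
    while (buf.hasRemaining()) {
        if (channel.read(buf) == -1) {
            throw new EOFException("channel closed before the full message arrived");
        }
    }
}

// usage: read the 4-byte length prefix, then exactly that many payload bytes
ByteBuffer lenBuf = ByteBuffer.allocate(4);
readFully(channel, lenBuf);
lenBuf.flip();
int len = lenBuf.getInt();
// (sanity-check len here!)
ByteBuffer payload = ByteBuffer.allocate(len);
readFully(channel, payload);
payload.flip();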

I see, that’s quite nifty.

Does it go against concurrency if you read the bytes stuck inside a loop until you reach the len amount, or is it better to wait for the next round-about and keep your key selected? (I.e. keeping it in the Selector’s selectedKeys() set - see the sketch at the end of this post.)

But what about protocols that don’t adhere to the “first int is the length of the payload” convention? Or is this a non-issue in general, since you’ll always know what you’re going to get?

And why is the Charset class slow compared to String’s getBytes(String charsetName) method?
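
Regarding the loop-vs-selector question, the selector-based alternative I have in mind looks roughly like this (key comes from the select loop; expectedLength and handleMessage are placeholders):

import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// inside the select loop, when key.isReadable():
SocketChannel ch = (SocketChannel) key.channel();
ByteBuffer buf = (ByteBuffer) key.attachment(); // per-connection buffer, attached on accept

ch.read(buf); // non-blocking: takes whatever bytes are available right now

if (buf.position() >= expectedLength) {
    // the whole message has arrived
    buf.flip();
    handleMessage(buf);
    buf.clear();
}
// otherwise just return: the key stays registered for OP_READ,
// so the selector fires again when more bytes arrive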

It’s indeed a non-issue, because the protocol in use must be known ahead of time.

It’s not. Both have the same implementation.