Problem reading chars 128-159

Hoping this will be a simple problem :wink:

Using BufferedReader.read() to read a file char by char, I came across a problem when reading the characters 128-159.

Here’s an image of what I mean:

I found out that 128-159 have something to do with the Win32-system-thing. Is there a way to fix this?

Readers use charsets to convert the byte(s) into a char. When a char requires more than 8 bits, a multi-byte encoding such as UTF-8 uses the highest bit to flag that a 2nd (or a 3rd) byte should be read, so you can only have 7 bits of data in a single-byte char. Above 127, the 8th bit is 1, which makes the char decoder think a second byte should be read, which it does, resulting in the larger-than-expected value. If you just want to read the bytes, don’t use a Reader, but an InputStream.
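A minimal sketch of the InputStream route, for reference. The class and method names (RawByteReadDemo, readValues) are made up for illustration, and a ByteArrayInputStream stands in for the file so the example is self-contained; with a real file you would presumably pass a FileInputStream instead. No charset is involved, so each byte comes back as a plain value from 0 to 255.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: reads raw bytes without any charset decoding.
class RawByteReadDemo {

    // Returns every byte of the stream as an int in the range 0..255.
    static List<Integer> readValues(InputStream in) {
        List<Integer> values = new ArrayList<>();
        try {
            int b;
            // InputStream.read() returns -1 at end of stream, otherwise 0..255.
            while ((b = in.read()) != -1) {
                values.add(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Stand-in for the file contents (assumption: the file holds these bytes).
        byte[] data = { 127, (byte) 128, (byte) 159, (byte) 160 };
        System.out.println(readValues(new ByteArrayInputStream(data)));
        // prints [127, 128, 159, 160]
    }
}
```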

It doesn’t look as simple as that because he has the correct number of outputs. But it does look like a charset issue; Windows-1252 and ISO-8859-1 are different in that range. I suggest you specify a charset when creating the InputStreamReader that you pass to the BufferedReader.
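Something like the following might show the difference. It is only a sketch: CharsetReadDemo and firstChar are names invented here, and the byte array stands in for the file (with a real file, a FileInputStream would take the place of the ByteArrayInputStream). The same byte, 128, is decoded once as ISO-8859-1 and once as windows-1252.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes the same byte with two different charsets.
class CharsetReadDemo {

    // Reads the first char from the data using an explicitly chosen charset.
    static int firstChar(byte[] data, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0x80 }; // byte value 128

        // ISO-8859-1 maps every byte straight to the char with the same value.
        System.out.println(firstChar(data, StandardCharsets.ISO_8859_1)); // prints 128

        // windows-1252 maps byte 128 to the euro sign, U+20AC.
        System.out.println(firstChar(data, Charset.forName("windows-1252"))); // prints 8364
    }
}
```

If the goal is to get 128-159 back unchanged, ISO-8859-1 is probably the charset to pass, since it is the one that leaves byte values as they are.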

Have you tried BufferedReader.read(char[] cbuf, int off, int len) then examining the cbuf array?

And what should that do?

Effectively (though it isn’t implemented that way) it just calls read() len times.
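A quick way to check that for yourself, as a rough sketch. BulkReadDemo and its methods are invented names, the byte array stands in for the file, and ISO-8859-1 is picked only so that one byte gives exactly one char. Both methods should come back with the same chars, so the bulk read will not change what 128-159 decode to.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: compares read() with read(char[], int, int).
class BulkReadDemo {

    static BufferedReader open(byte[] data) {
        return new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1));
    }

    // Reads one char per call to read().
    static char[] readOneByOne(byte[] data) {
        char[] out = new char[data.length];
        try (BufferedReader reader = open(data)) {
            for (int i = 0; i < out.length; i++) {
                int c = reader.read();
                if (c == -1) break; // end of stream
                out[i] = (char) c;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Reads with read(cbuf, off, len), looping because it may return fewer than len chars.
    static char[] readInBulk(byte[] data) {
        char[] out = new char[data.length];
        try (BufferedReader reader = open(data)) {
            int filled = 0;
            while (filled < out.length) {
                int n = reader.read(out, filled, out.length - filled);
                if (n == -1) break; // end of stream
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Bytes 128..159, the range from the question.
        byte[] data = new byte[32];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (128 + i);
        }
        System.out.println(Arrays.equals(readOneByOne(data), readInBulk(data))); // prints true
    }
}
```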