Read File to String: can you do it faster?

You don’t need permission for code this short and simple =P

Well I never do that. I structure everything like a C++ game: many folders which hold all the content, and an exe file.
I would care about compression and/or formats, I could just compress/decompress it myself, but who cares. Only if it gets > 700MB =P

Oh yeah, using String concatenation inside a StringBuilder.append call is kinda silly. Call append each time instead.
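(To illustrate: the first form builds a throwaway String before append even runs, the second feeds each piece straight into the builder. The variable names here are made up.)

int x = 3, y = 7; // made-up values

StringBuilder sb = new StringBuilder();

// silly: the + operator allocates a temporary String, which append then copies
sb.append("x=" + x + ", y=" + y);

// better: each piece goes directly into the builder's internal buffer
sb.append("x=").append(x).append(", y=").append(y);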

Well I guess most people don’t write their own map editor, so they rarely have to write stuff to file.

The only problem I can see with your method, Cero, is that you are assuming the platform’s default charset is the same as the charset used when writing your data file. (new String(byte[]))

Explicitly specify “UTF-8” both when reading & writing your Strings, and you shouldn’t have any compatibility problems.

oh yeah thanks, just new String(bytes, "UTF-8")

but when writing I do it like this:

void writeToFile(String msg, String path)
{
	try {
		// note: FileWriter always uses the platform default charset
		BufferedWriter bw = new BufferedWriter(new FileWriter(path));
		bw.write(msg);
		bw.close();
	} catch (Exception e) { e.printStackTrace(); }
}

Not sure how to do it then.
Well, it’s not like the user has to open the files my code writes… savegames, maps… all read by my code.
maybe the console log
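(For the writing side: OutputStreamWriter takes an explicit charset, unlike FileWriter. A minimal sketch of the method above with "UTF-8" specified:)

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;

void writeToFile(String msg, String path)
{
	try {
		BufferedWriter bw = new BufferedWriter(
				new OutputStreamWriter(new FileOutputStream(path), "UTF-8"));
		bw.write(msg);
		bw.close();
	} catch (Exception e) { e.printStackTrace(); }
}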

Well, how big is your average file?

public static final String readFile(String file) throws IOException {
	BufferedInputStream in = new BufferedInputStream(new FileInputStream(file));
	ByteBuffer buffer = new ByteBuffer(); // the growable buffer defined below, not java.nio.ByteBuffer
	byte[] buf = new byte[1024];
	int len;
	while ((len = in.read(buf)) != -1) {
		buffer.put(buf, len);
	}
	in.close();
	return new String(buffer.buffer, 0, buffer.write); // platform default charset
}

class ByteBuffer {

	public byte[] buffer = new byte[256];

	public int write; // number of bytes stored so far

	public void put(byte[] buf, int len) {
		ensure(len);
		System.arraycopy(buf, 0, buffer, write, len);
		write += len;
	}

	// grow the backing array (doubling past the required size) before it would overflow
	private void ensure(int amt) {
		int req = write + amt;
		if (buffer.length <= req) {
			byte[] temp = new byte[req * 2];
			System.arraycopy(buffer, 0, temp, 0, write);
			buffer = temp;
		}
	}

}

when reading a 249-byte file it was faster; when reading a 10,240,000-byte file it was faster; when reading a 10,240-byte file it was faster

just gonna assume it’s always faster

side-note: NIO is not any faster than IO

Method I learned from Matzon:


private String readFile(String fileName) throws IOException {
    BufferedInputStream fin = new BufferedInputStream(new FileInputStream(fileName));
    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    int read;
    while ((read = fin.read(buffer)) != -1) {
        bout.write(buffer, 0, read);
    }
    fin.close();
    return new String(bout.toByteArray()); // platform default charset again
}

Tested it on file sizes: 387 bytes, 3KB and 8.3MB.

The new way is faster by 2/3 on the biggest and smallest. Surprisingly my way is slightly faster (109,511ns compared to 170,413ns) on the middle size.

Mine is a little faster =P (maybe because it doesn’t have the abstractions of ByteArrayOutputStream?) but they look very similar. I didn’t know other people used this method; I just thought it up when reading this thread.

this is fast. On average 5 times faster, great stuff

Depending on the size of your files: if you are reading really large files you might want to up the buffer from 1024 (to a max of 8192; after that there’s only minimal change as far as I can tell)

Is the ‘\n’ also included in the resulting String?

It reads the whole file as opposed to line by line, so yes.

counterp’s and CaptainJester’s/Matzon’s techniques will keep “Windows” newlines as “\r\n” in the resulting String, but Eli’s and Z-Man’s won’t, so there is a subtle difference. Not sure about Cero’s since I haven’t made the move to Java 7.

Eli’s version will also append a ‘\n’ after the last line.

If he were to read/write/read/write repeatedly, the file would get larger and larger.
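(Neither version is quoted in this excerpt, but the line-by-line reader being described presumably looks something like this sketch; readLine() strips the terminator, which is why "\r\n" is lost and a plain '\n' gets re-appended after every line, including the last:)

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

static String readFileByLines(String file) throws IOException {
	BufferedReader in = new BufferedReader(new FileReader(file));
	StringBuilder sb = new StringBuilder();
	String line;
	while ((line = in.readLine()) != null) {
		// readLine() drops the original terminator ("\n" or "\r\n"),
		// so a plain '\n' comes back on every line, including the last
		sb.append(line).append('\n');
	}
	in.close();
	return sb.toString();
}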

Yeah don’t worry Z-Man, it’s not at all obvious that concatenating strings will be so incredibly slow. But a good rule of thumb is that if you are ever worried about speed when string combining is involved, always use a StringBuffer. If speed doesn’t matter, don’t bother, because it’s uglier (as you already pointed out).

Yeah, in cases where this is an issue I just drop the last character every time. But I’ve never worried much about string file speed because I only ever do it once and I never have massive files. But I did once have a case where I was reading in a file, editing it, and then re-saving it and I had to deal with the trailing \n.

Obviously this method has some theoretical size limitations, but you could set a size limit: if size > limit, fall back to the buffered approach (a sketch follows the code below).

At least it’s fast.


public final static String readFile(String fileName) throws IOException {
	File f = new File(fileName);
	FileInputStream fstream = new FileInputStream(f);
	byte[] bytes = new byte[(int) f.length()];
	// a single read() is not guaranteed to fill the array, so loop until it does
	int off = 0;
	while (off < bytes.length) {
		int n = fstream.read(bytes, off, bytes.length - off);
		if (n == -1) break;
		off += n;
	}
	fstream.close();
	return new String(bytes);
}
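(A sketch of the size-limit idea mentioned above, assuming it lives next to the readFile method just shown; the 8MB cutoff is made up:)

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

static final long WHOLE_FILE_LIMIT = 8L * 1024 * 1024; // hypothetical 8MB cutoff

static String readFileSmart(String fileName) throws IOException {
	File f = new File(fileName);
	if (f.length() <= WHOLE_FILE_LIMIT) {
		return readFile(fileName); // the whole-file method above
	}
	// past the limit, stream through a fixed-size buffer instead
	BufferedInputStream in = new BufferedInputStream(new FileInputStream(f));
	ByteArrayOutputStream out = new ByteArrayOutputStream();
	byte[] buf = new byte[8192];
	int n;
	while ((n = in.read(buf)) != -1) {
		out.write(buf, 0, n);
	}
	in.close();
	return new String(out.toByteArray());
}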


I hadn’t thought this would be viable for small files, but it actually performs very well. And the list goes on and on ;)

public final static String readFile(String file) throws IOException {
	FileChannel channel = new FileInputStream(new File(file)).getChannel();
	ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
	// a single read() may not fill the buffer, so loop until it is full or EOF
	while (buffer.hasRemaining()) {
		if (channel.read(buffer) == -1) break;
	}
	channel.close();
	return new String(buffer.array());
}

I don’t think it gets much faster than this (when you’re reading large files everything’s pretty close in speed)

using FileChannel? nah, when I benchmarked that it was slow.
Maybe not the slowest, but slower than your previous code and the Java 7 method.

interesting, using Java 7 it comes up as slightly faster for larger files and noticeably faster for small files.
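(The Java 7 method itself isn’t quoted in the thread; presumably it’s the java.nio.file one-liner added in Java 7, roughly:)

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

static String readFromFileJava7(String path) throws IOException {
	// Files.readAllBytes handles open, read-fully and close internally
	return new String(Files.readAllBytes(Paths.get(path))); // platform default charset
}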

how are you doing your benchmark? make sure you’re running each method in its own instance of the JVM

long before = System.nanoTime();
for (int i = 0; i < 100; i++) {
    Util.readFromFile(path);
}
System.out.println("1: " + ((System.nanoTime() - before) / 1000L) + "us");

before = System.nanoTime();
for (int i = 0; i < 100; i++) {
    Util.readFileWithChannel(path);
}
System.out.println("2: " + ((System.nanoTime() - before) / 1000L) + "us");

before = System.nanoTime();
for (int i = 0; i < 100; i++) {
    Util.readFromFileJava7(path);
}
System.out.println("3: " + ((System.nanoTime() - before) / 1000L) + "us");
1: 305061us
2: 1579026us
3: 1476928us

the file is 500KB in this run.

You’re making the classic mistake in file performance benchmarking: you forget that the OS will cache any file that was recently read.

For a realistic benchmark, overwrite the file prior to each time you read the file.

In the above benchmark, that means overwriting the file 300 times, otherwise your results are useless.
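(scrambleFile is used below but never shown in the thread; a minimal sketch of what such a helper might look like, assuming it just rewrites the file in place with random bytes of the same length:)

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Random;

static void scrambleFile(String path) throws IOException {
	File f = new File(path);
	byte[] junk = new byte[(int) f.length()];
	new Random().nextBytes(junk); // fresh random content each time
	FileOutputStream out = new FileOutputStream(f); // truncates and rewrites in place
	out.write(junk);
	out.close();
}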


int iterations = 20;
long before = System.nanoTime();
scrambleFile(path);

for (int i = 0; i < iterations; i++) {
    Util.readFromFileJava7(path);
    scrambleFile(path);
}
System.out.println("1: " + ((System.nanoTime() - before) / 1000L / 1000L) + "ms");

before = System.nanoTime();
for (int i = 0; i < iterations; i++) {
    Util.readFileWithChannel(path);
    scrambleFile(path);
}
System.out.println("2: " + ((System.nanoTime() - before) / 1000L / 1000L) + "ms");

before = System.nanoTime();
for (int i = 0; i < iterations; i++) {
    Util.readFromFile(path);
    scrambleFile(path);
}
System.out.println("3: " + ((System.nanoTime() - before) / 1000L / 1000L) + "ms\n\n");

long r = 0L;
for (int i = 0; i < iterations; i++) {
    before = System.nanoTime();
    scrambleFile(path);
    r += ((System.nanoTime() - before) / 1000L / 1000L);
}
System.out.println("Scramble test average: " + (r / (float) iterations) + "ms");
scrambleFile(path);

1: 668ms
2: 614ms
3: 655ms


Scramble test average: 29.45ms

Seems to be faster now, although the differences are getting very small here.