Read File to String: can you do it faster?

Eli’s version will also append a ‘\n’ after the last line.

If he were to read/write/read/write/read/write, the file would get larger and larger.

Yeah, don’t worry Z-Man, it’s not at all obvious that concatenating strings will be so incredibly slow. But a good rule of thumb is that if you are ever worried about speed when string concatenation is involved, always use a StringBuffer. If speed doesn’t matter, don’t bother, because it’s uglier (as you already pointed out).
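
For reference, a minimal sketch of the buffered line-by-line approach; it uses StringBuilder (the unsynchronized successor to StringBuffer), and the method name is just illustrative:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Illustrative buffered reader: builds the string with a StringBuilder
// instead of repeated concatenation. Like Eli's version, it appends a
// '\n' after the last line.
public final static String readFileBuffered(String fileName) throws IOException {
	BufferedReader reader = new BufferedReader(new FileReader(fileName));
	StringBuilder sb = new StringBuilder();
	try {
		String line;
		while ((line = reader.readLine()) != null) {
			sb.append(line).append('\n');
		}
	} finally {
		reader.close();
	}
	return sb.toString();
}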

Yeah, in cases where this is an issue I just drop the last character every time. But I’ve never worried much about the speed of reading files into strings, because I only ever do it once and I never have massive files. I did once have a case where I was reading in a file, editing it, and then re-saving it, and I had to deal with the trailing \n.
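
For illustration, dropping that trailing character is a one-liner (variable and helper names hypothetical):

String text = readFileBuffered(path);
// strip the '\n' the line-based reader appended after the last line
if (text.endsWith("\n")) {
	text = text.substring(0, text.length() - 1);
}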

Obviously this method has some theoretical size limitations (the file length is cast to an int, so anything over ~2 GB breaks), but you could set some kind of size limit where if size > limit, you fall back to the buffered approach (a sketch follows the code below).

At least it’s fast.


public final static String readFile(String fileName) throws IOException {
	File f = new File(fileName);
	FileInputStream fstream = new FileInputStream(f);
	try {
		byte[] bytes = new byte[(int) f.length()];
		// a single read() may return fewer bytes than requested, so loop
		int offset = 0, n;
		while (offset < bytes.length
				&& (n = fstream.read(bytes, offset, bytes.length - offset)) >= 0) {
			offset += n;
		}
		return new String(bytes);
	} finally {
		fstream.close();
	}
}
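
The size-limit dispatch mentioned above could look something like this; the threshold is an arbitrary placeholder, and readFileBuffered is the hypothetical line-by-line helper sketched earlier:

// hypothetical threshold: beyond this, avoid the up-front full-size allocation
private static final long SIZE_LIMIT = 16 * 1024 * 1024; // 16 MB

public final static String readFileAuto(String fileName) throws IOException {
	File f = new File(fileName);
	// whole-file read allocates f.length() bytes at once, so fall back to
	// the buffered line-by-line reader for anything above the limit
	return f.length() > SIZE_LIMIT ? readFileBuffered(fileName) : readFile(fileName);
}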


I hadn’t thought this would be viable for small files, but it actually performs very well. And the list goes on and on :wink:

public final static String readFile(String file) throws IOException {
	FileChannel channel = new FileInputStream(new File(file)).getChannel();
	try {
		ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
		while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
			// a single read() may not fill the buffer, so loop until full or EOF
		}
		return new String(buffer.array());
	} finally {
		channel.close(); // closing the channel also closes the underlying stream
	}
}

I don’t think it gets much faster than this (when you’re reading large files, everything’s pretty close in speed).

Using FileChannel? Nah, when I benchmarked it, it was slow;
maybe not the slowest, but slower than your previous code and the Java 7 method.
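
For the record, the “Java 7 method” discussed here is presumably the NIO.2 one-liner; a minimal sketch of what readFromFileJava7 would look like under that assumption:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// presumed shape of readFromFileJava7: let NIO.2 slurp the whole file
public final static String readFromFileJava7(String file) throws IOException {
	return new String(Files.readAllBytes(Paths.get(file)));
}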

Interesting: using Java 7 it comes up as slightly faster for larger files and noticeably faster for small files.

How are you doing your benchmark? Make sure you’re running each method in its own instance of the JVM (a sketch follows).
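
One way to do that (sketch only; the main-class name and argument convention are made up) is to spawn a fresh JVM per benchmarked method with ProcessBuilder:

import java.io.IOException;

// hypothetical launcher: each benchmarked method runs in its own JVM
public static void runIsolated(String method, String path)
		throws IOException, InterruptedException {
	ProcessBuilder pb = new ProcessBuilder(
			"java", "-cp", System.getProperty("java.class.path"),
			"BenchmarkMain", method, path); // BenchmarkMain is hypothetical
	pb.inheritIO(); // let the child print its timings to this console
	pb.start().waitFor();
}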

long before = System.nanoTime();
for (int i = 0; i < 100; i++) {
	Util.readFromFile(path);
}
System.out.println("1: " + ((System.nanoTime() - before) / 1000L) + "us");

before = System.nanoTime();
for (int i = 0; i < 100; i++) {
	Util.readFileWithChannel(path);
}
System.out.println("2: " + ((System.nanoTime() - before) / 1000L) + "us");

before = System.nanoTime();
for (int i = 0; i < 100; i++) {
	Util.readFromFileJava7(path);
}
System.out.println("3: " + ((System.nanoTime() - before) / 1000L) + "us");
1: 305061us
2: 1579026us
3: 1476928us

The file is 500 KB in this run.

You’re making the classic mistake in file-performance benchmarking: you’re forgetting that the OS will cache any file that was recently read.

For a realistic benchmark, overwrite the file before each read.

In the above benchmark, that means overwriting the file 300 times (100 iterations × 3 methods); otherwise your results are useless.
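
The follow-up benchmark below calls a scrambleFile helper that isn’t shown; a plausible implementation (my guess, following the premise that rewriting the file keeps the OS from serving the next read from cache) overwrites it with random bytes:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Random;

// hypothetical implementation: overwrite the file with random bytes of
// the same length before each timed read
static void scrambleFile(String path) throws IOException {
	byte[] junk = new byte[(int) new File(path).length()];
	new Random().nextBytes(junk);
	FileOutputStream out = new FileOutputStream(path);
	try {
		out.write(junk);
	} finally {
		out.close();
	}
}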


int iterations = 20;
long before = System.nanoTime();
scrambleFile(path);

for (int i = 0; i < iterations; i++) {
	Util.readFromFileJava7(path);
	scrambleFile(path);
}
System.out.println("1: " + ((System.nanoTime() - before) / 1000L / 1000L) + "ms");

before = System.nanoTime();
for (int i = 0; i < iterations; i++) {
	Util.readFileWithChannel(path);
	scrambleFile(path);
}
System.out.println("2: " + ((System.nanoTime() - before) / 1000L / 1000L) + "ms");

before = System.nanoTime();
for (int i = 0; i < iterations; i++) {
	Util.readFromFile(path);
	scrambleFile(path);
}
System.out.println("3: " + ((System.nanoTime() - before) / 1000L / 1000L) + "ms\n\n");

// measure the overhead scrambleFile itself adds to each loop iteration
long r = 0L;
for (int i = 0; i < iterations; i++) {
	before = System.nanoTime();
	scrambleFile(path);
	r += ((System.nanoTime() - before) / 1000L / 1000L);
}
System.out.println("Scramble test average: " + (r / (float) iterations) + "ms");
scrambleFile(path);

1: 668ms
2: 614ms
3: 655ms


Scramble test average: 29.45ms

Seems to be faster now, although the differences are getting very small here.

This is what I wanted to point out: in the end the hard disk is the limiting factor and the CPU is mostly idling, so the code is largely irrelevant unless it’s extremely inefficient.

The difference is more apparent with smaller files (and more iterations).

Yeah, I was just gonna say that. File ops aren’t really that important to optimize; if they work, the code is almost always faster than the hard drive unless you completely f*ck up. However, it would be interesting if anyone with a fast SSD would run some benchmarks. :stuck_out_tongue:

I think you’re wrong here. By eliminating the hard drive from the benchmark, aren’t you getting more accurate results? We’re benchmarking CPU performance, right? Do the different read methods actually affect the way a hard drive reads files to the point that performance differs noticeably?

If you want to benchmark IO APIs, use a ramdisk :point: