Read File to String: can you do it faster?

Eli’s version will also append a ‘\n’ after the last line.

If he were to read/write/read/write/read/write, the file would get larger and larger.

Yeah, don’t worry Z-Man, it’s not at all obvious that concatenating strings will be so incredibly slow. But a good rule of thumb is that if you are ever worried about speed when string concatenation is involved, always use a StringBuffer. If speed doesn’t matter, don’t bother, because it’s uglier (as you already pointed out).
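
For reference, a minimal sketch of the buffered line-by-line approach; it uses StringBuilder (the unsynchronized successor to StringBuffer), and the method name is just illustrative:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Illustrative buffered reader: builds the string with a StringBuilder
// instead of repeated concatenation. Like Eli's version, it appends a
// '\n' after the last line.
public final static String readFileBuffered(String fileName) throws IOException {
	BufferedReader reader = new BufferedReader(new FileReader(fileName));
	StringBuilder sb = new StringBuilder();
	try {
		String line;
		while ((line = reader.readLine()) != null) {
			sb.append(line).append('\n');
		}
	} finally {
		reader.close();
	}
	return sb.toString();
}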

Yeah, in cases where this is an issue I just drop the last character every time. But I’ve never worried much about the speed of reading files into strings, because I only ever do it once and I never have massive files. I did once have a case where I was reading in a file, editing it, and then re-saving it, and I had to deal with the trailing \n.
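
For illustration, dropping that trailing character is a one-liner (variable and helper names hypothetical):

String text = readFileBuffered(path);
// strip the '\n' the line-based reader appended after the last line
if (text.endsWith("\n")) {
	text = text.substring(0, text.length() - 1);
}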

Obviously this method has some theoretical size limitations (the file length is cast to an int, so anything over ~2 GB breaks), but you could set some kind of size limit where if size > limit, you fall back to the buffered approach (a sketch follows the code below).

At least it’s fast.


public final static String readFile(String fileName) throws IOException {
	File f = new File(fileName);
	FileInputStream fstream = new FileInputStream(f);
	try {
		byte[] bytes = new byte[(int) f.length()];
		// a single read() may return fewer bytes than requested, so loop
		int offset = 0, n;
		while (offset < bytes.length
				&& (n = fstream.read(bytes, offset, bytes.length - offset)) >= 0) {
			offset += n;
		}
		return new String(bytes);
	} finally {
		fstream.close();
	}
}
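
The size-limit dispatch mentioned above could look something like this; the threshold is an arbitrary placeholder, and readFileBuffered is the hypothetical line-by-line helper sketched earlier:

// hypothetical threshold: beyond this, avoid the up-front full-size allocation
private static final long SIZE_LIMIT = 16 * 1024 * 1024; // 16 MB

public final static String readFileAuto(String fileName) throws IOException {
	File f = new File(fileName);
	// whole-file read allocates f.length() bytes at once, so fall back to
	// the buffered line-by-line reader for anything above the limit
	return f.length() > SIZE_LIMIT ? readFileBuffered(fileName) : readFile(fileName);
}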


I hadn’t thought this would be viable for small files, but it actually performs very well. And the list goes on and on :wink:

public final static String readFile(String file) throws IOException {
	FileChannel channel = new FileInputStream(new File(file)).getChannel();
	try {
		ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
		while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
			// a single read() may not fill the buffer, so loop until full or EOF
		}
		return new String(buffer.array());
	} finally {
		channel.close(); // closing the channel also closes the underlying stream
	}
}

I don’t think it gets much faster than this (when you’re reading large files, everything’s pretty close in speed).

Using FileChannel? Nah, when I benchmarked it, it was slow;
maybe not the slowest, but slower than your previous code and the Java 7 method.
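
For the record, the “Java 7 method” discussed here is presumably the NIO.2 one-liner; a minimal sketch of what readFromFileJava7 would look like under that assumption:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// presumed shape of readFromFileJava7: let NIO.2 slurp the whole file
public final static String readFromFileJava7(String file) throws IOException {
	return new String(Files.readAllBytes(Paths.get(file)));
}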

Interesting: using Java 7 it comes up as slightly faster for larger files and noticeably faster for small files.

How are you doing your benchmark? Make sure you’re running each method in its own instance of the JVM (a sketch follows).
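
One way to do that (sketch only; the main-class name and argument convention are made up) is to spawn a fresh JVM per benchmarked method with ProcessBuilder:

import java.io.IOException;

// hypothetical launcher: each benchmarked method runs in its own JVM
public static void runIsolated(String method, String path)
		throws IOException, InterruptedException {
	ProcessBuilder pb = new ProcessBuilder(
			"java", "-cp", System.getProperty("java.class.path"),
			"BenchmarkMain", method, path); // BenchmarkMain is hypothetical
	pb.inheritIO(); // let the child print its timings to this console
	pb.start().waitFor();
}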

long before = System.nanoTime();
for (int i = 0; i < 100; i++) {
	Util.readFromFile(path);
}
System.out.println("1: " + ((System.nanoTime() - before) / 1000L) + "us");

before = System.nanoTime();
for (int i = 0; i < 100; i++) {
	Util.readFileWithChannel(path);
}
System.out.println("2: " + ((System.nanoTime() - before) / 1000L) + "us");

before = System.nanoTime();
for (int i = 0; i < 100; i++) {
	Util.readFromFileJava7(path);
}
System.out.println("3: " + ((System.nanoTime() - before) / 1000L) + "us");
1: 305061us
2: 1579026us
3: 1476928us

The file is 500 KB in this run.

You’re making the classic mistake in file-performance benchmarking: you’re forgetting that the OS will cache any file that was recently read.

For a realistic benchmark, overwrite the file before each read.

In the above benchmark, that means overwriting the file 300 times (100 iterations × 3 methods); otherwise your results are useless.
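
The follow-up benchmark below calls a scrambleFile helper that isn’t shown; a plausible implementation (my guess, following the premise that rewriting the file keeps the OS from serving the next read from cache) overwrites it with random bytes:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Random;

// hypothetical implementation: overwrite the file with random bytes of
// the same length before each timed read
static void scrambleFile(String path) throws IOException {
	byte[] junk = new byte[(int) new File(path).length()];
	new Random().nextBytes(junk);
	FileOutputStream out = new FileOutputStream(path);
	try {
		out.write(junk);
	} finally {
		out.close();
	}
}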


int iterations = 20;
long before = System.nanoTime();
scrambleFile(path);

for (int i = 0; i < iterations; i++) {
	Util.readFromFileJava7(path);
	scrambleFile(path);
}
System.out.println("1: " + ((System.nanoTime() - before) / 1000L / 1000L) + "ms");

before = System.nanoTime();
for (int i = 0; i < iterations; i++) {
	Util.readFileWithChannel(path);
	scrambleFile(path);
}
System.out.println("2: " + ((System.nanoTime() - before) / 1000L / 1000L) + "ms");

before = System.nanoTime();
for (int i = 0; i < iterations; i++) {
	Util.readFromFile(path);
	scrambleFile(path);
}
System.out.println("3: " + ((System.nanoTime() - before) / 1000L / 1000L) + "ms\n\n");

// measure the overhead scrambleFile itself adds to each loop iteration
long r = 0L;
for (int i = 0; i < iterations; i++) {
	before = System.nanoTime();
	scrambleFile(path);
	r += ((System.nanoTime() - before) / 1000L / 1000L);
}
System.out.println("Scramble test average: " + (r / (float) iterations) + "ms");
scrambleFile(path);

1: 668ms
2: 614ms
3: 655ms


Scramble test average: 29.45ms

Seems to be faster now, although the differences are getting very small here.

This is what I wanted to point out: in the end the hard disk is the limiting factor and the CPU is mostly idling, so the code is largely irrelevant unless it’s extremely inefficient.

The difference is more apparent with smaller files (and more iterations).

Yeah, I was just gonna say that. File ops aren’t really that important to optimize; if they work, the code is almost always faster than the hard drive unless you completely f*ck up. However, it would be interesting if anyone with a fast SSD would run some benchmarks. :stuck_out_tongue:

I think you’re wrong here. By eliminating the hard drive from the benchmark, aren’t you getting more accurate results? We’re benchmarking CPU performance, right? Do the different read methods actually affect the way a hard drive reads files to the point that performance differs noticeably?

If you want to benchmark IO APIs, use a ramdisk :point: