What the heck are you on about? What does multithreaded file writing have to do with the HDD seeking? “It doesn’t matter how you ‘reason’ your way out” — the benchmark shows multithreaded is faster.
I think you’re reading too much into it. If the benchmark says that dedicating more than one thread to file I/O operations executes an operation faster than a single thread does, it’s fair to assume that you should probably dedicate more than one.
I guess it’s a good thing I’m right then, I don’t wanna be reamed :persecutioncomplex:
I think he’s under the misconception that you can concurrently read from a HDD (which would potentially be obscenely inefficient, for the reasons he stated), in which case his theory would be correct.
Do you know what the difference is between requesting file writes from two threads and requesting them from one thread?
(the actual process that is happening after you send a request to write a file)
Actually, there is no difference, which is why you guys are wrong.
If anything, the difference is this: with multithreading you (potentially) add two requests to the queue one after the other, the first request is fulfilled, then the second request is fulfilled.
With single threading, you add one request to the queue, it is fulfilled, then you add the next request to the queue and it is fulfilled.
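That queue picture can be sketched with a toy model (the class `WriteQueue` and method `serveAll` are invented names for illustration, not anything from the benchmark): any number of producer threads enqueue write requests, and a single consumer — standing in for the disk — drains them strictly one at a time, so the total work served is the same either way.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class WriteQueue
{
   // Toy model: producers on one or many threads enqueue "write requests";
   // a single consumer (the "disk") fulfills them serially afterwards.
   public static int serveAll(int producers, int requestsEach) throws Exception
   {
      BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
      ExecutorService pool = Executors.newFixedThreadPool(producers);
      for (int p = 0; p < producers; p++)
      {
         pool.execute(() -> {
            for (int i = 0; i < requestsEach; i++)
            {
               queue.add(new byte[16]); // enqueue a request
            }
         });
      }
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);

      // the single consumer drains the queue one request at a time
      int served = 0;
      while (queue.poll() != null)
      {
         served++;
      }
      return served;
   }
}
```

Whether two producers enqueue 100 requests each or one producer enqueues 200, the consumer still services 200 requests serially.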
Unless external applications are hogging the HDD, there should really be no speed difference (if they are, multithreading may come out faster, depending on what the other applications are doing).
So in an ideal environment where only your application is using the HDD, there is no noticeable difference in file-writing speed.
The actual speedup comes from things not directly related to writing the file data itself (for example, creating FileOutputStream objects can be done concurrently).
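For instance, the stream setup can be parallelized — about the only part that benefits. A minimal sketch (the class `ConcurrentOpen` and method `openAll` are invented names): streams are opened on a thread pool, while the eventual writes still queue up at the disk.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentOpen
{
   // Open one FileOutputStream per file on a thread pool; only the
   // stream creation is concurrent here, not the disk writes.
   public static FileOutputStream[] openAll(File[] files) throws Exception
   {
      ExecutorService pool = Executors.newFixedThreadPool(files.length);
      List<Future<FileOutputStream>> futures = new ArrayList<>();
      for (final File f : files)
      {
         futures.add(pool.submit(() -> new FileOutputStream(f)));
      }
      FileOutputStream[] streams = new FileOutputStream[files.length];
      for (int i = 0; i < streams.length; i++)
      {
         streams[i] = futures.get(i).get(); // wait for each open to finish
      }
      pool.shutdown();
      return streams;
   }
}
```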
This is the truth; it was supported by my benchmarks.
I don’t see why you keep holding on to the idea that when you multithread file writing, the disk is writing each file at the same time and therefore seeking a lot… that’s not true. You only seek once per file.
If you all think the operating system translates your InputStream reads directly into disk seeks, you need to brush up on the evolution of operating systems from the DOS days. Modern operating systems have buffer caches and IO schedulers that go all the way down to understanding disk geometry.
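The buffer cache is easy to see from Java: write() and flush() only hand data to the OS, which may keep it in RAM for a while; forcing it onto the device takes an explicit sync. A small sketch (the class and method names are invented for illustration):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class DurableWrite
{
   // flush() empties Java-side buffers into the OS buffer cache;
   // getFD().sync() asks the OS to actually push the data to the device.
   public static void writeDurably(File file, byte[] data) throws IOException
   {
      try (FileOutputStream fos = new FileOutputStream(file))
      {
         fos.write(data);
         fos.flush();        // into the OS buffer cache (RAM)
         fos.getFD().sync(); // onto the disk itself
      }
   }
}
```

Until that sync, the OS and IO scheduler are free to reorder and coalesce writes however they like — which is exactly why stream-level operations don’t map one-to-one onto disk seeks.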
The main problem with using threads willy nilly is the amount of memory they take up, since they each have to haul a stack along with them. All the other claims being thrown around are meaningless unless they can be backed up with actual data. Benchmarks aren’t perfect, but I’ll take them over anecdote any day.
Besides, if you want performance, you really should be using nio, and that’s an async design that doesn’t lend itself to thread-per-stream anyway.
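In that vein, here is a minimal async sketch using AsynchronousFileChannel from NIO.2 (the class and method names are invented): completion is handled through a Future rather than a dedicated thread per stream.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;

public class NioWrite
{
   // Issue an asynchronous write at offset 0 and wait on its Future;
   // no thread-per-stream design needed.
   public static int writeAsync(Path path, byte[] data)
         throws IOException, ExecutionException, InterruptedException
   {
      try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(
            path, StandardOpenOption.CREATE, StandardOpenOption.WRITE))
      {
         return ch.write(ByteBuffer.wrap(data), 0L).get(); // bytes written
      }
   }
}
```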
To ensure the ‘worst case scenario’, the ‘concurrent’ file I/O is done from a single thread. To the hard disk this is all the same; it just makes sure every write goes to a different file.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class InterleavedWriteBenchmark
{
   public static void main(String[] args) throws IOException
   {
      int writeSize = 4 * 1024;          // 4KB per write
      int totalSize = 512 * 1024 * 1024; // 512MB per file

      for (int fileCount = 1; fileCount <= 8; fileCount++)
      {
         // open one stream per file
         FileOutputStream[] fos = new FileOutputStream[fileCount];
         for (int i = 0; i < fos.length; i++)
         {
            fos[i] = new FileOutputStream(new File("C:/test." + i + ".tmp"));
         }

         long t0 = System.currentTimeMillis();
         writeInterleavedStreams(fos, totalSize / writeSize, writeSize);
         long t1 = System.currentTimeMillis();

         int size = totalSize / 1024 / 1024;
         int sec = Math.max(1, (int) ((t1 - t0) / 1000)); // avoid div-by-zero on sub-second runs
         System.out.println("wrote " + fos.length + " files of " + size + "MB in " + sec + " sec, total throughput: " + (size / sec * fileCount) + "MB/sec");
      }
      System.out.println();
   }

   // round-robin over the streams, so writes to different files interleave
   public static void writeInterleavedStreams(OutputStream[] streams, int writeCount, int writeSize) throws IOException
   {
      byte[] buf = new byte[writeSize];
      for (int i = 0; i < writeCount; i++)
      {
         for (OutputStream stream : streams)
         {
            stream.write(buf);
            stream.flush();
         }
      }
      for (OutputStream stream : streams)
      {
         stream.close();
      }
   }
}
wrote 1 files of 512MB in 3 sec, total throughput: 170MB/sec <--- bad-ass
wrote 2 files of 512MB in 32 sec, total throughput: 32MB/sec <--- once you concurrently write more than 1 file, it's all lost
wrote 3 files of 512MB in 45 sec, total throughput: 33MB/sec
wrote 4 files of 512MB in 78 sec, total throughput: 24MB/sec
wrote 5 files of 512MB in 116 sec, total throughput: 20MB/sec
wrote 6 files of 512MB in 166 sec, total throughput: 18MB/sec
wrote 7 files of 512MB in 182 sec, total throughput: 14MB/sec
wrote 8 files of 512MB in 224 sec, total throughput: 16MB/sec
Riven, try concurrently writing to the same file. It should be faster than writing to one file serially; it triggers NCQ. Please test with NCQ enabled, i.e. AHCI. Also, Java’s I/O is written to work concurrently to some extent, so the context switching mentioned won’t occur if it’s the same file.
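One way to test that idea (a sketch; the class and method names are invented): FileChannel’s positional write is specified to be safe for concurrent use, so several threads can write disjoint regions of one file and let the OS and NCQ schedule the requests.

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SameFileConcurrent
{
   // Each thread writes its own region of the same file using positional
   // writes; FileChannel is safe for concurrent use.
   public static void writeRegions(Path path, int threads, int regionSize) throws Exception
   {
      try (FileChannel ch = FileChannel.open(path,
            StandardOpenOption.CREATE, StandardOpenOption.WRITE))
      {
         ExecutorService pool = Executors.newFixedThreadPool(threads);
         CountDownLatch done = new CountDownLatch(threads);
         for (int t = 0; t < threads; t++)
         {
            final long offset = (long) t * regionSize; // disjoint region per thread
            pool.execute(() -> {
               try
               {
                  ch.write(ByteBuffer.wrap(new byte[regionSize]), offset);
               }
               catch (Exception e)
               {
                  throw new RuntimeException(e);
               }
               finally
               {
                  done.countDown();
               }
            });
         }
         done.await(); // wait for all regions before closing the channel
         pool.shutdown();
      }
   }
}
```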
Sorry, I feel the need to sit on the fence yet again - a PITA for you guys and for me! ;D
Isn’t what Riven and counterp are saying both right?
Riven is right about the inherently serial nature of the disk write, but as others have pointed out, in most modern OSes / file systems isn’t the FileOutputStream actually writing to RAM, to be flushed to disk later by the file system? If so, the implication that each thread context switch results in a disk seek is not correct.
Oh, and the effect of the above will also be that benchmarks will be quite different across different file systems.