Review my thread pattern?

What the heck are you on about? What does multithreaded file writing have to do with HDD seeking? “It doesn’t matter how you ‘reason’ your way out”: the benchmark shows multithreaded is faster.

I think you’re reading too much into it :slight_smile: If the benchmark says that dedicating more than one thread to file I/O operations executes an operation faster than a single thread does, it’s fair to assume that you should probably dedicate more than one :wink:

Uh oh. You’re going to be reamed. Sorry. Don’t question Riven on technical knowledge. I’ve learned that lesson quite a few times. :’(

I guess it’s a good thing I’m right then, I don’t wanna be reamed :persecutioncomplex:

I think he’s under the misconception that you can concurrently read things from an HDD (which would potentially be obscenely inefficient, for the reasons he stated), in which case his theory would be correct.

I could see multi-threaded seeking being about the same speed as single-threaded, but I could never see it being faster.

Do you know what the difference is between requesting to write files from two threads and requesting to write a file from one thread?

(the actual process that happens after you send a request to write a file)

Actually, there is no difference, which is why you guys are wrong.

If anything, the difference is this: with multithreading you add two requests to the queue one after the other (potentially), and then the first request is fulfilled, followed by the second.

With single threading, you add one request to the queue, it is fulfilled, then you add the next request to the queue and it is fulfilled.

If no external applications are hogging the HDD, there should really be no speed difference (if they are, multithreaded may come out faster, depending on what the other applications are doing).

So in an ideal environment where only your application is using the HDD, there is no noticeable difference in file-writing speed.

The actual speedup comes from other things not directly related to writing the files (for example, creating FileOutputStream objects can be done concurrently).

This is the truth, and it was supported by my benchmarks.

I don’t see why you keep holding on to the idea that when you multithread file writing, the disk is writing each file at the same time and therefore seeking a lot… that’s not true. You only seek once per file.
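To make this concrete, here’s a rough sketch of the two patterns I’m comparing (the class name, file names, and sizes are made up for illustration, not taken from my benchmark). Either way, each write request ends up in the same queue:

import java.io.FileOutputStream;
import java.io.IOException;

public class TwoPatterns
{
   // pattern 1: one thread writes both files, back to back
   static void writeSequentially() throws IOException
   {
      writeFile("seq.0.tmp");
      writeFile("seq.1.tmp");
   }

   // pattern 2: two threads each write one file; both requests
   // reach the OS queue at (roughly) the same time
   static void writeConcurrently() throws InterruptedException
   {
      Thread a = new Thread(new Runnable()
      {
         public void run() { writeQuietly("par.0.tmp"); }
      });
      Thread b = new Thread(new Runnable()
      {
         public void run() { writeQuietly("par.1.tmp"); }
      });
      a.start();
      b.start();
      a.join();
      b.join();
   }

   static void writeQuietly(String name)
   {
      try
      {
         writeFile(name);
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
   }

   // ~4MB of 4KB writes per file
   static void writeFile(String name) throws IOException
   {
      byte[] buf = new byte[4 * 1024];
      FileOutputStream fos = new FileOutputStream(name);
      try
      {
         for (int i = 0; i < 1024; i++)
         {
            fos.write(buf);
         }
      }
      finally
      {
         fos.close();
      }
   }

   public static void main(String[] args) throws Exception
   {
      writeSequentially();
      writeConcurrently();
   }
}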

Hihi, HDDs have a constant speed. If you seek back, you simply have to wait for slightly less than a single revolution.
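(For reference: a 7200 RPM drive completes one revolution in 60/7200 s ≈ 8.3 ms, so that rotational wait is on the order of milliseconds.)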

@counterp
You’re hilariously misinformed. Why don’t you read up on how harddisks actually work? Lots of interesting stuff.

If you all think the operating system translates your InputStream reads directly into disk seeks, you need to brush up on the evolution of operating systems from the DOS days. Modern operating systems have buffer caches and IO schedulers that go all the way down to understanding disk geometry.

The main problem with using threads willy nilly is the amount of memory they take up, since they each have to haul a stack along with them. All the other claims being thrown around are meaningless unless they can be backed up with actual data. Benchmarks aren’t perfect, but I’ll take them over anecdote any day.

Besides, if you want performance, you really should be using nio, and that’s an async design that doesn’t lend itself to thread-per-stream anyway.
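Something like this rough sketch (the class and file names are made up; this assumes Java 7’s AsynchronousFileChannel from NIO.2): the write call returns immediately, and a completion handler fires when the OS is done, so you don’t tie up a thread per stream.

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class AsyncWrite
{
   public static void main(String[] args) throws Exception
   {
      AsynchronousFileChannel ch = AsynchronousFileChannel.open(
            Paths.get("C:/test.async.tmp"),
            StandardOpenOption.CREATE, StandardOpenOption.WRITE);

      ByteBuffer buf = ByteBuffer.wrap(new byte[4 * 1024]);

      // returns immediately; the handler runs when the write completes
      ch.write(buf, 0L, null, new CompletionHandler<Integer, Void>()
      {
         public void completed(Integer bytesWritten, Void attachment)
         {
            System.out.println("wrote " + bytesWritten + " bytes");
         }

         public void failed(Throwable exc, Void attachment)
         {
            exc.printStackTrace();
         }
      });

      Thread.sleep(1000); // crude: keep the JVM alive until the handler fires
      ch.close();
   }
}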

Yes, I will stop talking now, because I’m getting caught in the trap of saying things I don’t fully understand.

Gotta stop doing that.

rofl, point out the part where I was wrong, please.

If you think the creation of objects is even relevant to file I/O performance, it gets really laughable.

If you’re reading/writing concurrently, you have to seek at every context-switch (thread switch).

To ensure the ‘worst case scenario’, the ‘concurrent’ file I/O is done from a single thread. This is all the same to the harddisk; it just makes sure every write goes to a different file.


import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class InterleavedWriteBenchmark
{
   public static void main(String[] args) throws IOException
   {
      int writeSize = 4 * 1024;           // 4KB per write
      int totalSize = 512 * 1024 * 1024;  // 512MB per file

      for (int fileCount = 1; fileCount <= 8; fileCount++)
      {
         FileOutputStream[] fos = new FileOutputStream[fileCount];
         for (int i = 0; i < fos.length; i++)
         {
            fos[i] = new FileOutputStream(new File("C:/test." + i + ".tmp"));
         }

         long t0 = System.currentTimeMillis();
         writeInterleavedStreams(fos, totalSize / writeSize, writeSize);
         long t1 = System.currentTimeMillis();

         int size = (totalSize / 1024 / 1024);
         int sec = Math.max(1, (int) ((t1 - t0) / 1000)); // guard against division by zero on very fast runs
         System.out.println("wrote " + fos.length + " files of " + size + "MB in " + sec + " sec, total throughput: " + (size / sec * fileCount) + "MB/sec");
      }
      System.out.println();
   }

   // round-robins writeSize-byte writes over all streams from a single thread,
   // so the disk sees the same interleaved access pattern N writer threads would produce
   public static void writeInterleavedStreams(OutputStream[] streams, int writeCount, int writeSize) throws IOException
   {
      byte[] buf = new byte[writeSize];

      for (int i = 0; i < writeCount; i++)
      {
         for (OutputStream stream : streams)
         {
            stream.write(buf);
            stream.flush();
         }
      }

      for (OutputStream stream : streams)
      {
         stream.close();
      }
   }
}


wrote 1 files of 512MB in 3 sec, total throughput: 170MB/sec <--- bad-ass
wrote 2 files of 512MB in 32 sec, total throughput: 32MB/sec <--- once you concurrently write more than 1 file, it's all lost
wrote 3 files of 512MB in 45 sec, total throughput: 33MB/sec
wrote 4 files of 512MB in 78 sec, total throughput: 24MB/sec
wrote 5 files of 512MB in 116 sec, total throughput: 20MB/sec
wrote 6 files of 512MB in 166 sec, total throughput: 18MB/sec
wrote 7 files of 512MB in 182 sec, total throughput: 14MB/sec
wrote 8 files of 512MB in 224 sec, total throughput: 16MB/sec

I do believe counterp just got served. 8)

My fancy-pants 256GB SSD (probably shaved some life off it :p):

The 3-file case was consistently faster, like the results above.

Here’s my 750GB HDD:

I got bored and stopped it. Then I ran it again:

Again I got bored and stopped it. Did you ruin my HDD controller?!

Riven, try concurrently writing to the same file. It should be faster than writing serially to one file: it triggers NCQ. OK, please test with NCQ enabled, i.e. AHCI. Also, Java’s IO is written to work concurrently to some extent, so the context switching you describe won’t occur if it’s the same file.
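Roughly like this (the class name, thread count, sizes, and file name are made up for illustration): FileChannel is documented as safe for concurrent use, and the positional write(buffer, position) form doesn’t move a shared file pointer, so several threads can hand the drive overlapping requests for one file, which gives NCQ something to reorder.

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class SameFileConcurrent
{
   public static void main(String[] args) throws Exception
   {
      final int chunk = 4 * 1024;
      final int chunksPerThread = 1024; // ~4MB per thread
      final int threadCount = 4;

      RandomAccessFile raf = new RandomAccessFile("C:/test.same.tmp", "rw");
      final FileChannel ch = raf.getChannel(); // safe for concurrent use

      Thread[] threads = new Thread[threadCount];
      for (int t = 0; t < threadCount; t++)
      {
         // each thread owns its own region of the file
         final long base = (long) t * chunksPerThread * chunk;
         threads[t] = new Thread(new Runnable()
         {
            public void run()
            {
               try
               {
                  ByteBuffer buf = ByteBuffer.allocate(chunk);
                  for (int i = 0; i < chunksPerThread; i++)
                  {
                     buf.clear();
                     // positional write: no shared file pointer involved
                     ch.write(buf, base + (long) i * chunk);
                  }
               }
               catch (Exception e)
               {
                  e.printStackTrace();
               }
            }
         });
         threads[t].start();
      }
      for (Thread thread : threads)
      {
         thread.join();
      }
      ch.close();
      raf.close();
   }
}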

Yes. Obviously harddrives aren’t meant to have data written to them. … Isn’t that just the OS caching the data in RAM and returning instantly?

Sorry, I feel the need to sit on the fence yet again - a PITA for you guys and for me! ;D

Aren’t Riven and counterp both right?

Riven is right about the inherently serial nature of the disk write. But as others have pointed out, on most modern OSes / file systems, isn’t the FileOutputStream actually writing to RAM, to be flushed to disk later by the file system? In that case the implication that each thread context switch results in a disk seek is not correct.

Oh, and an effect of the above is that benchmarks will differ quite a bit across file systems.

Hehe, this is getting kind of ridiculous. He’s making a Minecraft server, for god’s sake. Single-threaded IO should be better, as it’s easier to use.

Oh, was there an OP? :slight_smile:

No: the RAM will be synced with the storage device, in a blocking operation.

This applies to:

- FileOutputStream.flush()
- RandomAccessFile.getFD().sync()
- FileChannel.force(boolean)

If we didn’t have such a guarantee, lots of critical applications (like databases) couldn’t restore to a ‘known state’ after a crash.
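For example, here’s a minimal sketch (the class and file names are made up) of forcing a write to be durable with getFD().sync():

import java.io.FileOutputStream;
import java.io.IOException;

public class DurableWrite
{
   public static void main(String[] args) throws IOException
   {
      FileOutputStream fos = new FileOutputStream("C:/test.sync.tmp");
      try
      {
         fos.write(new byte[4 * 1024]); // may initially land only in the OS page cache...
         fos.getFD().sync();            // ...this blocks until the device has the data
      }
      finally
      {
         fos.close();
      }
   }
}

Until sync() (or force()) returns, you only know the OS has the data, not the disk.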