Review my thread pattern?

hmm … from the JavaDoc for flush()


         for (OutputStream stream : streams)
         {
            stream.write(buf);
            stream.flush();
            //((FileOutputStream) stream).getFD().sync();
            //((FileOutputStream) stream).getChannel().force(false);
         }

It seems you are right: flush() does not block, while sync() and force(…) do.
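For reference, here’s a minimal sketch of the three levels side by side (just an illustration; the file name and buffer are made up):


   // Sketch: the three levels of "get this data out of the JVM" for a FileOutputStream.
   FileOutputStream fos = new FileOutputStream("some.dat"); // hypothetical file
   byte[] buf = new byte[4 * 1024];

   fos.write(buf);
   fos.flush();                   // doesn't block; for a raw FileOutputStream it's effectively a no-op
   fos.getFD().sync();            // blocks until the OS reports the data is on the device (fsync)
   fos.getChannel().force(false); // like sync(); 'false' means file metadata need not be forced too

   fos.close();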

Still, you can observe a dramatic performance degradation when doing concurrent file writes on HDDs.

It seems that concurrently writing to files results in incredible fragmentation (making it hard to find new free clusters), which somehow remains the case even after the files are deleted.

But you wouldn’t do those for every block of data you write. To use your example above, there are databases which don’t even bother to do this after every write. There are other ways of achieving data integrity, and for that matter these calls still don’t guarantee the data has actually been written to the disk, depending on the hardware.

I wasn’t disputing you were seeing this, just your comment about head-seek for every context-shift. It’d be interesting to see how this differs across filesystems and OS’s. For that matter, I assume you’ve tried the above benchmark with multiple threads as opposed to just multiple writes? I could understand why these two circumstances could be treated very differently in the underlying filesystem.

I don’t think anyone disputes that concurrent access harms throughput. The reasons for this are more likely to do with context-switching overhead and buffer-cache busting than with any universal truth about the physical properties of the HDD, since you can see the effect even on an SSD (though it seems to take more threads to do it).

Even putting aside I/O, there’s plenty of other reasons to avoid using threads unless you really need parallel execution on multiple cores (and most I/O doesn’t need it). Even then you should at least be using java.util.concurrent.
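As a rough sketch of what I mean (the class name and file name are just placeholders), offloading writes to a single worker via java.util.concurrent instead of spawning a Thread per write:


   import java.io.File;
   import java.io.FileOutputStream;
   import java.io.IOException;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;

   public class AsyncWriteSketch
   {
      public static void main(String[] args)
      {
         // One background worker owns all file I/O; callers just queue work and move on.
         ExecutorService io = Executors.newSingleThreadExecutor();

         final byte[] data = new byte[4 * 1024];
         io.submit(() -> {
            try (FileOutputStream out = new FileOutputStream(new File("save.tmp"))) // hypothetical file
            {
               out.write(data);
            }
            catch (IOException e)
            {
               e.printStackTrace();
            }
         });

         io.shutdown(); // already-queued writes still complete, then the worker thread exits
      }
   }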

:slight_smile: The benchmark runs in a single thread and only uses a single 4K byte[].

Yes, I do! If you’d said “I don’t think anyone disputes that concurrent access harms throughput on some systems” maybe. I’ve no doubt of the results people have seen so far, but the whole thing is very dependent on the underlying OS and filesystem.

E.g. Riven’s benchmark on Linux with ext4:

wrote 1 files of 512MB in 9 sec, total throughput: 56MB/sec
wrote 2 files of 512MB in 16 sec, total throughput: 64MB/sec
wrote 3 files of 512MB in 29 sec, total throughput: 51MB/sec
wrote 4 files of 512MB in 39 sec, total throughput: 52MB/sec
wrote 5 files of 512MB in 48 sec, total throughput: 50MB/sec
wrote 6 files of 512MB in 57 sec, total throughput: 48MB/sec
wrote 7 files of 512MB in 69 sec, total throughput: 49MB/sec
wrote 8 files of 512MB in 78 sec, total throughput: 48MB/sec

That seems fairly consistent to me. Though my laptop HD is so slow to start with, I’m not sure it’s a fair benchmark! ;D

Not sure exactly what you’re getting at here, but there are plenty of good reasons to use Threads. While I/O is not necessarily one of them, I’m not sure unnecessarily serializing (synchronizing) your threading model for I/O is always justified either.

… just off to delete all them tmp files before I forget they’re there and wonder why my disk has shrunk. :slight_smile:

Sounds like you’re measuring burst latency then. The valuable lesson we learned in that case is how badly HDD makers lie. :slight_smile:

Interesting stuff, didn’t realize it could simply be NTFS being crappy at concurrent access.

Then again, maybe EXT4 is just always slow :slight_smile:

Can we assume FileOutputStream.close() syncs to the storage device?

If so, then my benchmark still stands :persecutioncomplex:

ext4 can be pretty zippy but it depends on how you tune it. Out of the box settings tend to be paranoid and use synchronous journaling, and on a laptop it’s probably a good idea to keep it that way.

Talking to a guy now who’s working on a supercomputer with 6TB of RAM. I could probably avoid hitting the disk on that baby :slight_smile:

I wouldn’t assume that. I don’t think Java would force something like that, as it’s an OS feature. Most of the time with hard drive access you aren’t interested in the data actually being written to the hard drive immediately, only that it will be written eventually. If close() forced a cache flush, it would negate everything gained from actually having the cache in the first place.

Here’s the results from running this on an HFS journaled (Mac OS X) laptop.


wrote 1 files of 512MB in 9 sec, total throughput: 56MB/sec
wrote 2 files of 512MB in 31 sec, total throughput: 32MB/sec
wrote 3 files of 512MB in 36 sec, total throughput: 42MB/sec
wrote 4 files of 512MB in 49 sec, total throughput: 40MB/sec
wrote 5 files of 512MB in 64 sec, total throughput: 40MB/sec
wrote 6 files of 512MB in 97 sec, total throughput: 30MB/sec
wrote 7 files of 512MB in 146 sec, total throughput: 21MB/sec
wrote 8 files of 512MB in 150 sec, total throughput: 24MB/sec

Well, I did mention modern OS’s earlier :stuck_out_tongue:

er … yeah, modern OS … consistently slow! :slight_smile:

I agree. In fact, I’d go as far as to say it’s safe to assume it doesn’t. I’ve found something definitive for Android, but not for Java yet. However, I assume that’s the point of having the FD.sync() method in the first place. Maybe add that into the benchmark? And threads too - I’m interested to know if there’s any thread-local stuff affecting caching (I’m too busy/lazy to write it myself atm).

@Eli - phew, not just me that slow then - I was getting disk envy :persecutioncomplex:

I think it’s pretty easy to rule out the OS caching the data: write so much that the OS simply can’t cache it anymore.

If you write a file that is roughly 4 times the available RAM, and it has the same performance as writing a file about as big as the available RAM, you pretty much know (almost) everything is written to disk when the stream is closed (within a small margin of error).


   public static void main(String[] args) throws IOException
   {
      int writeSize = 4 * 1024;                     // 4K per write, like the original benchmark
      long minTotalSize = 1L * 1024 * 1024 * 1024;  // 1GB
      long maxTotalSize = 8L * 1024 * 1024 * 1024;  // 8GB (well beyond available RAM)

      int p = 0; // keeps every temp file name unique across iterations

      for (long totalSize = minTotalSize; totalSize <= maxTotalSize; totalSize *= 2)
      {
         int fileCount = 1;
         FileOutputStream[] fos = new FileOutputStream[fileCount];
         for (int i = 0; i < fos.length; i++)
         {
            File file = new File("C:/test." + (++p) + ".tmp");
            file.delete();
            fos[i] = new FileOutputStream(file);
         }

         long t0 = System.currentTimeMillis();
         writeInterleavedStreams(fos, (int) (totalSize / writeSize), writeSize);
         long t1 = System.currentTimeMillis();

         long size = (totalSize / 1024 / 1024); // MB
         long sec = ((t1 - t0) / 1000);
         System.out.println("wrote " + fos.length + " files of " + size + "MB in " + sec + " sec, total throughput: " + (size / sec * fileCount) + "MB/sec");
      }
   }

   public static void writeInterleavedStreams(OutputStream[] streams, int writeCount, int writeSize) throws IOException
   {
      byte[] buf = new byte[writeSize]; // single reused buffer; its contents don't matter

      // round-robin over the streams, so writes to the different files are interleaved
      for (int i = 0; i < writeCount; i++)
      {
         for (OutputStream stream : streams)
         {
            stream.write(buf);
         }
      }

      for (OutputStream stream : streams)
      {
         stream.close(); // note: close() flushes, it does not fsync
      }
   }

2GB RAM free, on a 4GB system:


wrote 1 files of 1024MB in 6 sec, total throughput: 170MB/sec
wrote 1 files of 2048MB in 13 sec, total throughput: 157MB/sec
wrote 1 files of 4096MB in 24 sec, total throughput: 170MB/sec
wrote 1 files of 8192MB in 50 sec, total throughput: 163MB/sec

That pretty much rules out OS caching (affecting the benchmark) to me.

If you want to force fsync, you have to call the output stream’s getFD().sync(). Closing a stream only flushes it, which means the data is no longer the application’s problem, but the OS and filesystem are free to buffer it for as long as they feel like.
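E.g. in the benchmark above, the close loop would become something like this if you wanted to measure data actually hitting the device (an untested sketch):


   for (OutputStream stream : streams)
   {
      ((FileOutputStream) stream).getFD().sync(); // block until the OS says it's on the device
      stream.close();                             // close() alone only flushes
   }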

fsync() itself is only a very strong suggestion though, and HDD manufacturers often still cache things in the drive’s onboard RAM, which won’t survive a power failure.

At some point you have to ensure the data is written to persistent storage, like prior to system shutdown / reboot. Same applies for ‘safe removal’ of external devices. Obviously this is done at the OS level, but I’d highly doubt we can’t access that functionality somehow (without actually shutting down the device).

You can force fsync on individual FDs, but I don’t think Java exposes the sync syscall. Other than calling Runtime.getRuntime().exec("/bin/sync") and being SOL on Windows, that is…
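Something like this is about the best you can do from pure Java, I think (Linux/Mac only; the /bin/sync path and the method name are just assumptions for the sketch):


   public static void syncEverything() throws IOException, InterruptedException
   {
      // Java has no wrapper for the global sync() syscall, so shell out to /bin/sync.
      // Works on Linux / Mac OS X; there's no direct equivalent on Windows.
      Process p = Runtime.getRuntime().exec("/bin/sync");
      p.waitFor();
   }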

Riven, your benchmark is nice and all… but writing files sequentially at that size and amount on the same thread will also quickly slow down the hard drive at the SAME RATE… I don’t get what point you’re trying to make…

It doesn’t change the fact that single-threaded is still slower than multi-threaded, lol…

You’re also forgetting that 1) the max file size is 2MB, and 2) Minecraft will not be saving these files that often (you would have to be writing thousands at the same time, or sequentially, to get that kind of slowdown).

And I said object creation was only an example of one of the factors that can be done concurrently (in parallel) on different threads. (Creating a FileOutputStream object is actually really slow, almost an entire millisecond; most of that is probably the actual opening of the file, but there are still bulky operations like checking permissions.)

What you’re lacking is a benchmark relevant to the argument we’re having; that is, one comparing multithreaded vs. single-threaded writing (which would clearly prove I’m right). I’ve posted one; you should post your own too, just to make sure my results aren’t biased or anything.
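Something along these lines would do, I think; a rough sketch only (the file names, sizes and thread handling are placeholders), writing the same set of files once sequentially and once with one thread per file:


   public static void main(String[] args) throws Exception
   {
      final int fileCount = 4;
      final int writeSize = 4 * 1024;
      final int writesPerFile = 64 * 1024; // 256MB per file

      // single-threaded: the files are written one after another
      long t0 = System.currentTimeMillis();
      for (int i = 0; i < fileCount; i++)
      {
         writeFile(new File("seq." + i + ".tmp"), writesPerFile, writeSize);
      }
      long t1 = System.currentTimeMillis();

      // multi-threaded: one thread per file, all writing concurrently
      Thread[] threads = new Thread[fileCount];
      long t2 = System.currentTimeMillis();
      for (int i = 0; i < fileCount; i++)
      {
         final int n = i;
         threads[i] = new Thread(new Runnable()
         {
            public void run()
            {
               try
               {
                  writeFile(new File("par." + n + ".tmp"), writesPerFile, writeSize);
               }
               catch (IOException e)
               {
                  e.printStackTrace();
               }
            }
         });
         threads[i].start();
      }
      for (Thread t : threads)
      {
         t.join();
      }
      long t3 = System.currentTimeMillis();

      System.out.println("sequential: " + (t1 - t0) + " ms, threaded: " + (t3 - t2) + " ms");
   }

   public static void writeFile(File file, int writeCount, int writeSize) throws IOException
   {
      byte[] buf = new byte[writeSize];
      FileOutputStream fos = new FileOutputStream(file);
      for (int i = 0; i < writeCount; i++)
      {
         fos.write(buf);
      }
      fos.close();
   }

(Remember to delete the seq.* / par.* temp files afterwards.)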

(Either NCQ or queue depth is the slowing factor; Java blocks until the OS can queue I/O requests. I’m going to take a wild guess and say nsigma doesn’t have NCQ (enabled).)

EDIT: consider this

	public static void main(String[] args) throws IOException {
		long start;
		int i = 0;
		long totalMs = 0; // running sum of per-write delays, in milliseconds
		byte[] block = new byte[1024 * 1024]; // allocate the 1MB buffer once, outside the timed section
		FileOutputStream out = new FileOutputStream(new File("large"));
		while (true) {
			start = System.nanoTime();

			out.write(block);
			out.flush();

			long now = System.nanoTime();
			totalMs += (now - start) / 1000000;
			System.out.println("Average delay = " + (totalMs / ++i) + " ms");
		}
	}

Make sure you delete the "large" file (in your project’s root directory) afterwards. You should stop the program after you see the delay go up; if you want to see throughput, it’s only a small modification to the above.
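For the lazy, the throughput version could look roughly like this (just a sketch of the idea; same caveats as above about stopping it and deleting the file):


	public static void main(String[] args) throws IOException {
		byte[] block = new byte[1024 * 1024]; // 1MB per write, allocated once
		long written = 0;
		FileOutputStream out = new FileOutputStream(new File("large"));
		long start = System.nanoTime();
		while (true) {
			out.write(block);
			out.flush();
			written += block.length;

			long elapsedMs = (System.nanoTime() - start) / 1000000;
			if (elapsedMs > 0) {
				System.out.println("Throughput = " + (written / 1024 / 1024 * 1000 / elapsedMs) + " MB/sec");
			}
		}
	}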

Why hello :stuck_out_tongue:

Quite a conversation I started…haha