Review my thread pattern?

hmm … from the JavaDoc for flush()


         for (OutputStream stream : streams)
         {
            stream.write(buf);
            stream.flush();
            //((FileOutputStream) stream).getFD().sync();
            //((FileOutputStream) stream).getChannel().force(false);
         }

It seems you are right: flush() does not block, while sync() and force(…) do.
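For reference, here’s a minimal sketch of the three levels side by side (just an illustration; the file name and buffer are made up):


   // Sketch: the three levels of "get this data out of the JVM" for a FileOutputStream.
   FileOutputStream fos = new FileOutputStream("some.dat"); // hypothetical file
   byte[] buf = new byte[4 * 1024];

   fos.write(buf);
   fos.flush();                   // doesn't block; for a raw FileOutputStream it's effectively a no-op
   fos.getFD().sync();            // blocks until the OS reports the data is on the device (fsync)
   fos.getChannel().force(false); // like sync(); 'false' means file metadata need not be forced too

   fos.close();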

Still, you can observe a dramatic performance degradation when doing concurrent file writes on HDDs.

It seems that concurrently writing to files results in incredible fragmentation (making it hard to find new free clusters), which somehow remains the case even after the files are deleted.

But you wouldn’t do those for every block of data you write. To use your example above, there are databases which don’t even bother to do this after every write. There are other ways of achieving data integrity, and for that matter these calls still don’t guarantee the data has actually been written to the disk, depending on the hardware.

I wasn’t disputing you were seeing this, just your comment about head-seek for every context-shift. It’d be interesting to see how this differs across filesystems and OS’s. For that matter, I assume you’ve tried the above benchmark with multiple threads as opposed to just multiple writes? I could understand why these two circumstances could be treated very differently in the underlying filesystem.

I don’t think anyone disputes that concurrent access harms throughput. The reasons for this are more likely to do with context-switching overhead and buffer-cache busting than with any universal truth about the physical properties of the HDD, since you can see the effect even on an SSD (though it seems to take more threads to do it).

Even putting aside I/O, there’s plenty of other reasons to avoid using threads unless you really need parallel execution on multiple cores (and most I/O doesn’t need it). Even then you should at least be using java.util.concurrent.
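As a rough sketch of what I mean (the class name and file name are just placeholders), offloading writes to a single worker via java.util.concurrent instead of spawning a Thread per write:


   import java.io.File;
   import java.io.FileOutputStream;
   import java.io.IOException;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;

   public class AsyncWriteSketch
   {
      public static void main(String[] args)
      {
         // One background worker owns all file I/O; callers just queue work and move on.
         ExecutorService io = Executors.newSingleThreadExecutor();

         final byte[] data = new byte[4 * 1024];
         io.submit(() -> {
            try (FileOutputStream out = new FileOutputStream(new File("save.tmp"))) // hypothetical file
            {
               out.write(data);
            }
            catch (IOException e)
            {
               e.printStackTrace();
            }
         });

         io.shutdown(); // already-queued writes still complete, then the worker thread exits
      }
   }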

:slight_smile: The benchmark runs in a single thread and only uses a single 4K byte[].

Yes, I do! If you’d said “I don’t think anyone disputes that concurrent access harms throughput on some systems” maybe. I’ve no doubt of the results people have seen so far, but the whole thing is very dependent on the underlying OS and filesystem.

E.g. Riven’s benchmark on Linux with ext4:

wrote 1 files of 512MB in 9 sec, total throughput: 56MB/sec
wrote 2 files of 512MB in 16 sec, total throughput: 64MB/sec
wrote 3 files of 512MB in 29 sec, total throughput: 51MB/sec
wrote 4 files of 512MB in 39 sec, total throughput: 52MB/sec
wrote 5 files of 512MB in 48 sec, total throughput: 50MB/sec
wrote 6 files of 512MB in 57 sec, total throughput: 48MB/sec
wrote 7 files of 512MB in 69 sec, total throughput: 49MB/sec
wrote 8 files of 512MB in 78 sec, total throughput: 48MB/sec

That seems fairly consistent to me. Though my laptop HD is so slow to start with, I’m not sure it’s a fair benchmark! ;D

Not sure exactly what you’re getting at here, but there are plenty of good reasons to use Threads. While I/O is not necessarily one of them, I’m not sure unnecessarily serializing (synchronizing) your threading model for I/O is always justified either.

… just off to delete all them tmp files before I forget they’re there and wonder why my disk has shrunk. :slight_smile:

Sounds like you’re measuring burst latency then. The valuable lesson we learned in that case is how badly HDD makers lie. :slight_smile:

Interesting stuff, didn’t realize it could simply be NTFS being crappy at concurrent access.

Then again, maybe EXT4 is just always slow :slight_smile:

Can we assume FileOutputStream.close() syncs to the storage device?

If so, then my benchmark still stands :persecutioncomplex:

ext4 can be pretty zippy but it depends on how you tune it. Out of the box settings tend to be paranoid and use synchronous journaling, and on a laptop it’s probably a good idea to keep it that way.

Talking to a guy now who’s working on a supercomputer with 6TB of RAM. I could probably avoid hitting the disk on that baby :slight_smile:

I wouldn’t assume that. I don’t think Java would force something like that, as it’s an OS feature. Most of the time with hard drive access you aren’t interested in the data actually being written to the hard drive immediately, only that it will be written eventually. If close() forced a cache flush, it would negate everything gained from actually having the cache in the first place.

Here’s the results from running this on an HFS journaled (Mac OS X) laptop.


wrote 1 files of 512MB in 9 sec, total throughput: 56MB/sec
wrote 2 files of 512MB in 31 sec, total throughput: 32MB/sec
wrote 3 files of 512MB in 36 sec, total throughput: 42MB/sec
wrote 4 files of 512MB in 49 sec, total throughput: 40MB/sec
wrote 5 files of 512MB in 64 sec, total throughput: 40MB/sec
wrote 6 files of 512MB in 97 sec, total throughput: 30MB/sec
wrote 7 files of 512MB in 146 sec, total throughput: 21MB/sec
wrote 8 files of 512MB in 150 sec, total throughput: 24MB/sec

Well, I did mention modern OS’s earlier :stuck_out_tongue:

er … yeah, modern OS … consistently slow! :slight_smile:

I agree. In fact, I’d go as far as to say it’s safe to assume it doesn’t. I’ve found something definitive for Android, but not for Java yet. However, I assume that’s the point of having the FD.sync() method in the first place. Maybe add that into the benchmark? And threads too - I’m interested to know if there’s any thread-local stuff affecting caching (I’m too busy/lazy to write it myself atm).

@Eli - phew, not just me that slow then - I was getting disk envy :persecutioncomplex:

I think it’s pretty easy to rule out the OS caching the data: write so much that the OS simply can’t cache it anymore.

If you write a file that is roughly 4 times the available RAM, and it has the same performance as writing a file about as big as the available RAM, you pretty much know (almost) everything is written to disk when the stream is closed (within a small margin of error).


   public static void main(String[] args) throws IOException
   {
      int writeSize = 4 * 1024;                     // 4K per write, like the original benchmark
      long minTotalSize = 1L * 1024 * 1024 * 1024;  // 1GB
      long maxTotalSize = 8L * 1024 * 1024 * 1024;  // 8GB (well beyond available RAM)

      int p = 0; // keeps every temp file name unique across iterations

      for (long totalSize = minTotalSize; totalSize <= maxTotalSize; totalSize *= 2)
      {
         int fileCount = 1;
         FileOutputStream[] fos = new FileOutputStream[fileCount];
         for (int i = 0; i < fos.length; i++)
         {
            File file = new File("C:/test." + (++p) + ".tmp");
            file.delete();
            fos[i] = new FileOutputStream(file);
         }

         long t0 = System.currentTimeMillis();
         writeInterleavedStreams(fos, (int) (totalSize / writeSize), writeSize);
         long t1 = System.currentTimeMillis();

         long size = (totalSize / 1024 / 1024); // MB
         long sec = ((t1 - t0) / 1000);
         System.out.println("wrote " + fos.length + " files of " + size + "MB in " + sec + " sec, total throughput: " + (size / sec * fileCount) + "MB/sec");
      }
   }

   public static void writeInterleavedStreams(OutputStream[] streams, int writeCount, int writeSize) throws IOException
   {
      byte[] buf = new byte[writeSize]; // single reused buffer; its contents don't matter

      // round-robin over the streams, so writes to the different files are interleaved
      for (int i = 0; i < writeCount; i++)
      {
         for (OutputStream stream : streams)
         {
            stream.write(buf);
         }
      }

      for (OutputStream stream : streams)
      {
         stream.close(); // note: close() flushes, it does not fsync
      }
   }

2GB RAM free, on a 4GB system:


wrote 1 files of 1024MB in 6 sec, total throughput: 170MB/sec
wrote 1 files of 2048MB in 13 sec, total throughput: 157MB/sec
wrote 1 files of 4096MB in 24 sec, total throughput: 170MB/sec
wrote 1 files of 8192MB in 50 sec, total throughput: 163MB/sec

That pretty much rules out OS caching (affecting the benchmark) to me.

If you want to force fsync, you have to call the output stream’s getFD().sync(). Closing a stream only flushes it, which means the data is no longer the application’s problem, but the OS and filesystem are free to buffer it for as long as they feel like.
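E.g. in the benchmark above, the close loop would become something like this if you wanted to measure data actually hitting the device (an untested sketch):


   for (OutputStream stream : streams)
   {
      ((FileOutputStream) stream).getFD().sync(); // block until the OS says it's on the device
      stream.close();                             // close() alone only flushes
   }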

fsync() itself is only a very strong suggestion though, and HDD manufacturers often still cache things in the drive’s onboard RAM, which won’t survive a power failure.

At some point you have to ensure the data is written to persistent storage, like prior to system shutdown / reboot. Same applies for ‘safe removal’ of external devices. Obviously this is done at the OS level, but I’d highly doubt we can’t access that functionality somehow (without actually shutting down the device).

You can force fsync on individual FDs, but I don’t think Java exposes the sync syscall. Other than calling Runtime.getRuntime().exec("/bin/sync") and being SOL on Windows, that is…
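Something like this is about the best you can do from pure Java, I think (Linux/Mac only; the /bin/sync path and the method name are just assumptions for the sketch):


   public static void syncEverything() throws IOException, InterruptedException
   {
      // Java has no wrapper for the global sync() syscall, so shell out to /bin/sync.
      // Works on Linux / Mac OS X; there's no direct equivalent on Windows.
      Process p = Runtime.getRuntime().exec("/bin/sync");
      p.waitFor();
   }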

Riven, your benchmark is nice and all… but writing files sequentially at that size and amount on the same thread will also quickly slow down the hard drive at the SAME RATE… I don’t get what point you’re trying to make…

It doesn’t change the fact that single-threaded is still slower than multi-threaded, lol…

You’re also forgetting that 1) the max file size is 2MB, and 2) Minecraft will not be saving these files that often (you would have to be writing thousands at the same time, or sequentially, to get that kind of slowdown).

And I said object creation was only an example of one of the factors that can be done concurrently (in parallel) on different threads. (Creating a FileOutputStream object is actually really slow, almost an entire millisecond; most of that is probably the actual opening of the file, but there are still bulky operations like checking permissions.)

What you’re lacking is a benchmark relevant to the argument we’re having; that is, one comparing multithreaded vs. single-threaded writing (which would clearly prove I’m right). I’ve posted one; you should post your own too, just to make sure my results aren’t biased or anything.
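Something along these lines would do, I think; a rough sketch only (the file names, sizes and thread handling are placeholders), writing the same set of files once sequentially and once with one thread per file:


   public static void main(String[] args) throws Exception
   {
      final int fileCount = 4;
      final int writeSize = 4 * 1024;
      final int writesPerFile = 64 * 1024; // 256MB per file

      // single-threaded: the files are written one after another
      long t0 = System.currentTimeMillis();
      for (int i = 0; i < fileCount; i++)
      {
         writeFile(new File("seq." + i + ".tmp"), writesPerFile, writeSize);
      }
      long t1 = System.currentTimeMillis();

      // multi-threaded: one thread per file, all writing concurrently
      Thread[] threads = new Thread[fileCount];
      long t2 = System.currentTimeMillis();
      for (int i = 0; i < fileCount; i++)
      {
         final int n = i;
         threads[i] = new Thread(new Runnable()
         {
            public void run()
            {
               try
               {
                  writeFile(new File("par." + n + ".tmp"), writesPerFile, writeSize);
               }
               catch (IOException e)
               {
                  e.printStackTrace();
               }
            }
         });
         threads[i].start();
      }
      for (Thread t : threads)
      {
         t.join();
      }
      long t3 = System.currentTimeMillis();

      System.out.println("sequential: " + (t1 - t0) + " ms, threaded: " + (t3 - t2) + " ms");
   }

   public static void writeFile(File file, int writeCount, int writeSize) throws IOException
   {
      byte[] buf = new byte[writeSize];
      FileOutputStream fos = new FileOutputStream(file);
      for (int i = 0; i < writeCount; i++)
      {
         fos.write(buf);
      }
      fos.close();
   }

(Remember to delete the seq.* / par.* temp files afterwards.)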

(Either NCQ or queue depth is the slowing factor; Java blocks until the OS can queue I/O requests. I’m going to take a wild guess and say nsigma doesn’t have NCQ (enabled).)

EDIT: consider this

	public static void main(String[] args) throws IOException {
		long start;
		int i = 0;
		long totalMs = 0; // running sum of per-write delays, in milliseconds
		byte[] block = new byte[1024 * 1024]; // allocate the 1MB buffer once, outside the timed section
		FileOutputStream out = new FileOutputStream(new File("large"));
		while (true) {
			start = System.nanoTime();

			out.write(block);
			out.flush();

			long now = System.nanoTime();
			totalMs += (now - start) / 1000000;
			System.out.println("Average delay = " + (totalMs / ++i) + " ms");
		}
	}

Make sure you delete the "large" file (in your project’s root directory) afterwards. You should stop the program after you see the delay go up; if you want to see throughput, it’s only a small modification to the above.
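For the lazy, the throughput version could look roughly like this (just a sketch of the idea; same caveats as above about stopping it and deleting the file):


	public static void main(String[] args) throws IOException {
		byte[] block = new byte[1024 * 1024]; // 1MB per write, allocated once
		long written = 0;
		FileOutputStream out = new FileOutputStream(new File("large"));
		long start = System.nanoTime();
		while (true) {
			out.write(block);
			out.flush();
			written += block.length;

			long elapsedMs = (System.nanoTime() - start) / 1000000;
			if (elapsedMs > 0) {
				System.out.println("Throughput = " + (written / 1024 / 1024 * 1000 / elapsedMs) + " MB/sec");
			}
		}
	}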

Why hello :stuck_out_tongue:

Quite a conversation I started…haha