Measuring disk I/O performance

I’m experimenting with the impact of file layout on disk read performance and have achieved some good results, but I can’t pinpoint why one alternative is better than the other.
My data structure consists of a rectangular grid of tiles. Each tile has multiple resolution versions (1024^2, 512^2, 256^2, …). I’ve produced a test dataset of 64x64 tiles whose highest-resolution level is 1024^2. This dataset was then written to disk in two different formats, A and B.

File format A creates one file per resolution level. In each file, all tiles are written out linearized row by row.
File 1: Tile1@1024^2, Tile2@1024^2, Tile3@1024^2, …
File 2: Tile1@512^2, Tile2@512^2, Tile3@512^2, …

File format B creates multiple files with some arbitrary size limit for each file. Each file contains a subset of the tiles, including all of their levels.
File 1: Tile1@1024^2, Tile1@512^2, …, Tile1@1^2, Tile2@1024^2, Tile2@512^2, …, Tile2@1^2, …
File 2: Tile10@1024^2, Tile10@512^2, …, Tile10@1^2, Tile11@1024^2, Tile11@512^2, …, Tile11@1^2, …
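
To make the layouts concrete, the file/offset lookups boil down to something like this (just a sketch, not the actual FileFormat implementation; it assumes uncompressed tiles at 4 bytes per pixel and, for format B, a fixed number of tiles per file instead of the byte-size limit):

class LayoutSketch {
  static final int GRID_WIDTH = 64;        // the test dataset is a 64x64 tile grid

  // Size of one tile at a given level, assuming uncompressed 4 bytes per pixel.
  static long tileBytes(int level) {
    long side = 1024 >> level;             // 1024, 512, 256, ... down to 1 at level 10
    return side * side * 4;
  }

  // Format A: one file per resolution level, tiles linearized row by row.
  static int fileForTileA(int x, int y, int level) {
    return level;                          // the file index is simply the level
  }

  static long offsetForTileA(int x, int y, int level) {
    long tileIndex = (long) y * GRID_WIDTH + x;
    return tileIndex * tileBytes(level);   // every tile in a level has the same size
  }

  // Format B: a fixed number of tiles per file, each tile stored with all its levels.
  static long bytesPerTile(int levelCount) {
    long sum = 0;
    for (int level = 0; level < levelCount; level++) {
      sum += tileBytes(level);
    }
    return sum;
  }

  static int fileForTileB(int x, int y, int tilesPerFile) {
    long tileIndex = (long) y * GRID_WIDTH + x;
    return (int) (tileIndex / tilesPerFile);
  }

  static long offsetForTileB(int x, int y, int level, int levelCount, int tilesPerFile) {
    long tileIndex = (long) y * GRID_WIDTH + x;
    long offset = (tileIndex % tilesPerFile) * bytesPerTile(levelCount);
    for (int l = 0; l < level; l++) {
      offset += tileBytes(l);              // skip this tile's higher-resolution levels
    }
    return offset;
  }
}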

In practice this gives me, for format A, a 16GB file for the first level, a 4GB file for the next, … Format B has 64 files of about 340MB each. To see which format performs best, I wrote the following bit of benchmark code:

public void readTiles(FileFormat format, byte[][] buffer, int levelCount) throws IOException {
  // Walk the tile grid row by row and read every level of every tile.
  for (int y = 0; y < GRID_HEIGHT; y++) {
    for (int x = 0; x < GRID_WIDTH; x++) {
      for (int level = 0; level < levelCount; level++) {
        // Ask the format where this tile level lives and jump there.
        int fileIndex = format.getFileForTile(x, y, level);
        long offset = format.getOffsetForTile(x, y, level);
        RandomAccessFile file = getFile(fileIndex);

        file.seek(offset);

        // read() may return fewer bytes than requested, so loop until the
        // whole level has been read into the buffer.
        int bytesRead = 0;
        int bytesToRead = buffer[level].length;
        while (bytesToRead > 0) {
          int r = file.read(buffer[level], bytesRead, bytesToRead);
          if (r == -1) {
            throw new EOFException();
          }
          bytesToRead -= r;
          bytesRead += r;
        }
      }
    }
  }
}

After running this test with the two formats I get the following results:
Format A: ~554ms per tile
Format B: ~106ms per tile
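
For reference, a wrapper along these lines (in the same class as readTiles) could produce such per-tile figures; a sketch only, where the buffer sizes assume uncompressed 4 bytes per pixel and levelCount = 11 covers the levels from 1024^2 down to 1^2:

// Sketch of a timing wrapper around readTiles. Buffer layout is an assumption:
// 4 bytes per pixel, 11 levels from 1024^2 down to 1^2.
public void benchmark(FileFormat format) throws IOException {
  int levelCount = 11;
  byte[][] buffer = new byte[levelCount][];
  for (int level = 0; level < levelCount; level++) {
    int side = 1024 >> level;
    buffer[level] = new byte[side * side * 4];
  }

  long start = System.nanoTime();
  readTiles(format, buffer, levelCount);
  long elapsedMs = (System.nanoTime() - start) / 1000000L;

  System.out.println("total: " + elapsedMs + " ms, per tile: "
      + (elapsedMs / (double) (GRID_WIDTH * GRID_HEIGHT)) + " ms");
}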

Of course the above results are good news, but I can’t seem to figure out why one is so much faster than the other. My intuition tells me this is related to data locality and seek times, but I would like some slightly more scientific proof than just my intuition. Profiling with a Java profiler shows the read call is the bottleneck in both cases, so that’s not of much use. Does anyone know how I can determine what’s actually going on behind the read call, so I can figure out exactly why one is slower than the other?

I don’t exactly understand how your tiles correspond to your grid height, grid width, and level, but my guess is that you’re doing a lot less physical seeking in your second version. A full seek is gonna cost you a lot (10ms maybe?), so you don’t want to be performing hundreds of them.

My suggestion is to organize your data on disk so as to minimize your seeks during load time… You can transfer a heck of a lot of data very fast if you’re just doing block I/O. It’s the same principle as with file fragmentation. Fragmented files can really screw you up badly, because the disk has to do a lot more seeking for what should have been just block transfers.

Ah, seems like I traded too much clarity for conciseness.
The data that I’m working with is a gigantic image (for instance a satellite image of the entire earth). This image is chopped into tiles of, say, 1024x1024. This results in a rectangular grid of tiles. GRID_WIDTH and GRID_HEIGHT are the dimensions of this grid.
This grid is linearized row by row and then written into the files. So what the benchmark code basically does is

  • Iterate over each row
    • Iterate over each column
      • Iterate over all levels of a tile
        • Read the level
          For format A the inner loop basically consists of x reads in x distinct files (one per level).
          For format B it is x reads in a single file, and all of those reads hit consecutive parts of that file.

I’ve ensured the data files are not fragmented at all (clean empty disk, single copy operation, checked with defragger) to make sure this doesn’t influence my test results. My intuition is also that the physical seek is what kills the performance for format A, but I would like to be able to measure this. In other words I would like to get a breakdown of the read operations showing %seeking and %actual reading. Don’t know if this is possible in the first place though…
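
About the only crude thing I can do from the Java side is time each blocking read. That won’t give me the seek/transfer split, but it would at least show how the per-read latency differs between the two formats. Something like this, as a helper in the benchmark class (just a sketch):

// Sketch: time one positioned read. RandomAccessFile.seek() itself is cheap;
// the physical seek cost shows up in the first read() after it, so this
// measures seek + transfer together for one tile level.
private long timedRead(RandomAccessFile file, long offset, byte[] dest) throws IOException {
  long t0 = System.nanoTime();
  file.seek(offset);
  int read = 0;
  while (read < dest.length) {
    int r = file.read(dest, read, dest.length - read);
    if (r == -1) {
      throw new EOFException();
    }
    read += r;
  }
  return System.nanoTime() - t0;
}

Accumulating these per level (or per file) and comparing the two formats should at least localize where the time goes, even if it can’t separate seeking from transferring.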

If you’re reading from different files, you’ll definitely incur a full seek for each different file you read from. (Fancy hardware with command queueing and whatnot can alleviate this.) For some caching software I wrote, I actually considered writing a queueing system so that I/O requests physically near each other would be processed together.
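
The gist would have been something like the sketch below: collect read requests, sort them by file and offset, and issue them in that order so nearby requests get serviced back to back (just a sketch of the idea, not actual code):

// Sketch: queue read requests and issue them in (file, offset) order, so
// requests that are physically close on disk get serviced together.
// Assumes the files themselves are unfragmented, so offset order roughly
// matches physical order on the platter.
class ReadRequest implements Comparable<ReadRequest> {
  final int fileIndex;
  final long offset;
  final byte[] dest;

  ReadRequest(int fileIndex, long offset, byte[] dest) {
    this.fileIndex = fileIndex;
    this.offset = offset;
    this.dest = dest;
  }

  public int compareTo(ReadRequest other) {
    if (fileIndex != other.fileIndex) {
      return fileIndex - other.fileIndex;
    }
    return offset < other.offset ? -1 : (offset > other.offset ? 1 : 0);
  }
}

Then instead of reading tiles in grid order you would collect a batch of ReadRequests, Collections.sort() them, and do the seek/readFully for each one in sorted order (using something like the getFile(int) from your benchmark).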

[quote]but I would like to be able to measure this. In other words I would like to get a breakdown of the read operations showing %seeking and %actual reading. Don’t know if this is possible in the first place though…
[/quote]
If you’re running under Windows, you can use perfmon. This section from the doc seems relevant:

“You can also check the value of Avg. Disk Bytes/Transfer. A value greater than 20 KB indicates that the disk drive is generally performing well; low values result if an application is accessing a disk inefficiently. For example, applications that access a disk at random raise Avg. Disk sec/Transfer times because random transfers require increased seek time.”

Aha, that sounds just like what I was looking for. I’ll give that a go tomorrow. Thank you very much.
Which document contains that information, btw? I already glanced through the perfmon help file but didn’t find anything there. Might have overlooked it, of course…

It was in the perfmon help.

If you’re interested here are my results:

[tr][td]Counter[/td][td]Format A[/td][td]Format B[/td][/tr]
[tr][td]Avg sec/read[/td][td]26ms[/td][td]9ms[/td][/tr]
[tr][td]Avg bytes/read[/td][td]63KB[/td][td]56KB[/td][/tr]
[tr][td]Avg bytes/sec[/td][td]7.2MB/s[/td][td]9.1MB/s[/td][/tr]

These basically confirm my assumptions. I’m not sure how to interpret the avg bytes/read value though. What does this actually mean?

Oops. I just noticed that I was copying my data files over the network to another machine. That’s not the best way to get correct results now, is it? :) I’ll update these numbers later for correctness, but I expect the difference between the two formats to stay the same…