To estimate expected processing time, we use seeker to measure the average access time on this drive array (19.26 ms) and dd to measure raw throughput (440.7 MiB/s), yielding an estimate of 3948.7 s. Any difference between this estimate and the actual time is likely the result of file system overhead and the overhead of the script interpreter (bash).
(average file size / throughput + average access time) x number of files =
(2.65 MiB / 440.7 MiB/s + 19.26 ms) x 156242 = 3948.7 s
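
For reference, a rough sketch of how those numbers can be measured and combined. The device name /dev/sdX is a placeholder, and I'm assuming the usual seeker benchmark that takes the block device as its only argument:

```bash
# Measure the inputs to the estimate. /dev/sdX is a placeholder for the
# drive array's block device.
dd if=/dev/sdX of=/dev/null bs=1M count=4096 iflag=direct  # sequential read throughput (~440.7 MiB/s above)
./seeker /dev/sdX                                          # average random access time (~19.26 ms above)

# Plug the measured numbers into the formula above:
awk 'BEGIN {
    size = 2.65        # average file size, MiB
    tput = 440.7       # raw throughput, MiB/s
    seek = 19.26/1000  # average access time, s
    n    = 156242      # number of files
    printf "estimated total: %.1f s\n", (size/tput + seek) * n
}'
```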
To gather the LBAs we use hdparm, but anything that relies heavily on extent ordering should use a custom program that makes FS_IOC_FIEMAP ioctl() calls to minimize per-file invocations. (I'm storing the information in a metadata DB to order subsequent accesses to these files. The worst case, if the extent information becomes out of date, is a sub-optimal ordering. The extents can also be updated as a background process during low load.) Using hdparm in a script took 939 s to gather the extents, but some of that time comes from invoking hdparm once per file. Over 99.2% of the files have a single extent, so ordering by the first LBA should be nearly optimal.
Number of extents | Frequency
----------------- | ---------
1                 | 155001
2                 | 1161
3                 | 78
4                 | 2
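
For reference, a minimal sketch of the hdparm-based gathering step; filelist.txt and ordered.txt are made-up names, hdparm --fibmap needs root, and the awk is a loose parse of its output (the column layout can vary between hdparm versions), which is exactly why a FIEMAP-based program is the better long-term approach:

```bash
# Sketch: record each file's first LBA with hdparm --fibmap, then order the
# file list by that LBA. The awk pattern grabs the begin_LBA column of the
# first extent row (the first line whose leading field is numeric).
while read -r f; do
    lba=$(hdparm --fibmap "$f" | awk '$1 ~ /^[0-9]+$/ { print $2; exit }')
    printf '%s\t%s\n' "${lba:-0}" "$f"
done < filelist.txt | sort -n | cut -f2- > ordered.txt
```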
The table below shows the average of three runs for each file ordering. Even after adding in the cost of gathering the extents, there is still a large advantage to ordering by extents instead of processing files in random order. In this test, "processing" is simply cat'ing each file to /dev/null (see the sketch after the table), so the time is almost entirely spent on file operations.
Ordering           | Time avg (s) | Time stdev (s) | Speedup
------------------ | ------------ | -------------- | -------
Random             | 9343.6       | 12.4           | 1.0
Extent (first LBA) | 1519.6       | 1.5            | 6.1
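
The processing loop itself is nothing more than something like this, with ordered.txt being the hypothetical extent-ordered list from the earlier sketch:

```bash
# Read the files in extent order and throw the data away; for the "Random"
# row the same loop runs over a shuffled list (e.g. via shuf) instead.
time while read -r f; do
    cat "$f" > /dev/null
done < ordered.txt
```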
Unfortunately, this method requires a directly attached disk and is of little use when the files are served by a network file system. Ideally the ordering would be handled by the file system itself, given a way for applications to expose their pending file operations to it. Sounds like an interesting project for a kernel hacker.
(All tests performed under Fedora 13, kernel 2.6.33.6-147.2.4.fc13.x86_64.)