Hi all, I have recently been benchmarking two FreeBSD device drivers, SDHCI and SDIO, to compare their relative performance under different circumstances. Here I summarize my results, possible conclusions, and the benchmarking procedure.
Before moving on to benchmarking, I’d recommend going through this article. It is really good and provides a great overview of how data is transferred to disk and the different APIs available. It will surely help you select options for the benchmark.
I initially experimented with different benchmarks such as iorate, IOzone, fio, and bonnie++, but the one I really liked was IOzone. It provides a multitude of features and, more importantly, its results are fairly consistent across multiple runs and agree with diskinfo results. It has a flexible license, which makes it ideal for use with any open-source or proprietary application.
IOzone is a filesystem benchmark tool. It tests file I/O performance for the following operations: read, write, re-read, re-write, read backwards, read strided, fread, fwrite, random read, pread, mmap, aio_read, aio_write. Builds are available for AIX, BSDI, HP-UX, IRIX, FreeBSD, Linux, OpenBSD, NetBSD, OSFV3, OSFV4, OSFV5, SCO OpenServer, Solaris, Mac OS X, and Windows (95/98/Me/NT/2K/XP), so this tutorial is relatively independent of the operating system you use.
IOzone can be built from source, or a pre-built binary might be available for your OS.
Building IOzone from its source
Source files are available at http://www.iozone.org/src/current/. The main files include:
- iozone.c – Main C file. IOzone is structured like a single big application and is not divided into a large number of different files.
- libasync.c – Library for POSIX async reads.
- libbif.c – Writes the output in Excel format, so that it can be imported directly into Excel.
IOzone CLI options
There are a number of options that can be used to configure the benchmark. I won’t be able to go through all of them, but I’ll cover the options I used and those that are commonly used for benchmarking.
Command I used:
iozone -e -I -a -s 100M -r 4k -r 512k -r 16M -R -i 0 -i 1 -i 2 -f /dev/sdda0s1
-I : To benchmark a block device correctly, one needs to disable the cache. The -I option enables direct reads/writes to the device via DMA, bypassing the buffer cache. It is similar to using the O_DIRECT flag on Linux, which bypasses the kernel’s page cache. Have a look at its implementation (excerpt):
case 'I': /* Use VXFS direct advisory or O_DIRECT from Linux or AIX , or O_DIRECTIO for TRU64 or Solaris directio */
sprintf(splash[splash_line++],"\tVxFS advanced feature SET_CACHE, VX_DIRECT enabled\n");
#if ! defined(DONT_HAVE_O_DIRECT)
#if defined(linux) || defined(__AIX__) || defined(IRIX) || defined(IRIX64) || defined(Windows) || defined(__FreeBSD__) || defined(solaris) || defined(IOZ_macosx)
sprintf(splash[splash_line++],"\tO_DIRECT feature enabled\n");
sprintf(splash[splash_line++],"\tO_DIRECTIO feature enabled\n");
sprintf(splash[splash_line++],"\tO_DIRECTIO feature not available in Windows version.\n");
It clearly states that this feature isn’t available on Windows. On FreeBSD it states that the feature will work; however, in my experience it is better to use the -U option along with it, to ensure that all cached data is purged.
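To make the idea behind -I a bit more concrete, here is a minimal Python sketch of opening a file for direct I/O. This is not IOzone's code: the helper names (direct_open_flags, aligned_buffer, write_one_record) are made up for illustration, and it assumes a platform where Python exposes os.O_DIRECT (elsewhere it falls back to a normal, cached open). Direct I/O generally wants buffers and transfer sizes aligned to the block size, so the buffer comes from an anonymous mmap, which is page-aligned.

```python
import mmap
import os


def direct_open_flags(write=True):
    """Return open(2) flags that request direct I/O where the platform supports it."""
    flags = (os.O_RDWR | os.O_CREAT) if write else os.O_RDONLY
    # os.O_DIRECT is not defined on every platform; fall back to 0 (cached I/O) there.
    flags |= getattr(os, "O_DIRECT", 0)
    return flags


def aligned_buffer(size):
    """Return a zero-filled, page-aligned buffer of `size` bytes."""
    if size % mmap.PAGESIZE != 0:
        raise ValueError("size should be a multiple of the page size")
    return mmap.mmap(-1, size)  # anonymous mappings start on a page boundary


def write_one_record(path, record_size=mmap.PAGESIZE):
    """Write a single zero-filled record to `path`, bypassing the cache if possible."""
    buf = aligned_buffer(record_size)
    fd = os.open(path, direct_open_flags(), 0o644)
    try:
        return os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
```

Note that some filesystems reject direct I/O at open or write time, so write_one_record may raise OSError depending on where the path lives.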
-e : Includes flush (fsync/fflush) in the timing calculations.
Following are the synchronous modes, as defined by POSIX:
- O_SYNC: File data and all file metadata are written synchronously to disk.
- O_DSYNC: Only file data and metadata needed to access the file data are written synchronously to disk.
- O_RSYNC: Not implemented
O_SYNC provides synchronized I/O file integrity completion, meaning write operations will flush data and all associated metadata to the underlying hardware. O_DSYNC provides synchronized I/O data integrity completion, meaning write operations will flush data to the underlying hardware, but will only flush metadata updates that are required to allow a subsequent read operation to complete successfully.
Thus, with -e the timing also includes the time required for writing the data and the file metadata to disk via fsync; fflush by itself only pushes the stdio buffers to the kernel.
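What including the flush in the timing means can be shown with a small Python sketch. Again, this is not how IOzone itself is written; timed_write and its parameters are hypothetical names. The same write is timed with and without os.fsync inside the timed region.

```python
import os
import tempfile
import time


def timed_write(path, data, include_flush):
    """Write `data` to `path` and return the elapsed time in seconds.

    With include_flush=True the fsync call sits inside the timed region,
    so the result also covers flushing data and metadata to the device.
    """
    view = memoryview(data)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        written = 0
        while written < len(view):
            written += os.write(fd, view[written:])
        if include_flush:
            os.fsync(fd)  # block until the kernel reports the data as flushed
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed


if __name__ == "__main__":
    payload = b"\0" * (1024 * 1024)  # 1 MiB of zeros, an arbitrary size
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "fsync_demo.bin")
        print("without flush:", timed_write(target, payload, False))
        print("with flush:   ", timed_write(target, payload, True))
```

On a cached filesystem I would expect the second figure to be the larger one, but the gap depends heavily on the device and the filesystem.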
-s # : Specifies the size of the file, i.e. the amount of data to be transferred. A large size is recommended, as it gives more consistent, averaged results.
-i # : Specifies the tests to be performed (0=write/rewrite, 1=read/re-read, 2=random-read/write, 3=Read-backwards, 4=Re-write-record, 5=stride-read, 6=fwrite/re-fwrite, 7=fread/Re-fread, 8=random mix, 9=pwrite/Re-pwrite, 10=pread/Re-pread, 11=pwritev/Re-pwritev, 12=preadv/Re-preadv).
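-a : Enables automatic mode, in which IOzone iterates over record and file sizes on its own (by default, record sizes of 4k to 16M and file sizes of 64k to 512M).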
-r # : Specifies the record length used for transferring data. N different record lengths mean that N different tests are performed, one per record length. Filesystem I/O occurs in smaller records of 4, 16, 32 KB, etc., so larger record lengths such as 16M don’t reveal much about the filesystem.
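To get a feel for what the record length means for a given file size, the following Python sketch works out how many whole records fit into the file for the sizes used in my command. The helpers parse_size and whole_records are my own illustrative names, and the sketch says nothing about how IOzone handles a remainder when the record length doesn't divide the file size evenly.

```python
def parse_size(text):
    """Convert an iozone-style size such as '4k', '512k' or '100M' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = text[-1].lower()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text) * 1024  # assumption: a bare number is taken as kilobytes


def whole_records(file_size, record_length):
    """Number of complete records of `record_length` that fit in `file_size`."""
    return parse_size(file_size) // parse_size(record_length)


for reclen in ("4k", "512k", "16M"):
    print(reclen, whole_records("100M", reclen))
# prints:
# 4k 25600
# 512k 200
# 16M 6
```

So a 100M file is moved in 25600 operations at 4k but in only a handful at 16M, which is why the small record lengths say more about per-request overhead.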
-R : Generates Excel-compatible output.
-f : Points to the file to which data is written. It is quite important to specify this correctly, as write operations can destroy data that is already on the disk. In my case I pointed it to my SD card’s filesystem.
-U : Unmounts and remounts the disk’s filesystem. Unmounting/remounting purges the buffer cache associated with the block device. On FreeBSD, the cache is maintained per file and only after the device is mounted, unlike on Linux, where the cache is maintained for the disk itself.
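Since the same options get typed over and over, it can help to assemble the command line from a few parameters. The Python sketch below does just that; build_iozone_command and its parameter names are hypothetical, and only the flags described above are used. It builds the argument list without running anything.

```python
import shlex


def build_iozone_command(target, file_size="100M",
                         record_lengths=("4k", "512k", "16M"),
                         tests=(0, 1, 2), include_flush=True, direct=True,
                         auto=True, excel_report=True, remount_point=None):
    """Return the iozone argument list for the options covered in this post."""
    argv = ["iozone"]
    if include_flush:
        argv.append("-e")          # include flush in the timing
    if direct:
        argv.append("-I")          # direct I/O, bypassing the cache
    if auto:
        argv.append("-a")          # automatic mode
    argv += ["-s", file_size]
    for reclen in record_lengths:
        argv += ["-r", reclen]     # one -r per record length
    if excel_report:
        argv.append("-R")          # Excel-compatible report
    for test in tests:
        argv += ["-i", str(test)]  # one -i per selected test
    if remount_point is not None:
        argv += ["-U", remount_point]
    argv += ["-f", target]
    return argv


command = build_iozone_command("/dev/sdda0s1")
print(shlex.join(command))
# prints: iozone -e -I -a -s 100M -r 4k -r 512k -r 16M -R -i 0 -i 1 -i 2 -f /dev/sdda0s1
```

The resulting list can be handed to subprocess.run(command, check=True) once you are sure that the -f target is the one you really want to overwrite.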
SD Card Preparation before benchmarking
Each time before running IOzone, it’s advisable to reformat the filesystem, to ensure the same results every time the test is performed. On FreeBSD, this can be done with the following commands:
gpart destroy -F mmcsd0
gpart create -s BSD mmcsd0
gpart add -t freebsd-ufs mmcsd0
Here, mmcsd0 is my device, i.e. the SD card. Moreover, the -U option expects an entry for the device in /etc/fstab. The entry contains the mount point, filesystem type, mount options, etc. For example:
/dev/mmcsd0a /mnt ufs rw,noauto
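Before starting a long run with -U, it may be worth checking that the mount point really has an fstab entry. Here is a small Python sketch of such a check; find_fstab_entry is a hypothetical helper of mine, and it assumes the usual whitespace-separated fstab layout (device, mount point, type, options, then the optional dump and pass fields).

```python
def find_fstab_entry(fstab_text, mount_point):
    """Return the fstab fields for `mount_point` as a dict, or None if absent."""
    names = ("device", "mount_point", "fstype", "options", "dump", "passno")
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return dict(zip(names, fields))
    return None


sample = "# Device Mountpoint FStype Options\n/dev/mmcsd0a /mnt ufs rw,noauto\n"
entry = find_fstab_entry(sample, "/mnt")
print(entry)
# prints: {'device': '/dev/mmcsd0a', 'mount_point': '/mnt', 'fstype': 'ufs', 'options': 'rw,noauto'}
```

On a real system the text would come from reading /etc/fstab instead of the sample string.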
The -U option has other limitations: it doesn’t work at all when the benchmark runs across a bunch of clients (distributed mode), as it has no way to quiesce the load across the nodes for -U to do its work.
Please note that flash write tests are extremely sensitive to hidden move/rewrite and erase cycles performed by the flash translation layer. Results may also vary between flash implementations, so always use the same SD card, and one of the same size, as seek time varies with the size of the card.
After preparing the SD card, run IOzone multiple times with the same options to see whether the results are constant. If they are, you probably took all the precautions carefully, and the results are valid.
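One way to put a number on "constant" is the coefficient of variation (standard deviation divided by the mean) across the runs. The Python sketch below is only a suggestion: the 5% tolerance is an arbitrary threshold I picked for illustration, and the throughput figures are invented, not taken from my measurements.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean, as a fraction."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean throughput is zero")
    return statistics.stdev(samples) / mean


def is_consistent(samples, tolerance=0.05):
    """True when the run-to-run spread is within `tolerance` (5% by default)."""
    return coefficient_of_variation(samples) <= tolerance


# Hypothetical throughput figures in KB/s from three runs, invented for illustration.
runs = [10250, 10310, 10190]
print(is_consistent(runs))
# prints: True
```

If the check fails, I would first look at whether the card was reformatted and whether caching was really disabled before trusting any of the numbers.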
I performed various tests with different objectives; here are the raw data and graphs: https://docs.google.com/spreadsheets/d/1_lf9S136z0tJyni9W3t1__Rlrkal1fd6L7-vnlmYQpc/edit?usp=sharing
Test#1 : Effect of the SD card’s filesystem on its performance
The graphs show that the SD card’s filesystem doesn’t affect its read performance at all. However, write speed is significantly affected by the filesystem in the case of the SDIO/MMCCAM driver.
Test#2 : Performance comparison of SDHCI and SDIO driver
Test#3 : Effect of caching on test results
Just look at the tremendous difference in the SD card’s performance with the disk cache on/off. Thus, it’s recommended to enable the disk cache when using an SD card for general applications. For benchmarking purposes, however, disable caching.