As SATA drives have gotten larger, the chance of a minor error creeping in during a RAID rebuild has greatly increased. For the newer 2-4 terabyte models, assuming a rebuild rate of 100 MB/s, a mirror rebuild can take 5-10 hours. The situation is even worse for RAID 5 and RAID 6 arrays, where you have to update multiple disks and rebuild times tend to scale with the total size of the array.
One low-level solution I have been using is to partition each drive into smaller segments (usually 1/4 to 1/2 of the drive capacity), build an mdadm Software RAID array over the matching segment on each drive, then put all of the resulting arrays into a single LVM Volume Group (VG). The advantage is that it's simple, and often only a single segment of the drive has to be re-sync'd from scratch if there is a power outage or other glitch during a rebuild.
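As a rough sketch of that layout (all device names, partition boundaries, and the vg_data name below are hypothetical, assuming a pair of drives split into four segments each):
# parted /dev/sda mklabel gpt
# parted /dev/sda mkpart primary 1MiB 25%
# parted /dev/sda mkpart primary 25% 50%
# parted /dev/sda mkpart primary 50% 75%
# parted /dev/sda mkpart primary 75% 100%
(repeat the partitioning for /dev/sdb)
# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
# mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
# mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
# mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4
# pvcreate /dev/md1 /dev/md2 /dev/md3 /dev/md4
# vgcreate vg_data /dev/md1 /dev/md2 /dev/md3 /dev/md4
Any logical volume carved out of vg_data then spans all four mirrors, but a glitch only forces a full resync of whichever mirror was affected.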
The other (probably better) solution is to use the mdadm --bitmap option. This allows the array to keep track of which sections are dirty (not yet sync'd to all disks) or clean, which greatly speeds up a resync if there is a power failure or glitch during a write. The main disadvantage is that you are looking at three write operations whenever you change a piece of data: first, md has to mark the bitmap bit covering that section of the disk as dirty; second, it writes out the data; third, it has to go back and mark the bit as clean. This can severely impact performance.
By default, when using internal bitmaps, mdadm splits the array into as many chunks as the small bitmap area allows. For smaller partitions the resulting chunk size can be as small as 4MiB, but you can specify a larger value with the "--bitmap-chunk=NNNN" argument (the value is given in KiB). For larger drives, you will want to consider chunk sizes of at least 16-128MiB.
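If you want to check what chunk size an existing internal bitmap actually got, mdadm can examine the bitmap on any member device (the device name here is just an example; --examine-bitmap is also available as -X):
# mdadm --examine-bitmap /dev/sda1
The output should include a Chunksize line along with counts of total and dirty chunks.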
Warnings:
- I've run into a situation where my version of mdadm (v2.6.9 - 10th March 2009, Linux version 2.6.18-194.32.1.el5) would cause the machine to lock up hard when removing a bitmap. Another machine has a newer CentOS5 kernel (2.6.18-308.16.1.el5xen) and experienced no issues. So make sure you are running a fairly recent kernel.
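It only takes a moment to check what you are running before experimenting:
# mdadm --version
# uname -r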
Instructions:
In order to add a bitmap to an existing Software RAID array, the array must be "clean". The command is simply (substitute your array device for /dev/mdX):
# mdadm --grow --bitmap=internal --bitmap-chunk=32768 /dev/mdX
If you want to resize the bitmap chunk, you must first remove the existing bitmap:
# mdadm --grow --bitmap=none /dev/mdX
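Putting the two commands together, switching a hypothetical /dev/md0 to a 64MiB chunk would look something like this (the --bitmap-chunk value is in KiB, so 65536 = 64MiB):
# mdadm --grow --bitmap=none /dev/md0
# mdadm --grow --bitmap=internal --bitmap-chunk=65536 /dev/md0
# cat /proc/mdstat
After the second command, /proc/mdstat should show a bitmap line for the array with the new chunk size.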
Performance:
I did some testing on a system I had access to which had a 7-drive RAID-10 array (6 active spindles, 1 spare) using 7200 RPM 500GB SATA drives. Values are in KB/sec using bonnie++ as the test program (15GB test size; a sample invocation is shown after the results).
#1 No bitmap:
Seq Write: 139035
Seq ReWrite: 43732
Seq Read: 76221
#2 bitmap chunk size of 4096KiB (4MiB)
Seq Write: 109720 (21% lower)
Seq ReWrite: 40179
Seq Read: 72917
#3 bitmap chunk size of 16384KiB (16MiB)
Seq Write: 127924 (8% lower)
Seq ReWrite: 40734
Seq Read: 73870
#4 bitmap chunk size of 65536KiB (64MiB)
Seq Write: 124694 (10% lower)
Seq ReWrite: 40674
Seq Read: 74501
As can be seen, the larger chunk sizes do not impact sequential write performance as much.
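For anyone who wants to repeat the comparison: an invocation along these lines (the mount point and user are placeholders, and these are not necessarily the exact flags used for the numbers above) should produce comparable sequential figures:
# bonnie++ -d /mnt/testarray -s 15g -n 0 -u root
The -s 15g matches the 15GB test size, and -n 0 skips the small-file creation tests so the run focuses on the throughput measurements.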
2 comments:
Thanks for testing with larger bitmap-chunk sizes. I've gone with 256MB on a 2x3TB RAID-1 array.
But the difference is that I'm using an external bitmap. All hard drives are slow and so I've got RAID-1 SSDs for the system drives (XEN VMs).
I can't test the performance with an external bitmap yet since the process of moving the data from the old array to the new one takes a long time, but perhaps you could run some tests like that.
I imagine the performance hit with external bitmap on SSD would be minimal, if not negligible.
In case someone hits this nowadays: the command should be --bitmap-chunk=, not --bitmap-chunks.
Thanks for the useful post :)