Well, I needed a way to watch the mdstat progress (because a disk just failed …).
#!/bin/bash
watch -n1 cat /proc/mdstat
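If you want the changing numbers highlighted, watch's -d flag does that as well (same one-liner, just one extra flag):

#!/bin/bash
# -d highlights whatever changed since the previous refresh,
# which makes the rebuild counters easier to spot.
watch -d -n1 cat /proc/mdstat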
Well, I recently had to flatten my archive NAS (well, only the OS part … *wheeeh*). Since I didn’t have the chance to back up the old settings, I had to redo everything from scratch … and this time I decided not to hack it together in a script but to do it the proper way.
I spent a while reading through the Internetz about the various settings until I stumbled upon a Fraunhofer wiki entry. From there I ended up writing these udev rules and sysctl configs …
# Settings from http://www.fhgfs.com/wiki/StorageServerTuning

# Set an appropriate IO scheduler for file servers.
KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"
KERNEL=="sd[a-i][a-z]", ATTR{queue/scheduler}="deadline"

# Give the IO scheduler more flexibility by increasing the number of
# schedulable requests.
KERNEL=="sd[a-z]", ATTR{queue/nr_requests}="4096"
KERNEL=="sd[a-i][a-z]", ATTR{queue/nr_requests}="4096"

# To improve throughput for sequential reads, increase the maximum amount of
# read-ahead data. The actual amount of read-ahead is adaptive, so using a
# high value here won't harm performance for small random access.
KERNEL=="sd[a-z]", ATTR{queue/read_ahead_kb}="73728"
KERNEL=="sd[a-i][a-z]", ATTR{queue/read_ahead_kb}="73728"
KERNEL=="sd[a-z]", RUN+="/sbin/blockdev --setra 73728 /dev/%k"
KERNEL=="sd[a-i][a-z]", RUN+="/sbin/blockdev --setra 73728 /dev/%k"

SUBSYSTEM=="block", KERNEL=="md[0-9]*", RUN+="/sbin/blockdev --setra 663552 /dev/%k"
SUBSYSTEM=="block", KERNEL=="md[0-9]*", ATTR{md/stripe_cache_size}="9216"

# Optimal performance for hardware RAID systems often depends on large IOs
# being sent to the device in a single large operation. Please refer to your
# hardware storage vendor for the corresponding optimal size of
# /sys/block/sdX/max_sectors_kb.
# It is typically good if this size can be increased to at least match your
# RAID stripe set size (i.e. chunk_size x number_of_disks):
KERNEL=="sd[a-z]", ATTR{queue/max_sectors_kb}="512"
KERNEL=="sd[a-i][a-z]", ATTR{queue/max_sectors_kb}="512"

KERNEL=="sd[a-z]", ATTR{device/queue_depth}="1"
KERNEL=="sd[a-i][a-z]", ATTR{device/queue_depth}="1"
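To apply the rules without a reboot, reloading udev and re-triggering the block devices should be enough; the spot check below just uses sda as an example device:

# Reload the udev rules and re-apply them to the block devices
udevadm control --reload-rules
udevadm trigger --subsystem-match=block

# Spot check on one disk (sda is only an example)
cat /sys/block/sda/queue/scheduler
blockdev --getra /dev/sda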
# Settings taken from http://www.fhgfs.com/wiki/StorageServerTuning

# To avoid long IO stalls (latencies) for write cache flushing in a production
# environment with very different workloads, you will typically want to limit
# the kernel dirty (write) cache size.
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

# Assigning slightly higher priority to inode caching helps to avoid disk seeks
# for inode loading
vm.vfs_cache_pressure = 50

# Buffering of file system data requires frequent memory allocation. Raising the
# amount of reserved kernel memory will enable faster and more reliable memory
# allocation in critical situations. Raise the corresponding value to 64MB if
# you have less than 8GB of memory, otherwise raise it to at least 256MB
vm.min_free_kbytes = 262144
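The sysctl part goes into a file under /etc/sysctl.d/ and can be loaded right away; the file name below is only an example:

# Load the settings without a reboot (file name is just an example)
sysctl -p /etc/sysctl.d/90-storage-tuning.conf

# Double-check one of the values
sysctl vm.dirty_ratio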
For now, I’m rather pleased with the results …
root:(charon.ka.heimdaheim.de) PWD:/
Wed Jul 09, 15:02:08 [0] > mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Sat Jan 26 18:35:19 2013
     Raid Level : raid5
     Array Size : 15626121216 (14902.23 GiB 16001.15 GB)
  Used Dev Size : 1953265152 (1862.78 GiB 2000.14 GB)
   Raid Devices : 9
  Total Devices : 10
    Persistence : Superblock is persistent

    Update Time : Wed Jul  9 15:03:28 2014
          State : clean
 Active Devices : 9
Working Devices : 10
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : charon:aggr1  (local to host charon)
           UUID : 6d11820f:04847070:2725c434:9ee39718
         Events : 11186

    Number   Major   Minor   RaidDevice State
       0       8      129        0      active sync   /dev/sdi1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       4       8       65        3      active sync   /dev/sde1
       5       8       17        4      active sync   /dev/sdb1
      10       8       97        5      active sync   /dev/sdg1
       9       8       81        6      active sync   /dev/sdf1
       8       8      161        7      active sync   /dev/sdk1
       7       8      145        8      active sync   /dev/sdj1

       6       8      113        -      spare   /dev/sdh1
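Out of curiosity, the md-specific knobs set by the udev rules can be checked directly (md127 as in the output above):

# Stripe cache size and read-ahead of the array itself
cat /sys/block/md127/md/stripe_cache_size
blockdev --getra /dev/md127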
And here’s the dd output:
root:(charon.ka.heimdaheim.de) PWD:/
Wed Jul 09, 14:57:32 [0] > dd if=/dev/zero of=/srv/smb/tmp bs=1G count=100 \
    oflag=direct
100+0 records in
100+0 records out
107374182400 bytes (107 GB) copied, 257.341 s, 417 MB/s
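For a rough read number, the same test in the other direction should work, reading the file written above back with O_DIRECT so the page cache doesn't skew the result:

# Read the test file back, bypassing the page cache
dd if=/srv/smb/tmp of=/dev/null bs=1G count=100 iflag=direct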
Well, at last I’m getting somewhere with my troubles. This only seems to happen when creating a RAID5 multiple device with four disks; it doesn’t happen with three.
The next thing I tried was to create a three-disk array, add the fourth disk as a spare and then grow the array onto that fourth disk (roughly the sequence sketched below). After that, all these errors appear again *yuck* So either I possess some rather faulty disks, or something else is fishy, since I’m already running another four-disk RAID5 array built from the old disks …
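A sketch of that three-disk-plus-spare-then-grow sequence, with placeholder device and array names:

# Create a three-disk RAID5 (device names are placeholders)
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

# Add the fourth disk as a spare
mdadm --add /dev/md0 /dev/sde1

# Grow the array onto the spare; the reshape runs in the background
mdadm --grow /dev/md0 --raid-devices=4

# Watch the reshape (or the errors) come in
cat /proc/mdstat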