I have a 3-disk software RAID5 array totaling 4 TB on an Ubuntu 14.04.3 LTS server.
Ever since a kernel panic caused by a device unrelated to the array (which has since been replaced), the array comes up degraded as [UU_] after every reboot. The temporary workaround I found is to run mdadm --add /dev/md0 /dev/sdd1, after which the array starts rebuilding and the rebuild completes successfully:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[4] sdb1[3] sdc1[1]
3906763776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
But I have to do this after every reboot, and I noticed that the device numbers look wrong: 4, 3 and 1 instead of 2, 1 and 0.
root@Bt-Networks-Server:~# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Aug 1 00:53:53 2014
Raid Level : raid5
Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
Used Dev Size : 1953381888 (1862.89 GiB 2000.26 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Mon Oct 26 17:40:43 2015
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : Bt-Networks-Server:0 (local to host Bt-Networks-Server)
UUID : 4e860a9e:0b433a00:54d2c991:78ca3d15
Events : 83137
Number Major Minor RaidDevice State
4 8 49 0 active sync /dev/sdd1
1 8 33 1 active sync /dev/sdc1
3 8 17 2 active sync /dev/sdb1
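To see right after boot whether the array came up degraded again, something like the following could be used. This is only a sketch: the function name md_is_degraded is made up for illustration, and it assumes the status field in /proc/mdstat looks like [UU_] or [UUU], as in the output quoted in this post.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not part of mdadm).
# Succeeds if the mdstat-format file given as $1 (default: /proc/mdstat)
# contains a member status field with at least one "_", e.g. [UU_].
md_is_degraded() {
    grep -Eq '\[[U_]*_[U_]*]' "${1:-/proc/mdstat}"
}

# Only report when the kernel actually exposes /proc/mdstat.
if [ -r /proc/mdstat ]; then
    if md_is_degraded; then
        echo "at least one md array is degraded"
    else
        echo "no degraded md array found"
    fi
fi
```

The pattern only matches bracketed fields made up of U and _ characters, so fields such as [3/3] or [raid5] are ignored.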
I also found the following in dmesg about a non-fresh disk being kicked from the array:
[ 2.430966] md: raid1 personality registered for level 1
[ 2.500019] raid6: sse2x1 3900 MB/s
[ 2.568110] raid6: sse2x2 4957 MB/s
[ 2.582322] md: bind<sdc1>
[ 2.583992] md: bind<sdd1>
[ 2.608030] usb 6-2: new low-speed USB device number 2 using uhci_hcd
[ 2.619248] md: bind<sdb1>
[ 2.620098] md: kicking non-fresh sdd1 from array!
[ 2.620103] md: unbind<sdd1>
[ 2.636013] raid6: sse2x4 6926 MB/s
[ 2.636015] raid6: using algorithm sse2x4 (6926 MB/s)
[ 2.636017] raid6: using ssse3x2 recovery algorithm
[ 2.637624] xor: measuring software checksum speed
[ 2.664021] usb 7-1: new low-speed USB device number 2 using uhci_hcd
[ 2.676012] prefetch64-sse: 10026.000 MB/sec
[ 2.716011] generic_sse: 8868.000 MB/sec
[ 2.716013] xor: using function: prefetch64-sse (10026.000 MB/sec)
[ 2.717321] async_tx: api initialized (async)
[ 2.725129] md: raid6 personality registered for level 6
[ 2.725131] md: raid5 personality registered for level 5
[ 2.725133] md: raid4 personality registered for level 4
[ 2.728509] md: export_rdev(sdd1)
[ 2.729556] md/raid:md0: device sdb1 operational as raid disk 2
[ 2.729559] md/raid:md0: device sdc1 operational as raid disk 1
[ 2.729927] md/raid:md0: allocated 0kB
[ 2.729976] md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
[ 2.729983] RAID conf printout:
[ 2.729984] --- level:5 rd:3 wd:2
[ 2.729986] disk 1, o:1, dev:sdc1
[ 2.729988] disk 2, o:1, dev:sdb1
[ 2.730030] md0: detected capacity change from 0 to 4000526106624
[ 2.731863] md: raid10 personality registered for level 10
[ 2.755618] md0: unknown partition table
[ 2.812332] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
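As far as I understand, md prints "kicking non-fresh" when a member's superblock event counter is behind that of the other members. A minimal sketch for comparing the counters on the three partitions, assuming that mdadm --examine prints an "Events :" line for each member (the helper name md_events is made up, and the device names are the ones from this post):

```shell
#!/bin/sh
# Hypothetical helper: read "mdadm --examine" output on stdin and print
# the number from the first "Events : N" line.
md_events() {
    awk -F: '/^[[:space:]]*Events[[:space:]]*:/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Print one "device event-count" pair per member. Needs root; devices
# that do not exist on this machine are skipped.
for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
    if [ -b "$dev" ] && command -v mdadm >/dev/null 2>&1; then
        printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" 2>/dev/null | md_events)"
    fi
done
```

If sdd1 reports a lower number than sdb1 and sdc1 right after a reboot, that would match the kick seen in the log above.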
I have already updated mdadm.conf by running:
root@Bt-Networks-Server:~# mdadm --detail --scan
ARRAY /dev/md/0 metadata=1.2 name=Bt-Networks-Server:0 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15
saving that output to the configuration file, and then running update-initramfs -u.
Is there any way to avoid having to re-add the disk and rebuild/resync the array after every reboot?
Thanks!
EDIT:
Contents of /etc/mdadm/mdadm.conf:
root@Bt-Networks-Server:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# instruct the monitoring daemon where to send mail alerts
MAILADDR root
# definitions of existing MD arrays
# This file was auto-generated on Thu, 31 Jul 2014 23:42:00 -0300
# by mkconf $Id$
#ARRAY /dev/md/Bt-Networks-Server:0 metadata=1.2 name=Bt-Networks-Server:0 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15
ARRAY /dev/md/0 metadata=1.2 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15 name=Bt-Networks-Server:0
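To double-check that only one ARRAY definition is active and that its UUID matches the one reported by mdadm --detail, a sketch like this could be used. The helper name conf_array_uuids is hypothetical, and it assumes that commented-out definitions start with # as in the file above.

```shell
#!/bin/sh
# Hypothetical helper: print the UUID of every uncommented ARRAY line in
# the mdadm.conf-style file given as $1 (default: /etc/mdadm/mdadm.conf).
conf_array_uuids() {
    grep -E '^[[:space:]]*ARRAY[[:space:]]' "${1:-/etc/mdadm/mdadm.conf}" |
        grep -Eo 'UUID=[0-9a-fA-F:]+' |
        cut -d= -f2
}

# Only run against the real file when it is readable.
if [ -r /etc/mdadm/mdadm.conf ]; then
    conf_array_uuids
fi
```

For the file shown above this should list a single UUID, since the first ARRAY line is commented out.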
I searched through dmesg and found the log entries related to the recovery:
[ 185.105099] md: export_rdev(sdd1)
[ 185.220543] md: bind<sdd1>
[ 185.320114] RAID conf printout:
[ 185.320118] --- level:5 rd:3 wd:2
[ 185.320121] disk 0, o:1, dev:sdd1
[ 185.320123] disk 1, o:1, dev:sdc1
[ 185.320124] disk 2, o:1, dev:sdb1
[ 185.320272] md: recovery of RAID array md0
[ 185.320276] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 185.320278] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 185.320281] md: using 128k window, over a total of 1953381888k.
[ 1009.812057] EXT4-fs (md0): recovery complete
[ 1009.896520] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
[ 1109.136229] perf interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[19295.440128] md: md0: recovery done.
[19295.607089] RAID conf printout:
[19295.607096] --- level:5 rd:3 wd:3
[19295.607099] disk 0, o:1, dev:sdd1
[19295.607101] disk 1, o:1, dev:sdc1
[19295.607103] disk 2, o:1, dev:sdb1
I also found entries for a successful periodic array check:
[501643.369779] md: data-check of RAID array md0
[501643.369784] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[501643.369786] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[501643.369791] md: using 128k window, over a total of 1953381888k.
[518452.072029] md: md0: data-check done.