I have had a RAID1 setup running on my machine for a few years, and the array recently became degraded. According to mdadm, one drive has failed, but the SMART data shows errors on the other drive. I'm not sure which one to trust.
If I'm reading the output of sudo mdadm --detail /dev/md0 correctly, /dev/sda1 has failed and /dev/sdb1 is still in the array and can be trusted.
/dev/md0:
Version : 1.2
Creation Time : Sat Jan 5 01:18:40 2013
Raid Level : raid1
Array Size : 2930133824 (2794.39 GiB 3000.46 GB)
Used Dev Size : 2930133824 (2794.39 GiB 3000.46 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Thu Aug 6 20:33:11 2015
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : storm:0 (local to host storm)
UUID : 98b434f9:54d5c413:1acc4033:8ad34365
Events : 8388
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 17 1 active sync /dev/sdb1
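To make that reading less dependent on eyeballing the table, the member rows can be extracted mechanically. The following is only a sketch: md_members is a hypothetical helper name, and it assumes the table follows the "Number Major Minor RaidDevice State" header exactly as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: summarize the member table from `mdadm --detail`
# output read on stdin. Prints one line per slot:
#   <RaidDevice> <state> <device path, or "-" if none>
md_members() {
    awk '
        # The member table starts right after this header row.
        $1 == "Number" && $2 == "Major" { in_table = 1; next }
        in_table && NF >= 5 {
            last = $NF
            if (last ~ /^\//) {      # last field is a device path
                dev = last; n = NF - 1
            } else {                 # e.g. a "removed" slot has no path
                dev = "-"; n = NF
            }
            # The state can span several words ("active sync").
            state = $5
            for (i = 6; i <= n; i++) state = state " " $i
            print $4, state, dev
        }
    '
}
```

It would be used as sudo mdadm --detail /dev/md0 | md_members. For the output above it reports slot 0 as removed with no device and slot 1 as active sync on /dev/sdb1, which matches "Failed Devices : 0": the array lists the first member as removed rather than as a failed device.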
However, after running a short SMART self-test on both drives, /dev/sda reports no problems, but /dev/sdb shows entries like this:
=== START OF INFORMATION SECTION ===
Device Model: ST3000DM001-1CH166
...
Local Time is: Thu Aug 6 20:45:02 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...
SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
...
Error 12 occurred at disk power-on lifetime: 21016 hours (875 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 ff ff ff 4f 00 8d+20:05:45.525 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 8d+20:05:45.525 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 8d+20:05:45.525 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 8d+20:05:45.524 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 8d+20:05:45.524 SET FEATURES [Set transfer mode]
...
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 21129 -
# 2 Short offline Completed without error 00% 18418 -
# 3 Extended offline Completed without error 00% 1860 -
# 4 Short offline Completed without error 00% 1855 -
...
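The numbers in the error log can be put in relation to the self-test log with a little arithmetic. This is a rough sketch using only values copied from the output above; the variable names are made up for illustration.

```shell
#!/bin/sh
# Values copied from the smartctl output above.
error_hours=21016       # power-on hours at Error 12
last_test_hours=21129   # power-on hours at self-test #1 (short offline)

# Powered-on time between the logged error and the latest short test.
hours_since_error=$((last_test_hours - error_hours))

# Cross-check the "875 days + 16 hours" rendering of 21016 hours.
days=$((error_hours / 24))
rem_hours=$((error_hours % 24))

# The reported LBA in decimal; 0x0fffffff equals 2^28 - 1.
lba=$((0x0fffffff))

echo "error to last test: ${hours_since_error} h"
echo "error at: ${days} days + ${rem_hours} hours"
echo "LBA: ${lba}"
```

So the most recent logged error predates the latest short self-test by 113 power-on hours, and that test completed without error. The logged LBA has every one of its low 28 bits set.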
The full output can be found here: link
Should I trust mdadm, which says /dev/sda is bad and /dev/sdb is fine, or should I trust SMART, which shows errors on /dev/sdb while /dev/sda still appears to be in good shape?