Recovering a RAID5 with 2 disks out of the array

I don't want to approach this by trial and error, since I know that is the best thing to do if I want to lose my data.

I have a server with 4 × 2 TB disks in RAID5 (yes, I know that's not wise) running Ubuntu 14.04.

Most of my data is in /home on the RAID5, and / is on RAID1.

I booted the server in rescue mode, but I can't figure out:

  • whether the problem is software or hardware,
  • whether there is a way to reassemble the RAID to recover the data.

I have carefully read Recovering a failed software RAID (raid.wiki.kernel.org), but since I'm not really confident about my diagnosis, I'd like an informed judgment on what is going on and how to proceed, if there is anything to be done…

The only thing I tried was mounting the md devices that were not mounted. That worked for md2 ( mount /dev/md2 /mnt/ ), but I could mount neither md0 nor md3; mount reports /dev/md3: can't read superblock .

So far, this is what I have checked:

EDIT: parted -l

root@rescue:/mnt# parted -l
Model: ATA ST2000DM001-1CH1 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  1049kB  1029kB                  primary  bios_grub
 2      2097kB  10.5GB  10.5GB  ext4            primary  raid
 3      10.5GB  2000GB  1989GB                  primary  raid
 4      2000GB  2000GB  536MB   linux-swap(v1)  primary


Model: ATA ST2000DM001-1CH1 (scsi)
Disk /dev/sdb: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  1049kB  1029kB                  primary  bios_grub
 2      2097kB  10.5GB  10.5GB  ext4            primary  raid
 3      10.5GB  2000GB  1989GB                  primary  raid
 4      2000GB  2000GB  536MB   linux-swap(v1)  primary


Error: /dev/sdc: unrecognised disk label
Model: ATA ST2000DM001-1CH1 (scsi)                                        
Disk /dev/sdc: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags: 

Model: ATA ST2000DM001-1CH1 (scsi)
Disk /dev/sdd: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  1049kB  1029kB                  primary  bios_grub
 2      2097kB  10.5GB  10.5GB  ext4            primary  raid
 3      10.5GB  2000GB  1989GB                  primary  raid
 4      2000GB  2000GB  536MB   linux-swap(v1)  primary


Model: Linux Software RAID Array (md)
Disk /dev/md2: 10.5GB
Sector size (logical/physical): 512B/4096B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  10.5GB  10.5GB  ext4


Error: /dev/md127: unrecognised disk label
Model: Linux Software RAID Array (md)                                     
Disk /dev/md127: 10.5GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags: 

smartctl

root@rescue:~# smartctl -a -d ata /dev/sdc
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.10.23-xxxx-std-ipv6-64-rescue] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl: Device Read Identity Failed: Input/output error

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.


root@rescue:~# smartctl -a -d ata /dev/sdd
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.10.23-xxxx-std-ipv6-64-rescue] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1CH164
Serial Number:    W1E1KX59
LU WWN Device Id: 5 000c50 05c821593
Firmware Version: CC43
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Dec 30 16:04:49 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
          was never started.
          Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having
          the read element of the test failed.
Total time to complete Offline 
data collection:        (  584) seconds.
Offline data collection
capabilities:            (0x73) SMART execute Offline immediate.
          Auto Offline data collection on/off support.
          Suspend Offline collection upon new
          command.
          No Offline surface scan supported.
          Self-test supported.
          Conveyance Self-test supported.
          Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
          power-saving mode.
          Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
          General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   1) minutes.
Extended self-test routine
recommended polling time:    ( 230) minutes.
Conveyance self-test routine
recommended polling time:    (   2) minutes.
SCT capabilities:          (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       120551532
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       4008
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       4351310995
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18725
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       32
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   089   089   000    Old_age   Always       -       11
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   056   045    Old_age   Always       -       32 (Min/Max 26/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       31
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       46
194 Temperature_Celsius     0x0022   032   044   000    Old_age   Always       -       32 (0 16 0 0)
197 Current_Pending_Sector  0x0012   082   082   000    Old_age   Always       -       3056
198 Offline_Uncorrectable   0x0010   082   082   000    Old_age   Offline      -       3056
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       18708h+109m+27.415s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       24242600022
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       112149279703

SMART Error Log Version: 1
ATA Error Count: 11 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      09:32:13.900  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      09:32:13.898  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      09:32:13.898  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      09:32:13.898  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      09:32:13.898  SET FEATURES [Set transfer mode]

Error 10 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      09:32:10.764  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      09:32:10.763  READ FPDMA QUEUED
  60 00 38 ff ff ff 4f 00      09:32:10.763  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      09:32:10.763  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      09:32:10.763  READ NATIVE MAX ADDRESS EXT

Error 9 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 38 ff ff ff 4f 00      09:32:09.084  READ FPDMA QUEUED
  61 00 08 00 88 38 41 00      09:32:07.445  WRITE FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      09:32:07.416  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      09:32:07.416  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      09:32:07.416  READ FPDMA QUEUED

Error 8 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 40 ff ff ff 4f 00      09:32:04.118  WRITE FPDMA QUEUED
  61 00 08 70 88 38 41 00      09:32:04.117  WRITE FPDMA QUEUED
  61 00 40 ff ff ff 4f 00      09:32:04.117  WRITE FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      09:32:04.117  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      09:32:04.117  READ FPDMA QUEUED

Error 7 occurred at disk power-on lifetime: 17319 hours (721 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 50 a9 59 02  Error: UNC at LBA = 0x0259a950 = 39430480

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 aa 59 42 00      01:31:02.054  READ FPDMA QUEUED
  60 00 00 00 a6 59 42 00      01:31:02.054  READ FPDMA QUEUED
  60 00 00 00 92 36 42 00      01:30:55.032  READ FPDMA QUEUED
  60 00 00 00 86 36 42 00      01:30:51.600  READ FPDMA QUEUED
  60 00 00 00 82 36 42 00      01:30:51.593  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     18706         3848494344
# 2  Short offline       Completed without error       00%      3481         -
# 3  Short offline       Completed without error       00%      3472         -
# 4  Short offline       Completed without error       00%      3472         -
# 5  Short offline       Completed without error       00%        13         -
# 6  Short offline       Completed without error       00%         5         -
# 7  Short offline       Completed without error       00%         5         -
# 8  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
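
As a reading aid for the attribute table above, here is a minimal sketch that pulls out the counters most directly tied to damaged sectors (IDs 5, 187, 197 and 198). It assumes the smartctl -a output has been saved to a file; the function name smart_media_counters is made up for this example and is not part of smartmontools.

```shell
# Print "NAME RAW_VALUE" for the SMART attributes that count damaged sectors
# (5 Reallocated_Sector_Ct, 187 Reported_Uncorrect, 197 Current_Pending_Sector,
# 198 Offline_Uncorrectable) whenever their raw value is non-zero.
# Reads `smartctl -a` (or `smartctl -A`) output on stdin.
# smart_media_counters is a hypothetical helper name.
smart_media_counters() {
    awk '
        # Attribute rows have a numeric ID in column 1 and a 0x.... flag
        # in column 3; the raw value is column 10.
        $1 ~ /^(5|187|197|198)$/ && $3 ~ /^0x/ && $10 + 0 > 0 {
            print $2, $10
        }
    '
}
```

Called as smart_media_counters < sdd-smart.txt (file name assumed), it would list the four non-zero counters of /dev/sdd shown above: 4008 reallocated sectors, 11 reported uncorrectable errors and 3056 pending and offline-uncorrectable sectors.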

ls /dev

ls /dev
MAKEDEV      md0              ptya0  ptyc5  ptyea  ptyqf  ptyt4  ptyv9  ptyxe  ram11   sg1     tty34  ttyS1  ttyc2  ttye7  ttyqc  ttyt1  ttyv6  ttyxb  urandom
aer_inject   md127            ptya1  ptyc6  ptyeb  ptyr0  ptyt5  ptyva  ptyxf  ram12   sg2     tty35  ttyS2  ttyc3  ttye8  ttyqd  ttyt2  ttyv7  ttyxc  vcs
autofs       md2              ptya2  ptyc7  ptyec  ptyr1  ptyt6  ptyvb  ptyy0  ram13   sg3     tty36  ttyS3  ttyc4  ttye9  ttyqe  ttyt3  ttyv8  ttyxd  vcs1
block        md3              ptya3  ptyc8  ptyed  ptyr2  ptyt7  ptyvc  ptyy1  ram14   shm     tty37  ttya0  ttyc5  ttyea  ttyqf  ttyt4  ttyv9  ttyxe  vcs2
[…]

cat /proc/mdstat

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] 
md127 : active raid1 sdc2[2]
      10238912 blocks [4/1] [__U_]

md2 : active raid1 sdd2[3] sda2[0] sdb2[1]
      10238912 blocks [4/3] [UU_U]

mdadm --detail

root@rescue:~# mdadm --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Tue Sep  2 16:46:34 2014
     Raid Level : raid1
     Array Size : 10238912 (9.76 GiB 10.48 GB)
  Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sat Dec 27 17:31:03 2014
          State : clean, degraded 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

           UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
         Events : 0.503145

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       0        0        2      removed
       3       8       50        3      active sync   /dev/sdd2

root@rescue:~# mdadm --detail /dev/md127
/dev/md127:
        Version : 0.90
  Creation Time : Tue Sep  2 16:46:34 2014
     Raid Level : raid1
     Array Size : 10238912 (9.76 GiB 10.48 GB)
  Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
   Raid Devices : 4
  Total Devices : 1
Preferred Minor : 127
    Persistence : Superblock is persistent

    Update Time : Sat Dec 27 17:31:16 2014
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       0        0        1      removed
       2       8       34        2      active sync   /dev/sdc2
       3       0        0        3      removed

And finally, mdadm --examine sd*

root@rescue:~# mdadm --examine /dev/sd*
/dev/sda:
   MBR Magic : aa55
Partition[0] :   3907029167 sectors at            1 (type ee)
mdadm: No md superblock detected on /dev/sda1.
/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:34 2014
     Raid Level : raid1
  Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
     Array Size : 10238912 (9.76 GiB 10.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 2

    Update Time : Sat Dec 27 18:20:56 2014
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 78eed7b8 - correct
         Events : 503147


      Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       0        0        2      faulty removed
   3     3       8       50        3      active sync   /dev/sdd2
/dev/sda3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4a417350:7192f812:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:35 2014
     Raid Level : raid5
  Used Dev Size : 1942745600 (1852.75 GiB 1989.37 GB)
     Array Size : 5828236800 (5558.24 GiB 5968.11 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 3

    Update Time : Mon Dec 22 10:33:05 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 4d44c428 - correct
         Events : 109608

         Layout : left-symmetric
     Chunk Size : 512K

      Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
mdadm: No md superblock detected on /dev/sda4.
/dev/sdb:
   MBR Magic : aa55
Partition[0] :   3907029167 sectors at            1 (type ee)
mdadm: No md superblock detected on /dev/sdb1.
/dev/sdb2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:34 2014
     Raid Level : raid1
  Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
     Array Size : 10238912 (9.76 GiB 10.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 2

    Update Time : Sat Dec 27 18:20:56 2014
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 78eed7ca - correct
         Events : 503147


      Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       0        0        2      faulty removed
   3     3       8       50        3      active sync   /dev/sdd2
/dev/sdb3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4a417350:7192f812:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:35 2014
     Raid Level : raid5
  Used Dev Size : 1942745600 (1852.75 GiB 1989.37 GB)
     Array Size : 5828236800 (5558.24 GiB 5968.11 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 3

    Update Time : Mon Dec 22 10:33:05 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 4d44c43a - correct
         Events : 109608

         Layout : left-symmetric
     Chunk Size : 512K

      Number   Major   Minor   RaidDevice State
this     1       8       19        1      active sync   /dev/sdb3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
mdadm: No md superblock detected on /dev/sdb4.
mdadm: No md superblock detected on /dev/sdc.
mdadm: No md superblock detected on /dev/sdc1.
mdadm: No md superblock detected on /dev/sdc2.
mdadm: No md superblock detected on /dev/sdc3.
mdadm: No md superblock detected on /dev/sdc4.
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   3907029167 sectors at            1 (type ee)
mdadm: No md superblock detected on /dev/sdd1.
/dev/sdd2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:34 2014
     Raid Level : raid1
  Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
     Array Size : 10238912 (9.76 GiB 10.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 2

    Update Time : Sat Dec 27 18:20:56 2014
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 78eed7ee - correct
         Events : 503147


      Number   Major   Minor   RaidDevice State
this     3       8       50        3      active sync   /dev/sdd2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       0        0        2      faulty removed
   3     3       8       50        3      active sync   /dev/sdd2
/dev/sdd3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4a417350:7192f812:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:35 2014
     Raid Level : raid5
  Used Dev Size : 1942745600 (1852.75 GiB 1989.37 GB)
     Array Size : 5828236800 (5558.24 GiB 5968.11 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 3

    Update Time : Mon Dec 22 01:55:55 2014
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 4d429eeb - correct
         Events : 109599

         Layout : left-symmetric
     Chunk Size : 512K

      Number   Major   Minor   RaidDevice State
this     3       8       51        3      active sync   /dev/sdd3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       0        0        2      faulty removed
   3     3       8       51        3      active sync   /dev/sdd3
mdadm: No md superblock detected on /dev/sdd4.

EDIT

As a last attempt, I unmounted /dev/md2 and stopped it with mdadm.

Then I managed to assemble /dev/md3:

mdadm --assemble --force /dev/md3 /dev/sd[abd]3
mdadm: forcing event count in /dev/sdd3(3) from 109599 upto 109608
mdadm: clearing FAULTY flag for device 2 in /dev/md3 for /dev/sdd3
mdadm: Marking array /dev/md3 as 'clean'
mdadm: /dev/md3 has been started with 3 drives (out of 4).
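
The "forcing event count" message above reflects that /dev/sdd3 was 9 events behind the other two members (109599 versus 109608 in the --examine output). A small sketch for listing each member's RAID level and event counter before deciding whether to force an assembly, assuming the mdadm --examine output was saved to a file; member_events is a name invented for this example, not an mdadm feature.

```shell
# Print "DEVICE RAID_LEVEL EVENTS" for every member found in saved
# `mdadm --examine` output read from stdin.
# member_events is a hypothetical helper name.
member_events() {
    awk '
        # A line such as "/dev/sda3:" starts the report for one device.
        /^\/dev\/[^ ]+:$/ { dev = substr($0, 1, length($0) - 1); level = "?" }
        # "Raid Level : raid5" -> remember the level of the current device.
        $1 == "Raid" && $2 == "Level" { level = $4 }
        # "Events : 109608" -> emit one summary line per member.
        $1 == "Events" && $2 == ":"   { print dev, level, $3 }
    '
}
```

For instance, after mdadm --examine /dev/sd[abd]3 > examine.txt (file name assumed), member_events < examine.txt prints one line per member, which makes a lagging event counter easy to spot.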

Syslog at that moment:

md/raid:md3: device sda3 operational as raid disk 0
md/raid:md3: device sdd3 operational as raid disk 3
md/raid:md3: device sdb3 operational as raid disk 1
md/raid:md3: allocated 4338kB
md/raid:md3: raid level 5 active with 3 out of 4 devices, algorithm 2
RAID conf printout:
 --- level:5 rd:4 wd:3
 disk 0, o:1, dev:sda3
 disk 1, o:1, dev:sdb3
 disk 3, o:1, dev:sdd3
md3: detected capacity change from 0 to 5968114483200
 md3: unknown partition table

And the RAID looked OK:

root@rescue:/mnt# cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] 
md3 : active raid5 sda3[0] sdd3[3] sdb3[1]
      5828236800 blocks level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
[…]

But I could not mount it:

root@rescue:/mnt# mount /dev/md3 /mnt/home
mount: wrong fs type, bad option, bad superblock on /dev/md3,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

Meanwhile, many errors showed up in syslog:

ata4.00: exception Emask 0x0 SAct 0xfe SErr 0x0 action 0x0
ata4.00: irq_stat 0x40000008
ata4.00: failed command: READ FPDMA QUEUED
ata4.00: cmd 60/18:08:18:34:63/00:00:e5:00:00/40 tag 1 ncq 12288 in
         res 41/40:18:18:34:63/00:00:e5:00:00/00 Emask 0x409 (media error) <F>
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4.00: configured for UDMA/133
sd 3:0:0:0: [sdd] Unhandled sense code
sd 3:0:0:0: [sdd]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 3:0:0:0: [sdd]  
Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        e5 63 34 18 
sd 3:0:0:0: [sdd]  
Add. Sense: Unrecovered read error - auto reallocate failed
sd 3:0:0:0: [sdd] CDB: 
Read(10): 28 00 e5 63 34 18 00 00 18 00
blk_update_request: 46 callbacks suppressed
end_request: I/O error, dev sdd, sector 3848483864
md/raid:md3: read error not correctable (sector 3828001816 on sdd3).
md/raid:md3: Disk failure on sdd3, disabling device.
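
The two sector numbers in the last log lines point at the same spot, one counted from the start of the disk and the other from the start of the partition. A short sketch of that arithmetic, assuming 512-byte logical sectors as reported by parted; both function names are made up for this example.

```shell
# Given a failing sector as reported for the whole disk and for the
# partition, print the sector at which the partition starts.
# partition_start_sector is a hypothetical helper for illustration.
partition_start_sector() {
    disk_sector=$1
    part_sector=$2
    echo $((disk_sector - part_sector))
}

# Convert a sector count to bytes, assuming 512-byte logical sectors.
sectors_to_bytes() {
    echo $(($1 * 512))
}
```

With the values from the log, partition_start_sector 3848483864 3828001816 gives 20482048, and sectors_to_bytes 20482048 gives 10486808576, which matches the 10.5GB start of partition 3 in the parted listing. The first failing LBA of the extended self-test (3848494344) lies only 10480 sectors further on.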

I tried to fix them with hdparm , but there are too many of them, and every time a batch of new ones appears.

Obviously, as the syslog line md/raid:md3: Disk failure on sdd3, disabling device. says, the array state changed to FAILED as soon as I tried to mount md3.

It looks like I lost this battle...

    
by Buzut 27.12.2014 / 19:38

1 answer

If I understand correctly, your /dev/md3 is the RAID5 and should consist of /dev/sda3, /dev/sdb3, /dev/sdc3 (which no longer exists) and /dev/sdd3.

So, what do you get from mdadm --detail /dev/md3?

Why does /dev/sdc appear to have a broken partition table?

It may be possible to restore your data even while the partitions of /dev/sdc are still missing, but I would try the following recovery procedure:

1) Boot from a live CD and do not mount any partitions or RAID devices.

2) Make raw image copies of all disks. Yes, you will need more than 8 TB of free space somewhere, perhaps on a network drive. If your disks are physically fine, you can use dd to make the copies. If any disk is physically damaged, you may need a tool such as ddrescue.

3) Make working copies of the original raw images. Yes, you will need another 8 TB of free space for this.

4) Use a virtual machine such as qemu or VirtualBox. Start by booting the virtual machine from a good live CD suited to data rescue. SystemRescueCd may be a good choice.

5) From inside the virtual machine, try to repair the working copies of the raw disk images. One place to start could be adding a partition table to the working image of /dev/sdc. The partition table of /dev/sdc should probably look the same as the partition table of /dev/sdd.

6) When you think you have fixed the problem, boot your virtual machine from the working copies of the disk images.

7) Once the virtual machine has proven that the disk image files are fixed, copy the fixed images back to your physical disks. If any physical disk is broken, you may want to replace it first.
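
Steps 2 and 5 above could be sketched roughly as follows. This is an outline under assumptions, not a tested procedure: it assumes GNU ddrescue and sgdisk are installed on the rescue system, that the destination directory has enough free space, and that the image being given a partition table has the same size as the healthy disk. The function names and all paths are invented for this example. By default the functions only print the commands they would run; set DRY_RUN=0 to execute them.

```shell
# Print the command when DRY_RUN is unset or 1, run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 2: image one disk with GNU ddrescue.
#   -d  read the source with direct disk access
#   -n  skip the scraping phase, so readable areas are copied first
# The map file lets an interrupted copy be resumed later.
image_disk() {
    disk=$1            # e.g. /dev/sdd
    dest=$2            # directory for the images (assumed path)
    name=${disk##*/}   # "sdd" from "/dev/sdd"
    run ddrescue -d -n "$disk" "$dest/$name.img" "$dest/$name.map"
}

# Step 5: copy the GPT of a healthy disk onto the working image of the
# disk whose table is gone, then give the copy fresh GUIDs.
clone_gpt() {
    good_disk=$1       # e.g. /dev/sdd
    image=$2           # working copy of the damaged disk's image
    table=$3           # scratch file for the saved table (assumed path)
    run sgdisk --backup="$table" "$good_disk"
    run sgdisk --load-backup="$table" "$image"
    run sgdisk --randomize-guids "$image"
}
```

For example, image_disk /dev/sdd /mnt/backup prints the ddrescue command line for that disk, so it can be reviewed before anything is read from a failing drive.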

If at any point you find that your attempts to fix the broken RAID have only made things worse, overwrite your working disk images with the original disk images and start over.
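
Both step 3 and this reset come down to copying the untouched original images over the working ones. A minimal sketch, assuming the originals and the working copies live in two separate directories and are named *.img (the layout and the name reset_working_copies are assumptions):

```shell
# Overwrite every working image with its pristine original.
# Usage: reset_working_copies ORIGINALS_DIR WORK_DIR
# Both directories and the *.img naming are assumptions for illustration.
reset_working_copies() {
    orig_dir=$1
    work_dir=$2
    for img in "$orig_dir"/*.img; do
        [ -e "$img" ] || continue    # the glob matched nothing
        # ${img##*/} strips the directory, keeping the file name.
        cp -f "$img" "$work_dir/${img##*/}" || return 1
    done
}
```

Keeping the originals on a read-only mount would be one way to make sure only the working directory is ever modified.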

    
by 30.12.2014 / 18:12