Ubuntu 12.04 Server software RAID1 - faulty spare - SMART output passed - confused

I think I am having a problem with one of the hard drives in my array, and I am trying to troubleshoot it.

It reported a faulty spare, and after a reboot the array just shows the device as removed:

/dev/md0:
        Version : 1.2
  Creation Time : Wed Aug 15 15:28:06 2012
     Raid Level : raid1
     Array Size : 2920368960 (2785.08 GiB 2990.46 GB)
  Used Dev Size : 2920368960 (2785.08 GiB 2990.46 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Wed Aug 22 16:04:16 2012
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : M2FileServer:0  (local to host M2FileServer)
           UUID : 778e5d32:1cd810e8:dfebb663:868d66e6
         Events : 45564

    Number   Major   Minor   RaidDevice State
      0       8        2        0      active sync   /dev/sda2
      1       0        0        1      removed

Here is the output of parted -l so you can see my partitions:

Model: ATA ST3000DM001-9YN1 (scsi)
Disk /dev/sda: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  10.0GB  10.0GB  ext4               boot
 2      10.0GB  3001GB  2991GB  ext4               raid


Model: ATA ST3000DM001-9YN1 (scsi)
Disk /dev/sdb: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system     Name  Flags
 1      17.4kB  10.0GB  10.0GB  linux-swap(v1)
 2      10.0GB  3001GB  2991GB  ext4


Model: Linux Software RAID Array (md)
Disk /dev/md0: 2990GB
Sector size (logical/physical): 512B/4096B
Partition Table: loop

Number  Start  End     Size    File system  Flags
 1      0.00B  2990GB  2990GB  ext4

I ran smartctl -a /dev/sdb2 and it said the drive passed, but compared to my working drive, the RAW_VALUE numbers were really high.

Here is the output:

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-9YN166
Serial Number:    W1F0NZZN
LU WWN Device Id: 5 000c50 052948b97
Firmware Version: CC4B
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Wed Aug 22 15:52:02 2012 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  584) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   078   077   006    Pre-fail  Always       -       3096976
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       25
  5 Reallocated_Sector_Ct   0x0033   082   082   036    Pre-fail  Always       -       23920
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       994254
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       203
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       25
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       188
188 Command_Timeout         0x0032   099   099   000    Old_age   Always       -       12885164036
189 High_Fly_Writes         0x003a   006   006   000    Old_age   Always       -       94
190 Airflow_Temperature_Cel 0x0022   058   053   045    Old_age   Always       -       42 (Min/Max 41/47)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       18
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       60
194 Temperature_Celsius     0x0022   042   047   000    Old_age   Always       -       42 (0 21 0 0)
197 Current_Pending_Sector  0x0012   074   025   000    Old_age   Always       -       4344
198 Offline_Uncorrectable   0x0010   074   025   000    Old_age   Offline      -       4344
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       159678294130889
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       462985523872
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1565986232

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Thanks in advance for your help! Jess

    
by Jessica Smith 22.08.2012 / 22:11

2 answers

Actually, it looks like no self-test has been run on this drive yet:

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Try running smartctl -t long /dev/sdb.
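As a rough sketch of that suggestion: the snippet below starts the extended self-test and then reads the self-test log. The `-t long` and `-l selftest` options are standard smartctl options. The `selftest_commands` function and the `RUN` variable are hypothetical names added here so that the default is a harmless dry run. It assumes the suspect disk is /dev/sdb, as in the question.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the original answer). By default it only
# prints the commands (dry run); set RUN=sudo to execute them for real.
selftest_commands() {
    disk="${1:-/dev/sdb}"   # whole disk, not a partition such as /dev/sdb2
    run="${RUN:-echo}"

    # Start the extended self-test; the drive runs it in the background.
    $run smartctl -t long "$disk"

    # Show the self-test log; the result appears here once the test finishes.
    $run smartctl -l selftest "$disk"
}

selftest_commands /dev/sdb
```

The smartctl output in the question lists a recommended polling time of 255 minutes for the extended test, so the log will probably need to be read again after roughly that long.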

    
answered 26.07.2014 / 20:39

  5 Reallocated_Sector_Ct   0x0033   082   082   036    Pre-fail  Always       -       23920
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       188
188 Command_Timeout         0x0032   099   099   000    Old_age   Always       -       12885164036
197 Current_Pending_Sector  0x0012   074   025   000    Old_age   Always       -       4344
198 Offline_Uncorrectable   0x0010   074   025   000    Old_age   Offline      -       4344

The drive is failing, but not badly enough to trigger a SMART failure, because the normalized values of all the error-related attributes are still above their failure thresholds. The most likely cause of the RAID failure event is an uncorrectable read error (SMART attribute 187).
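To illustrate why the overall result can still read PASSED, here is a minimal sketch that compares VALUE with THRESH for the rows quoted above. It assumes the usual rule that an attribute only counts as failed once its normalized VALUE drops to or below THRESH. The function names `smart_rows` and `check_rows` are hypothetical, and the rows are pasted in rather than read from a live disk.

```shell
#!/bin/sh
# Attribute rows copied from the smartctl output quoted above.
smart_rows() {
    cat <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   082   082   036    Pre-fail  Always       -       23920
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       188
188 Command_Timeout         0x0032   099   099   000    Old_age   Always       -       12885164036
197 Current_Pending_Sector  0x0012   074   025   000    Old_age   Always       -       4344
198 Offline_Uncorrectable   0x0010   074   025   000    Old_age   Offline      -       4344
EOF
}

# Hypothetical helper: whitespace-separated columns are
# $2=name, $4=VALUE, $6=THRESH, $10=RAW_VALUE.
check_rows() {
    awk '{
        # Assumed rule: an attribute counts as failed only when the
        # normalized VALUE has dropped to or below THRESH.
        state = ($4 + 0 <= $6 + 0) ? "FAILED" : "ok"
        printf "%-23s VALUE=%s THRESH=%s RAW=%s smart=%s\n", $2, $4, $6, $10, state
    }'
}

smart_rows | check_rows
```

Under that assumption every row comes out as smart=ok, even though the raw counts for reallocated, pending, and uncorrectable sectors are far from zero. Attribute 187 is a good example: its VALUE is already 001, but with a THRESH of 000 it cannot trip the check.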

    
answered 26.07.2014 / 23:08