Which disk is bad in the RAID6 array?

2

Server: Ubuntu Lucid
RAID controller: Adaptec 3805
8 disks in RAID6 on HP ProLiant DL180 G5 hardware

My kern.log tells me that I have an error on sdb, as shown below:

[2740390.344436] sd 4:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2740390.344439] sd 4:0:1:0: [sdb] Sense Key : Hardware Error [current]
[2740390.344442] sd 4:0:1:0: [sdb] Add. Sense: Internal target failure
[2740390.344447] sd 4:0:1:0: [sdb] CDB: Read(10): 28 00 33 dd dc 00 00 00 08 00
[2740390.344454] end_request: I/O error, dev sdb, sector 870177792
[2774094.573841] sd 4:0:1:0: [sdb] Unhandled sense code
[2774094.573847] sd 4:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2774094.573851] sd 4:0:1:0: [sdb] Sense Key : Hardware Error [current]
[2774094.573856] sd 4:0:1:0: [sdb] Add. Sense: Internal target failure
[2774094.573862] sd 4:0:1:0: [sdb] CDB: Read(16): 88 00 00 00 00 01 33 dd ef e8 00 00 01 00 00 00
[2774094.573873] end_request: I/O error, dev sdb, sector 5165150184
[2774094.615437] sd 4:0:1:0: [sdb] Unhandled sense code

The arcconf command reports that every disk's state is Online, but also "Failed stripes: Yes".

How can I identify which disk in the 8-disk RAID6 array is bad?

Edited: May 2, 2012 - added the following:

/usr/local/sbin/arcconf getconfig 1 AL

Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status                        : Optimal
Channel description                      : SAS/SATA
Controller Model                         : Adaptec 3805
Controller Serial Number                 : 0C18115C3BB
Temperature                              : 0 C/ 32 F (Normal)
Installed memory                         : 128 MB
Copyback                                 : Disabled
Background consistency check             : Disabled
Automatic Failover                       : Enabled
Global task priority                     : High
Stayawake period                         : Disabled
Spinup limit internal drives             : 0
Spinup limit external drives             : 0
Defunct disk drive count                 : 0
Logical devices/Failed/Degraded          : 2/0/0
NCQ status                               : Enabled
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS                                     : 5.2-0 (17342)
Firmware                                 : 5.2-0 (17342)
Driver                                   : 1.1-5 (2461)
Boot Flash                               : 5.2-0 (17342)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status                                   : Optimal
Over temperature                         : No
Capacity remaining                       : 99 percent
Time remaining (at current draw)         : 3 days, 1 hours, 11 minutes

----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
Logical device name                      : boot
RAID level                               : 1
Status of logical device                 : Optimal
Size                                     : 476150 MB
Read-cache mode                          : Enabled
Write-cache mode                         : Enabled (write-back)
Write-cache setting                      : Enabled (write-back)
Partitioned                              : Yes
Protected by Hot-Spare                   : No
Bootable                                 : Yes
Failed stripes                           : No
Power settings                           : Disabled
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0                                : Present (0,7)             Z2AD1A3H
Segment 1                                : Present (0,3)             Z2AD1834

Logical device number 1
Logical device name                      : data
RAID level                               : 6 Reed-Solomon
Status of logical device                 : Optimal
Size                                     : 2858990 MB
Stripe-unit size                         : 128 KB
Read-cache mode                          : Enabled
Write-cache mode                         : Enabled (write-back)
Write-cache setting                      : Enabled (write-back)
Partitioned                              : Yes
Protected by Hot-Spare                   : No
Bootable                                 : No
Failed stripes                           : Yes
Power settings                           : Disabled
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0                                : Present (0,0)             6VPEFSZ0
Segment 1                                : Present (0,1)             5VPA5934
Segment 2                                : Present (0,2)             5VPA7132
Segment 3                                : Present (0,4)             5VPAJ8EJ
Segment 4                                : Present (0,5)             5VPA6NAZ
Segment 5                                : Present (0,6)             5VPAJM8Q


----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
  Device #0
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,0(0:0)
     Reported Location                  : Connector 0, Device 0
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 6VPEFSZ0
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #1
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,1(1:0)
     Reported Location                  : Connector 0, Device 1
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 5VPA5934
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #2
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,2(2:0)
     Reported Location                  : Connector 0, Device 2
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 5VPA7132
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #3
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,3(3:0)
     Reported Location                  : Connector 0, Device 3
     Vendor                             : ST500DM0
     Model                              : 02-1BD142
     Firmware                           : KC44
     Serial number                      : Z2AD1834
     Size                               : 476940 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #4
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,4(4:0)
     Reported Location                  : Connector 1, Device 0
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 5VPAJ8EJ
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #5
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,5(5:0)
     Reported Location                  : Connector 1, Device 1
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 5VPA6NAZ
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #6
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,6(6:0)
     Reported Location                  : Connector 1, Device 2
     Vendor                             : ST375052
     Model                              : 5AS
     Firmware                           : JC4B
     Serial number                      : 5VPAJM8Q
     Size                               : 715404 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled
  Device #7
     Device is a Hard drive
     State                              : Online
     Supported                          : Yes
     Transfer Speed                     : SATA 3.0 Gb/s
     Reported Channel,Device(T:L)       : 0,7(7:0)
     Reported Location                  : Connector 1, Device 3
     Vendor                             : ST500DM0
     Model                              : 02-1BD142
     Firmware                           : KC44
     Serial number                      : Z2AD1A3H
     Size                               : 476940 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     NCQ status                         : Enabled


Command completed successfully.

Update: partition information added below:

**fdisk -l**

Disk /dev/sda: 499.3 GB, 499289948160 bytes
255 heads, 63 sectors/track, 60701 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0002ab26

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1       59952   481562624   83  Linux
/dev/sda2           59953       60702     6022145    5  Extended
/dev/sda5           59953       60702     6022144   82  Linux swap / Solaris

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sdb: 2997.9 GB, 2997878784000 bytes
255 heads, 63 sectors/track, 364471 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      267350  2147483647+  ee  GPT



**df -h**
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             453G  112G  319G  26% /
none                 1000M  224K 1000M   1% /dev
none                 1005M     0 1005M   0% /dev/shm
none                 1005M  664K 1004M   1% /var/run
none                 1005M  4.0K 1005M   1% /var/lock
none                 1005M     0 1005M   0% /lib/init/rw
/dev/sdb1             2.7T  1.5T  1.1T  58% /media/raid1
/dev/sdb1             2.7T  1.5T  1.1T  58% /media/usbhd-sdb1
/dev/sda1             453G  112G  319G  26% /media/usbhd-sda1


**fstab**
# /etc/fstab: static file system information.
#
# Use 'blkid -o value -s UUID' to print the universally unique identifier
# for a device; this may be used with UUID= as a more robust way to name
# devices that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    nodev,noexec,nosuid 0       0
# / was on /dev/sda1 during installation
UUID=12dd3c31-6dba-4c26-ba81-88a76510bffd /               ext4    errors=remount-ro 0               1
# swap was on /dev/sda5 during installation
UUID=81618042-ec4e-45e9-947f-9198d29651d3 none            swap    sw              0       0
UUID=a7832728-5bf9-45c4-8a29-2824b4f2c250 /media/raid1    ext4    errors=remount-ro,noatime 0       1
    
by sixnumber 01.05.2012 / 16:48

5 answers

3

Unless I am mistaken, these messages mean you have errors that the RAID controller did not correct. The RAID controller is supposed to hide errors like this from you. I don't think you have a simple disk failure. I think something more serious is going on.

    
by 01.05.2012 / 18:03
3

Assuming the "boot" volume in your RAID configuration is seen as sda and "data" as sdb, your system is telling you the following:

[2740390.344436] sd 4:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

The SCSI subsystem issued a command to the low-level driver (for your Adaptec card) without a host-level error (hostbyte=DID_OK), and the card answered with an error (DRIVER_SENSE is set).

[2740390.344439] sd 4:0:1:0: [sdb] Sense Key : Hardware Error [current]

This is the type of error (see the SCSI driver information).

[2740390.344442] sd 4:0:1:0: [sdb] Add. Sense: Internal target failure

This is additional information reported by the driver. As far as I know, this particular code means "no specific information" / "no idea what went wrong".

[2740390.344454] end_request: I/O error, dev sdb, sector 870177792

The error reached the block layer.
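The sector numbers in these messages can be cross-checked against the CDB bytes printed on the preceding log lines. A minimal Python sketch, assuming the standard READ(10) and READ(16) layouts (big-endian LBA at bytes 2-5 or 2-9); `decode_read_cdb` is a hypothetical helper name, not part of any tool:

```python
from typing import Tuple


def decode_read_cdb(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, block_count) for a READ(10) or READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x28:
        # READ(10): 4-byte LBA at bytes 2-5, 2-byte transfer length at bytes 7-8
        return int.from_bytes(cdb[2:6], "big"), int.from_bytes(cdb[7:9], "big")
    if opcode == 0x88:
        # READ(16): 8-byte LBA at bytes 2-9, 4-byte transfer length at bytes 10-13
        return int.from_bytes(cdb[2:10], "big"), int.from_bytes(cdb[10:14], "big")
    raise ValueError(f"unsupported opcode 0x{opcode:02x}")


# CDBs copied from the kern.log excerpt in the question
read10 = decode_read_cdb("28 00 33 dd dc 00 00 00 08 00")
read16 = decode_read_cdb("88 00 00 00 00 01 33 dd ef e8 00 00 01 00 00 00")
print(read10)  # (870177792, 8)
print(read16)  # (5165150184, 256)
```

Both decoded LBAs match the sectors in the end_request lines, so the failing reads are addressed to logical blocks of sdb. If sdb is the volume exported by the controller, as assumed above, these addresses do not by themselves point at any one physical disk.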

As stated in another answer: this is not a single-disk failure, it is a failure of the whole RAID. You should check your data carefully and consider replacing the RAID subsystem, or at least the controller.
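To see what arcconf itself says about the member disks, the `getconfig` output can be cross-referenced by serial number. The following Python sketch is only an illustration: `parse` and `suspects` are hypothetical helpers, and `SAMPLE` is a trimmed excerpt of the output in the question (two of the six "data" segments), not a full capture.

```python
import re

# Trimmed excerpt of "arcconf getconfig 1 AL" from the question
SAMPLE = """\
Logical device number 0
Logical device name                      : boot
Failed stripes                           : No
Segment 0                                : Present (0,7)             Z2AD1A3H
Segment 1                                : Present (0,3)             Z2AD1834
Logical device number 1
Logical device name                      : data
Failed stripes                           : Yes
Segment 0                                : Present (0,0)             6VPEFSZ0
Segment 1                                : Present (0,1)             5VPA5934
  Device #0
     State                              : Online
     Reported Location                  : Connector 0, Device 0
     Serial number                      : 6VPEFSZ0
     S.M.A.R.T. warnings                : 0
  Device #1
     State                              : Online
     Reported Location                  : Connector 0, Device 1
     Serial number                      : 5VPA5934
     S.M.A.R.T. warnings                : 0
"""


def parse(text):
    """Split the output into logical devices and physical devices, keyed by number."""
    logical, physical = {}, {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Logical device number (\d+)", line)
        if m:
            current = logical.setdefault(int(m.group(1)), {"segments": []})
            continue
        m = re.match(r"Device #(\d+)", line)
        if m:
            current = physical.setdefault(int(m.group(1)), {})
            continue
        if current is None or " : " not in line:
            continue  # headers, separators, controller-level lines
        key, value = (part.strip() for part in line.split(" : ", 1))
        if key.startswith("Segment "):
            current["segments"].append(value.split()[-1])  # last token is the serial
        else:
            current[key] = value
    return logical, physical


def suspects(text):
    """Map each logical device with failed stripes to its member disks."""
    logical, physical = parse(text)
    by_serial = {d.get("Serial number"): d for d in physical.values()}
    report = {}
    for number, ld in logical.items():
        if ld.get("Failed stripes") != "Yes":
            continue
        name = ld.get("Logical device name", str(number))
        report[name] = [
            (
                serial,
                by_serial.get(serial, {}).get("Reported Location"),
                by_serial.get(serial, {}).get("State"),
                by_serial.get(serial, {}).get("S.M.A.R.T. warnings"),
            )
            for serial in ld["segments"]
        ]
    return report


for name, members in suspects(SAMPLE).items():
    print(name, members)
```

On the output posted in the question this narrows the problem to the "data" logical device, but every member reports Online with zero S.M.A.R.T. warnings, so arcconf alone does not single out a disk.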

And you should always (!) enable "Background consistency check" / "Passive verification" / "Verify" on your RAID controllers, to find silent corruption that could kill your RAID during a rebuild.
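Whether that check is currently active can be read from the controller section of the `getconfig` output. A small Python sketch, assuming the `key : value` layout shown in the question; `controller_setting` is a hypothetical helper and `CONTROLLER_INFO` is an excerpt copied from the question:

```python
# Excerpt of the "Controller information" section from the question
CONTROLLER_INFO = """\
Controller Status                        : Optimal
Copyback                                 : Disabled
Background consistency check             : Disabled
Automatic Failover                       : Enabled
"""


def controller_setting(text, name):
    """Return the value of a 'name : value' line, or None if it is absent."""
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep and key.strip() == name:
            return value.strip()
    return None


state = controller_setting(CONTROLLER_INFO, "Background consistency check")
if state != "Enabled":
    print(f"Background consistency check is {state}")
```

For the controller in the question this reports Disabled, so no background verification was running on this array.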

Have you seen any filesystem errors? Is /dev/sdb partitioned/mounted?
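The assumption that sda is "boot" and sdb is "data" can be sanity-checked by comparing sizes. This Python sketch uses the byte counts from `fdisk -l` and the logical device sizes from arcconf as posted in the question; it assumes arcconf's "MB" means 2^20 bytes, and `closest_logical_device` is a hypothetical helper:

```python
MIB = 1024 * 1024

# Sizes in bytes, from the fdisk -l output in the question
block_devices = {"sda": 499289948160, "sdb": 2997878784000}
# Sizes in MB, from the arcconf output in the question (assumed to be MiB)
logical_devices = {"boot": 476150, "data": 2858990}


def closest_logical_device(size_bytes, logical_mb):
    """Return the logical device whose reported size is nearest to size_bytes."""
    size_mib = size_bytes / MIB
    return min(logical_mb, key=lambda name: abs(logical_mb[name] - size_mib))


mapping = {
    dev: closest_logical_device(size, logical_devices)
    for dev, size in block_devices.items()
}
print(mapping)  # {'sda': 'boot', 'sdb': 'data'}
```

Under that assumption each block device is exactly 10 MiB larger than the matching logical device (476160 vs 476150 and 2859000 vs 2858990), which supports the mapping. The `df -h` output in the question also shows /dev/sdb1 mounted at /media/raid1.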

    
by 02.05.2012 / 11:34
1

This may sound funny, but have you looked at the front of the server to see which drive has an error LED lit? (Assuming the drives have LEDs.)

You can also install the storage manager software: link

    
by 01.05.2012 / 17:04
0

The information can be obtained through smartctl (CLI) or the Adaptec CLI (as mentioned above).

    
by 01.05.2012 / 17:06
0

If you can reboot the server, boot it from the SmartStart DVD. If I remember correctly, you can get to the ACU from there for a graphical view of the RAID volumes.

    
by 02.05.2012 / 13:01
