CentOS 7 DL120 G9 with H240 - Monitoring RAID

I just set up a new server with a Smart HBA H240 card and installed hpssaducli; it detects the controller and lets me generate a report.

The problem I'm having is how to detect a RAID failure and send an alert.

The report generated by hpssaducli contains a huge amount of information that is hard to sift through, and there is currently no failed array, so I'm not sure what information I would need to look for in the event of a drive failure.

Details

root@server [~]# lsmod | grep hp
hpwdt                  14242  0
hpilo                  17381  0
shpchp                 37032  0
hpsa                   94958  3

root@server [~]# rpm -qa | grep hpsa
kmod-hpsa-3.4.12-110.rhel7u1.x86_64

root@server [~]# uname -a
Linux server.hostname 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

root@server [~]# hpssaducli
HP Smart Storage Diagnostics 2.10.14.0
Usage: hpssaducli [ -adu | -ssd | -val ] [ command-specific options ]
...
...

Diagnosable devices:
Smart HBA H240 in Slot 2

hpssacli output

root@server [~]# hpssacli ctrl all show config detail

Smart HBA H240 in Slot 2 (RAID Mode)
   Bus Interface: PCI
   Slot: 2
   Serial Number: XXXXXXXXX
   Cache Serial Number: XXXXXXXXX
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 1.34
   Rebuild Priority: High
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: False
   Drive Write Cache: Disabled
   Controller Memory Size: 256 MB
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 72
   Cache Module Temperature (C): 36
   Number of Ports: 2 Internal only
   Encryption: Disabled
   Express Local Encryption: False
   Driver Name: hpsa
   Driver Version: 3.4.12
   Driver Supports HP SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:0A:00.0
   Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
   Controller Mode: RAID Mode
   Controller Mode Reboot: Not Required
   Latency Scheduler Setting: Disabled
   Current Power Mode: MaxPerformance
   Host Serial Number: CZ250305FS
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None


   Port Name: 2I
         Port ID: 0
         Port Connection Number: 0
         SAS Address: 500143803366B9C0
         Port Location: Internal
         Managed Cable Connected: False

   Port Name: 1I
         Port ID: 1
         Port Connection Number: 1
         SAS Address: 500143803366B9C4
         Port Location: Internal
         Managed Cable Connected: False

   Internal Drive Cage at Port 1I, Box 1, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 1I
      Box: 1
      Location: Internal

   Physical Drives
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
      None attached


   Internal Drive Cage at Port 2I, Box 0, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 2I
      Box: 0
      Location: Internal

   Physical Drives
      None attached
      None attached

   Array: A
      Interface Type: Solid State SATA
      Unused Space: 0  MB (0.0%)
      Used Space: 1.8 TB (100.0%)
      Status: OK
      Array Type: Data
      HP SSD Smart Path: enable



      Logical Drive: 1
         Size: 931.5 GB
         Fault Tolerance: 1+0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 512 KB
         Status: Ready for Rebuild
         Caching:  Disabled
         Unique Identifier: XXXXXXXXX
         Disk Name: /dev/sda
         Mount Points: /boot/efi 200 MB Partition Number 2, /boot 500 MB Partition Number 3
         OS Status: LOCKED
         Logical Drive Label: 026ACA51PDNNK0ARH7Q0B9471B
         Mirror Group 1:
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
         Mirror Group 2:
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
         Drive Type: Data
         LD Acceleration Method: HP SSD Smart Path

      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 27
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 27
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

      physicaldrive 1I:1:3
         Port: 1I
         Box: 1
         Bay: 3
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 28
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

      physicaldrive 1I:1:4
         Port: 1I
         Box: 1
         Bay: 4
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 28
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
    
by copyandpaster 27.10.2015 / 11:56

1 answer

I don't want to close this as a duplicate, but you should install the HP Management Agents to provide server health information. They are available via yum or as the individual packages listed on the support site for the ProLiant DL120 Gen9 and RHEL7.

See: HP ProLiant DL380e Gen8 Server - SPP use for some ideas ...

At a minimum, you can use the hpssacli tool to get real information about the RAID controller on demand.
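If you go the hpssacli route, one way to avoid sifting through the full report by hand is to look only at the plain "Status:" and "Controller Status:" lines and flag anything that is not OK. The following is a minimal sketch rather than an official HP method. It assumes the output layout matches the `hpssacli ctrl all show config detail` listing in the question, and the function names `find_problems` and `check_raid` are made up for illustration.

```python
import re
import subprocess

# Only plain "Status:" and "Controller Status:" lines are checked.
# Lines such as "OS Status", "Power Supply Status" or
# "Drive Authentication Status" are deliberately ignored.
STATUS_RE = re.compile(r"^\s*(Controller Status|Status):\s*(.+?)\s*$")

# Bare section headers as they appear in the detail output, e.g.
# "Array: A", "Logical Drive: 1", "physicaldrive 1I:1:1".
HEADER_RE = re.compile(
    r"^\s*(Array: \S+|Logical Drive: \S+|physicaldrive \S+)\s*$"
)


def find_problems(report):
    """Return (section, status) pairs for every status that is not OK."""
    problems = []
    context = "controller"  # status lines before any header belong here
    for line in report.splitlines():
        header = HEADER_RE.match(line)
        if header:
            context = header.group(1)
            continue
        status = STATUS_RE.match(line)
        if status and status.group(2) != "OK":
            problems.append((context, status.group(2)))
    return problems


def check_raid():
    """Run the same command as in the question and scan its output."""
    result = subprocess.run(
        ["hpssacli", "ctrl", "all", "show", "config", "detail"],
        capture_output=True, text=True, check=True,
    )
    return find_problems(result.stdout)
```

Run against the output posted in the question, this approach would flag Logical Drive 1, whose status is "Ready for Rebuild" rather than OK. Calling `check_raid()` from a cron job and alerting when the returned list is non-empty is one possible way to use it.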

But understand that the server is also capable of sending emails, sending SNMP traps, and logging health events when you include the other utilities.
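If you prefer not to install the agents, a home-grown email alert is a possible stand-in. The sketch below is not what the HP agents do; it only shows how a list of problem descriptions, however you obtain them, could be turned into a message with Python's standard library. The names `build_alert` and `send_alert`, the addresses, and the assumption of a mail relay listening on localhost are all illustrative.

```python
from email.message import EmailMessage
import smtplib


def build_alert(problems, hostname, sender, recipient):
    """Build an email listing each RAID problem on its own line."""
    msg = EmailMessage()
    msg["Subject"] = f"RAID alert on {hostname}: {len(problems)} problem(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(problems))
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Hand the message to an SMTP relay (assumed to run on localhost)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

For example, `send_alert(build_alert(["Logical Drive 1: Ready for Rebuild"], "server.hostname", "root@example.com", "admin@example.com"))` would mail a one-line report, provided a relay accepts mail on that host.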

    
by 27.10.2015 / 12:16