mpt2sas failure floods the syslog


I am seeing a strange error in the logs, which started like this:

:39:35 host1 kernel: [54674279.243416] mpt2sas0: fault_state(0x2651)!
:39:35 host1 kernel: [54674279.243543] mpt2sas0: sending diag reset !!
:39:36 host1 kernel: [54674280.481215] mpt2sas0: diag reset: SUCCESS
:39:36 host1 kernel: [54674280.713443] mpt2sas0: LSISAS2008: FWVersion(07.15.08.00), ChipRevision(0x03), BiosVersion(07.02.03.00)
:39:36 host1 kernel: [54674280.713451] mpt2sas0: Dell 6Gbps SAS HBA: Vendor(0x1000), Device(0x0072), SSVID(0x1028), SSDID(0x1F1C)
:39:36 host1 kernel: [54674280.713455] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
:39:36 host1 kernel: [54674280.713518] mpt2sas0: sending port enable !!
:39:43 host1 kernel: [54674287.616666] mpt2sas0: port enable: SUCCESS
:39:43 host1 kernel: [54674287.616814] mpt2sas0: search for end-devices: start
:39:43 host1 kernel: [54674287.617657] scsi target7:0:3: handle(0x0009), sas_addr(0x590b11c410294314), enclosure logical id(0x590b11c007729400), slot(7)
:39:43 host1 kernel: [54674287.617735] scsi target7:0:2: handle(0x000a), sas_addr(0x590b11c41025f914), enclosure logical id(0x590b11c007729400), slot(3)
:39:43 host1 kernel: [54674287.617807] mpt2sas0: search for end-devices: complete
:39:43 host1 kernel: [54674287.617810] mpt2sas0: search for raid volumes: start
:39:43 host1 kernel: [54674287.617813] mpt2sas0: search for responding raid volumes: complete
:39:43 host1 kernel: [54674287.617816] mpt2sas0: search for expanders: start
:39:43 host1 kernel: [54674287.617818] mpt2sas0: search for expanders: complete
:39:43 host1 kernel: [54674287.617833] mpt2sas0: search for end-devices: start
:39:43 host1 kernel: [54674287.618468] scsi target7:0:3: handle(0x0009), sas_addr(0x590b11c410294314), enclosure logical id(0x590b11c007729400), slot(7)
:39:43 host1 kernel: [54674287.618543] scsi target7:0:2: handle(0x000a), sas_addr(0x590b11c41025f914), enclosure logical id(0x590b11c007729400), slot(3)
:39:43 host1 kernel: [54674287.618614] mpt2sas0: search for end-devices: complete
:39:43 host1 kernel: [54674287.618617] mpt2sas0: search for raid volumes: start
:39:43 host1 kernel: [54674287.618619] mpt2sas0: search for responding raid volumes: complete
:39:43 host1 kernel: [54674287.618622] mpt2sas0: search for expanders: start
:39:43 host1 kernel: [54674287.618624] mpt2sas0: search for expanders: complete
:39:43 host1 kernel: [54674287.618632] mpt2sas0: _base_fault_reset_work: hard reset: success
:39:43 host1 kernel: [54674287.618639] mpt2sas0: removing unresponding devices: start
:39:43 host1 kernel: [54674287.618642] mpt2sas0: removing unresponding devices: complete
:39:43 host1 kernel: [54674287.618654] mpt2sas0: scan devices: start
:39:43 host1 kernel: [54674287.619530] mpt2sas0: failure at /build/buildd/linux-3.2.0/drivers/scsi/mpt2sas/mpt2sas_scsih.c:5157/_scsih_add_device()!
:39:43 host1 kernel: [54674287.619866] mpt2sas0: failure at /build/buildd/linux-3.2.0/drivers/scsi/mpt2sas/mpt2sas_scsih.c:5157/_scsih_add_device()!

and the last message is repeated many times per second. Other information that may be relevant:

This is a Dell machine running an old Linux kernel, connected over SAS to a Dell disk array.

# uname -a
Linux host1 3.2.0-34-generic #53-Ubuntu SMP Thu Nov 15 10:48:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

# modinfo -F version mpt2sas 
10.100.00.00

# lspci | grep LSI
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03) 
08:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)

With more debugging enabled in mpt2sas, this is the output:

 mpt2sas0: failure at /build/buildd/linux-3.2.0/drivers/scsi/mpt2sas/mpt2sas_scsih.c:5157/_scsih_add_device()! 
  phy-7:4: refresh: parent sas_addr(0x590b11c007729400), 
       link_rate(0x08), phy(4) 
       attached_handle(0x0000), sas_addr(0x0000000000000000)
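
To make a debug record like this easier to compare against other records, a minimal Python sketch can split it into name/value pairs. The regular expression and the function name `parse_fields` are my own assumptions based only on the excerpt above, not anything provided by the driver. A list of pairs is used rather than a dictionary because `sas_addr` appears twice (once for the parent, once for the attached device).

```python
import re

# Assumed field layout: name(value), where value is hex (0x...) or decimal.
FIELD_RE = re.compile(r"(\w+)\((0x[0-9a-fA-F]+|\d+)\)")

def parse_fields(text):
    """Return (name, integer value) pairs in the order they appear."""
    return [(name, int(value, 0)) for name, value in FIELD_RE.findall(text)]

record = """
 phy-7:4: refresh: parent sas_addr(0x590b11c007729400),
       link_rate(0x08), phy(4)
       attached_handle(0x0000), sas_addr(0x0000000000000000)
"""

fields = parse_fields(record)
for name, value in fields:
    print(f"{name:16s} {value:#x}")
```

In this record both the attached handle and the attached SAS address come out as zero; whether that means the driver sees nothing attached at that phy is only a guess on my part.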

Other machines, connected to different volumes on the same disk array, work normally. Neither the disk array nor the iDRAC gives any clue in its logs; everything looks normal. Googling turned up some horror stories about the RAID dropping all of its disks. The problem does not coincide with unusually high load.

The behavior continues for hours.
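
To put a number on "many times per second", a rough Python sketch like the one below could count the `_scsih_add_device()` failure lines per controller and per whole second of the bracketed kernel timestamp. The pattern is an assumption derived from the log excerpt in this post, and `failures_per_second` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape: "[<seconds>.<micros>] mpt2sasN: failure at <path>/_scsih_add_device()!"
FAILURE_RE = re.compile(
    r"\[\s*(\d+)\.\d+\]\s+(mpt2sas\d+): failure at \S+/_scsih_add_device\(\)!"
)

def failures_per_second(lines):
    """Count failure lines keyed by (controller, whole-second kernel timestamp)."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            seconds, controller = match.groups()
            counts[(controller, int(seconds))] += 1
    return counts

# Three lines copied from the excerpt above; only the first two are failures.
sample = [
    ":39:43 host1 kernel: [54674287.619530] mpt2sas0: failure at /build/buildd/linux-3.2.0/drivers/scsi/mpt2sas/mpt2sas_scsih.c:5157/_scsih_add_device()!",
    ":39:43 host1 kernel: [54674287.619866] mpt2sas0: failure at /build/buildd/linux-3.2.0/drivers/scsi/mpt2sas/mpt2sas_scsih.c:5157/_scsih_add_device()!",
    ":39:43 host1 kernel: [54674287.618654] mpt2sas0: scan devices: start",
]

counts = failures_per_second(sample)
for (controller, second), hits in sorted(counts.items()):
    print(controller, second, hits)
```

On the real machine the same function could be fed the lines of whichever file the syslog daemon writes kernel messages to, which would show whether the rate is steady or comes in bursts.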

Red Hat seems to have a very similar question, but with no solution yet (?):

link

Unfortunately, I cannot reboot the machine to run experiments.

    
by Roman Susi 10.02.2016 / 12:09

0 answers
