Eu tenho 5 jbod conectados via LSI-SAS3008 ao controlador.
Estou usando o Arch-Linux 4.14.41-1-lts & multipath-tools v0.7.6 (03 / 10,2018)
Meu problema é quando um disco começa a dar erro de E / S e começa a piscar Multipath tentando verificar o disco e remapear o caminho com falha.
Jul 23 04:59:51 FKM1 multipathd[5315]: 35000c50093d4e7c7: sdbe - tur checker timed out
Jul 23 04:59:51 FKM1 multipathd[5315]: checker failed path 67:128 in map 35000c50093d4e7c7
Jul 23 04:59:51 FKM1 multipathd[5315]: 35000c50093d4e7c7: remaining active paths: 0
Jul 23 04:59:51 FKM1 multipathd[5315]: sdbe: mark as failed
Jul 23 04:59:56 FKM1 multipathd[5315]: checker failed path 67:128 in map 35000c50093d4e7c7
Jul 23 05:04:37 FKM1 multipathd[5315]: 67:128: reinstated
Jul 23 05:04:37 FKM1 multipathd[5315]: 35000c50093d4e7c7: remaining active paths: 1
Jul 23 05:05:27 FKM1 multipathd[5315]: 35000c50093d4e7c7: sdbe - tur checker timed out
Jul 23 05:05:27 FKM1 multipathd[5315]: checker failed path 67:128 in map 35000c50093d4e7c7
Jul 23 05:05:27 FKM1 multipathd[5315]: 35000c50093d4e7c7: remaining active paths: 0
Jul 23 05:05:27 FKM1 multipathd[5315]: sdbe: mark as failed
Por causa do multipath de disco defeituoso que tenta remapear toda vez que o disco é exibido.
[Fri Aug 3 00:18:37 2018] alua: device handler registered
[Fri Aug 3 00:18:37 2018] emc: device handler registered
[Fri Aug 3 00:18:37 2018] rdac: device handler registered
[Fri Aug 3 00:18:37 2018] device-mapper: uevent: version 1.0.3
[Fri Aug 3 00:18:37 2018] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: [email protected]
[Fri Aug 3 00:18:43 2018] device-mapper: multipath service-time: version 0.3.0 loaded
[Fri Aug 3 00:18:43 2018] device-mapper: table: 254:0: multipath: error getting device
[Fri Aug 3 00:18:43 2018] device-mapper: ioctl: error adding target to table
[Fri Aug 3 00:18:43 2018] device-mapper: table: 254:0: multipath: error getting device
[Fri Aug 3 00:18:43 2018] device-mapper: ioctl: error adding target to table
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa03a6c4de948)
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: [sdbh] tag#1 CDB: opcode=0x88 88 00 00 00 00 02 ba a0 f0 00 00 00 02 00 00 00
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: handle(0x001c), sas_address(0x5000c50093d5135d), phy(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure_logical_id(0x500304800929f87f), slot(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure level(0x0001),connector name(1 )
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: task abort: SUCCESS scmd(ffffa03a6c4de948)
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa07b2eb87d48)
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: [sdbh] tag#0 CDB: opcode=0x88 88 00 00 00 00 02 ba a0 f0 00 00 00 02 00 00 00
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: handle(0x001c), sas_address(0x5000c50093d5135d), phy(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure_logical_id(0x500304800929f87f), slot(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure level(0x0001),connector name(1 )
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: task abort: SUCCESS scmd(ffffa07b2eb87d48)
[Fri Aug 3 00:21:21 2018] device-mapper: multipath: Failing path 67:176.
[Fri Aug 3 00:21:21 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa03a89b38148)
[Fri Aug 3 00:21:21 2018] sd 12:0:16:0: [sdbh] tag#11 CDB: opcode=0x0 00 00 00 00 00 00
[Fri Aug 3 00:21:21 2018] scsi target12:0:16: handle(0x001c), sas_address(0x5000c50093d5135d), phy(8)
[Fri Aug 3 00:21:21 2018] scsi target12:0:16: enclosure_logical_id(0x500304800929f87f), slot(8)
[Fri Aug 3 00:21:21 2018] scsi target12:0:16: enclosure level(0x0001),connector name(1 )
[Fri Aug 3 00:21:21 2018] sd 12:0:16:0: task abort: SUCCESS scmd(ffffa03a89b38148)
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721044480
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 0
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 512
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721043968
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721044480
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 0
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 512
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721043968
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721044480
[Fri Aug 3 00:21:57 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa03a89b3f148)
Depois de um tempo, o ciclo continua quando o MPT3SAS Driver abre e reinicia a placa LSI.
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: iomem(0x00000000fbe40000), mapped(0xffffbe0e8dca0000), size(65536)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: ioport(0x000000000000e000), size(256)
[Fri Aug 3 00:18:12 2018] usb 2-1-port6: over-current condition
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: sending message unit reset !!
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: message unit reset: SUCCESS
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Allocated physical memory: size(20778 kB)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Current Controller Queue Depth(9564),Max Controller Queue Depth(9664)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Scatter Gather Elements per IO(128)
[Fri Aug 3 00:18:12 2018] usb 3-14.1: new low-speed USB device number 3 using xhci_hcd
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: LSISAS3008: FWVersion(15.00.02.00), ChipRevision(0x02), BiosVersion(08.35.00.00)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Protocol=(
[Fri Aug 3 00:18:12 2018] Initiator
[Fri Aug 3 00:18:12 2018] ,Target
[Fri Aug 3 00:18:12 2018] ),
[Fri Aug 3 00:18:12 2018] Capabilities=(
[Fri Aug 3 00:18:12 2018] TLR
[Fri Aug 3 00:18:12 2018] ,EEDP
[Fri Aug 3 00:18:12 2018] ,Snapshot Buffer
[Fri Aug 3 00:18:12 2018] ,Diag Trace Buffer
[Fri Aug 3 00:18:12 2018] ,Task Set Full
[Fri Aug 3 00:18:12 2018] ,NCQ
[Fri Aug 3 00:18:12 2018] )
[Fri Aug 3 00:18:12 2018] scsi host13: Fusion MPT SAS Host
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: sending port enable !!
[Fri Aug 3 00:18:12 2018] mpt3sas_cm4: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (528262416 kB)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: host_add: handle(0x0001), sas_addr(0x500605b00c482a80), phys(8)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: expander_add: handle(0x0009), parent(0x0001), sas_addr(0x5003048017aed57f), phys(38)
[Fri Aug 3 00:18:12 2018] scsi 13:0:0:0: Direct-Access SEAGATE ST800FM0173 0007 PQ: 0 ANSI: 6
Quando Mpt3sas envia "diag reset" significa que estou perdendo um jbod "90 disk" ao mesmo tempo!
E por causa disso, um simples disco defeituoso pode suspender meu Pool do ZFS.
Agora estou procurando uma solução e acho que se eu disser multipath; "não remapear se um disco falhar 3 vezes", então meu problema será resolvido porque o disco não estará sendo usado pelo pool e se meu pool não usar o disco defeituoso, o disco não poderá causar erro de E / S.
Portanto, com uma explicação simples, estou procurando uma maneira de desativar o uso do disco com falha.
Encontrei algumas configurações para /etc/multipath.conf
Mas não tenho certeza se isso resolverá meu problema ou não.
Você pode me dizer a melhor solução para o meu problema?
defaults {
user_friendly_names no
path_grouping_policy failover
polling_interval 10
path_selector "round-robin 0"
path_grouping_policy failover
path_checker readsector0
failback manual
no_path_retry 3
prio rdac
}
blacklist_exceptions {
property "(ID_WWN|SCSI_IDENT_.*|ID_SERIAL)"
}
Este é o log DMESG completo - > link