Adaptec 5805 on CentOS 6: IRQ 16 error

I am having a problem with an Adaptec 5805 RAID card [link] (with two SAS disks in RAID) on a Gigabyte GA-H67A-D3H-B3 motherboard [link], running CentOS 6 as a web server.

Short story: when I boot the server, the RAID card runs at full speed, with throughput above 250 Mb/s. Within 60 minutes at most I get an IRQ error, IRQ 16 is disabled, and from then on the card does no more than 2.5 Mb/s of throughput (but it still works). I need to fix this so that the card runs at full speed all the time.

Long story:

1] The motherboard has no PCIe x8 slot to fit the RAID card. I tried the x16 slot, but in that slot the card is not detected at all and the system boots without it. So I used the x4 slot, where the card (surprisingly to me) works very well. Except for the IRQ...

2] There are two SATA disks connected to the motherboard, each as primary on its own channel:

SAMSUNG HD502HJ SAMSUNG HD103UJ

Then there is an additional network card in the first of the regular PCI slots (in the picture at the link above, it is the rightmost white PCI slot, next to the "DUAL BOOT" label on the motherboard).

The RAID card is in the PCIe x4 slot (next to those three white PCI slots).

Nothing else is in use. I don't use USB devices or anything else, just two SATA disks, two network connectors (motherboard and card), and the RAID card with two SAS disks attached.

3] The system is, as I said, CentOS 6:

uname -a

Linux 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux

The CPU is

Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

lspci -v

00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09)
    Flags: bus master, fast devsel, latency 0
    Capabilities: [e0] Vendor Specific Information <?>

00:02.0 VGA compatible controller: Intel Corporation Sandy Bridge Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
    Subsystem: Giga-byte Technology Device d000
    Flags: bus master, fast devsel, latency 0, IRQ 10
    Memory at fb400000 (64-bit, non-prefetchable) [size=4M]
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    I/O ports at ff00 [size=64]
    Expansion ROM at <unassigned> [disabled]
    Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
    Capabilities: [d0] Power Management version 2
    Capabilities: [a4] PCI Advanced Features

00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
    Subsystem: Giga-byte Technology Device 1c3a
    Flags: bus master, fast devsel, latency 0, IRQ 10
    Memory at fbfff000 (64-bit, non-prefetchable) [size=16]
    Capabilities: [50] Power Management version 3
    Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+

00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 05) (prog-if 20 [EHCI])
    Subsystem: Giga-byte Technology Device 5006
    Flags: bus master, medium devsel, latency 0, IRQ 18
    Memory at fbffe000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
    Capabilities: [58] Debug port: BAR=1 offset=00a0
    Capabilities: [98] PCI Advanced Features
    Kernel driver in use: ehci_hcd

00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    Memory behind bridge: fb800000-fbbfffff
    Prefetchable memory behind bridge: 00000000dc000000-00000000dc0fffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1c.5 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 6 (rev b5) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
    I/O behind bridge: 0000d000-0000dfff
    Prefetchable memory behind bridge: 00000000fbd00000-00000000fbdfffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1c.6 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5) (prog-if 01 [Subtractive decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=03, subordinate=04, sec-latency=0
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: fbc00000-fbcfffff
    Prefetchable memory behind bridge: 00000000dc100000-00000000dc1fffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2

00:1c.7 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
    Memory behind bridge: fbe00000-fbefffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 05) (prog-if 20 [EHCI])
    Subsystem: Giga-byte Technology Device 5006
    Flags: bus master, medium devsel, latency 0, IRQ 23
    Memory at fbffd000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
    Capabilities: [58] Debug port: BAR=1 offset=00a0
    Capabilities: [98] PCI Advanced Features
    Kernel driver in use: ehci_hcd

00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 05)
    Subsystem: Giga-byte Technology Device 5001
    Flags: bus master, medium devsel, latency 0
    Capabilities: [e0] Vendor Specific Information <?>
    Kernel modules: iTCO_wdt

00:1f.2 IDE interface: Intel Corporation Cougar Point 4 port SATA IDE Controller (rev 05) (prog-if 8f [Master SecP SecO PriP PriO])
    Subsystem: Giga-byte Technology Device b002
    Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
    I/O ports at fe00 [size=8]
    I/O ports at fd00 [size=4]
    I/O ports at fc00 [size=8]
    I/O ports at fb00 [size=4]
    I/O ports at fa00 [size=16]
    I/O ports at f900 [size=16]
    Capabilities: [70] Power Management version 3
    Capabilities: [b0] PCI Advanced Features
    Kernel driver in use: ata_piix
    Kernel modules: ata_generic, pata_acpi, ata_piix

00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 05)
    Subsystem: Giga-byte Technology Device 5001
    Flags: medium devsel, IRQ 18
    Memory at fbffc000 (64-bit, non-prefetchable) [size=256]
    I/O ports at 0500 [size=32]
    Kernel driver in use: i801_smbus
    Kernel modules: i2c-i801

00:1f.5 IDE interface: Intel Corporation Cougar Point 2 port SATA IDE Controller (rev 05) (prog-if 85 [Master SecO PriO])
    Subsystem: Giga-byte Technology Device b002
    Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
    I/O ports at f700 [size=8]
    I/O ports at f600 [size=4]
    I/O ports at f500 [size=8]
    I/O ports at f400 [size=4]
    I/O ports at f300 [size=16]
    I/O ports at f200 [size=16]
    Capabilities: [70] Power Management version 3
    Capabilities: [b0] PCI Advanced Features
    Kernel driver in use: ata_piix
    Kernel modules: ata_generic, pata_acpi, ata_piix

01:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
    Subsystem: Adaptec ASR5805
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at fb800000 (64-bit, non-prefetchable) [size=2M]
    [virtual] Expansion ROM at dc000000 [disabled] [size=512K]
    Capabilities: [98] Power Management version 2
    Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+
    Capabilities: [d0] Express Endpoint, MSI 00
    Capabilities: [90] Vital Product Data
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: aacraid
    Kernel modules: aacraid

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
    Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
    Flags: bus master, fast devsel, latency 0, IRQ 32
    I/O ports at de00 [size=256]
    Memory at fbdff000 (64-bit, prefetchable) [size=4K]
    Memory at fbdf8000 (64-bit, prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [70] Express Endpoint, MSI 01
    Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
    Capabilities: [d0] Vital Product Data
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Virtual Channel <?>
    Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
    Kernel driver in use: r8169
    Kernel modules: r8169

03:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 30) (prog-if 01 [Subtractive decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=03, secondary=04, subordinate=04, sec-latency=32
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: fbc00000-fbcfffff
    Prefetchable memory behind bridge: 00000000dc100000-00000000dc1fffff
    Capabilities: [90] Power Management version 2
    Capabilities: [a0] Subsystem: Giga-byte Technology Device 5000

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
    Subsystem: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet
    Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
    I/O ports at ee00 [size=256]
    Memory at fbcff000 (32-bit, non-prefetchable) [size=256]
    [virtual] Expansion ROM at dc100000 [disabled] [size=64K]
    Capabilities: [dc] Power Management version 2
    Kernel driver in use: r8169
    Kernel modules: r8169

05:00.0 USB Controller: Device 1b6f:7023 (rev 01) (prog-if 30)
    Subsystem: Device 1b6f:7023
    Flags: bus master, fast devsel, latency 0, IRQ 11
    Memory at fbef8000 (64-bit, non-prefetchable) [size=32K]
    Capabilities: [50] Power Management version 3
    Capabilities: [70] MSI: Enable- Count=1/4 Maskable+ 64bit+
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [190] Device Serial Number 01-01-01-01-01-01-01-01

lspci -vv

01:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
    Subsystem: Adaptec ASR5805
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 4 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at fb800000 (64-bit, non-prefetchable) [size=2M]
    [virtual] Expansion ROM at dc000000 [disabled] [size=512K]
    Capabilities: [98] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [d0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 <1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 <128ns, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [90] Vital Product Data
        Unknown small resource type 00, will not decode more.
    Capabilities: [100] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Kernel driver in use: aacraid
    Kernel modules: aacraid
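
The two details in this dump that seem most relevant are the link width (the card supports x8 but negotiated x4) and the MSI state (Enable-, so the card is on a legacy INTx line). A minimal sketch for pulling just those fields out of `lspci -vv` text; the function name `pcie_summary` is my own, and it assumes the field layout shown above:

```shell
# Summarise supported vs. negotiated PCIe link width and the MSI state.
# Reads `lspci -vv` text for a single device on stdin.
# pcie_summary is a hypothetical helper name, not part of pciutils.
pcie_summary() {
    awk '
        # "Width x8" -> keep only the "x8" part (skip the 6 chars of "Width ")
        /LnkCap:/ { if (match($0, /Width x[0-9]+/)) cap = substr($0, RSTART + 6, RLENGTH - 6) }
        /LnkSta:/ { if (match($0, /Width x[0-9]+/)) sta = substr($0, RSTART + 6, RLENGTH - 6) }
        # "MSI: Enable+" means MSI is active, "MSI: Enable-" means it is not
        /MSI: Enable/ { msi = ($0 ~ /MSI: Enable[+]/) ? "on" : "off" }
        END { printf "cap=%s sta=%s msi=%s\n", cap, sta, msi }'
}

# Usage on a live system (01:00.0 is the RAID card's address from lspci above):
#   lspci -vv -s 01:00.0 | pcie_summary
```

For the dump above this would report the x8 capability, the x4 negotiated width, and MSI off. It only restates what lspci already prints and does not by itself explain the IRQ error.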

cat /proc/interrupts

       CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
0:        128          0          0          0          0          0          0          0   IO-APIC-edge      timer
1:        105          0        606       4366          0          0          0          0   IO-APIC-edge      i8042
8:          1          0          0          0          0          0          0          0   IO-APIC-edge      rtc0
9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
16:       1381          0     197881        730          0          0          0          9   IO-APIC-fasteoi   aacraid
18:       1695          0          0          0      13372   60347990          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, eth1
19:       4637          0      14949    6352494          0          0          0     106473   IO-APIC-fasteoi   ata_piix, ata_piix
23:         33          0         27         12          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2
24:        291          0          0          0          0          0          0          0  HPET_MSI-edge      hpet2
25:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet3
26:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet4
27:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet5
28:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet6
32:       1275          0          0          0          0       1905   21317086          0   PCI-MSI-edge      eth0
NMI:       1873      10150       1974       1672        702       3046       1825        780   Non-maskable interrupts
LOC:   17501877   13611350   13868117    3612581    1520650    1850972    8633075    1486682   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0          0          0          0          0   Performance pending work
RES:       5238      34250      12858       4299       1555       4833       5663       2485   Rescheduling interrupts
CAL:        334        302        429        414        421        464        465        468   Function call interrupts
TLB:       7863     154723      12147      11152      14099      33766      42580      11065   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:        293        293        293        293        293        293        293        293   Machine check polls
ERR:          7
MIS:          0
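
To check whether anything besides aacraid is registered on a given IRQ line, the table above can be reduced to one line per IRQ. This is a minimal sketch; `irq_summary` is a name I made up, and it assumes the column layout of the 2.6.32 kernel shown here (one count per CPU, then the controller type, then the handler names), which differs on newer kernels:

```shell
# Print the total interrupt count and the registered handlers for one IRQ.
# Reads /proc/interrupts-formatted text on stdin; $1 is the IRQ number.
# irq_summary is a hypothetical helper name.
irq_summary() {
    awk -v irq="$1" '
        NR == 1 { ncpu = NF; next }          # header row: one column per CPU
        $1 == (irq ":") {
            total = 0
            for (i = 2; i <= ncpu + 1; i++) total += $i
            handlers = ""
            # after the counts comes the controller type, then the handler names
            for (i = ncpu + 3; i <= NF; i++)
                handlers = handlers (handlers ? " " : "") $i
            printf "irq=%s total=%d handlers=%s\n", irq, total, handlers
        }'
}

# Usage on a live system:
#   irq_summary 16 < /proc/interrupts
#   irq_summary 18 < /proc/interrupts
```

More than one name in the handlers field means the line is shared, as with IRQ 18 (ehci_hcd:usb1 and eth1) in the table above; IRQ 16 lists only aacraid.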

The module in use is the kmod-aacraid kernel module from ELRepo for CentOS 6:

Installed Packages
Name       : kmod-aacraid
Arch       : x86_64
Version    : 1.1.7
Release    : 1.el6.elrepo
Size       : 340 k
Repo       : installed
From repo  : elrepo
Summary    : aacraid kernel module(s)
URL        : http://www.adaptec.com/
License    : GPLv2
Description: This package provides the aacraid kernel module(s) built
       : for the Linux kernel using the x86_64 family of processors.

and the error from the log:

Dec 15 14:02:33 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 15 14:02:33 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-71.29.1.el6.x86_64 #1
Dec 15 14:02:33 kernel: Call Trace:
Dec 15 14:02:33 kernel: <IRQ>  [<ffffffff810da96b>] __report_bad_irq+0x2b/0xa0
Dec 15 14:02:33 kernel: [<ffffffff810dab6c>] note_interrupt+0x18c/0x1d0
Dec 15 14:02:33 kernel: [<ffffffff810db255>] handle_fasteoi_irq+0xc5/0xf0
Dec 15 14:02:33 kernel: [<ffffffff81015fb9>] handle_irq+0x49/0xa0
Dec 15 14:02:33 kernel: [<ffffffff814d093c>] do_IRQ+0x6c/0xf0
Dec 15 14:02:33 kernel: [<ffffffff81013ad3>] ret_from_intr+0x0/0x11
Dec 15 14:02:33 kernel: <EOI>  [<ffffffff812da962>] ? acpi_idle_enter_c1+0xa3/0xc1
Dec 15 14:02:33 kernel: [<ffffffff812da941>] ? acpi_idle_enter_c1+0x82/0xc1
Dec 15 14:02:33 kernel: [<ffffffff813df687>] cpuidle_idle_call+0xa7/0x140
Dec 15 14:02:33 kernel: [<ffffffff81011e96>] cpu_idle+0xb6/0x110
Dec 15 14:02:33 kernel: [<ffffffff814c27d8>] start_secondary+0x1fc/0x23f
Dec 15 14:02:33 kernel: handlers:
Dec 15 14:02:33 kernel: [<ffffffffa002a590>] (aac_rx_intr_message+0x0/0xc0 [aacraid])
Dec 15 14:02:33 kernel: Disabling IRQ #16
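
To see which IRQ lines the kernel has given up on since boot, the "Disabling IRQ #N" messages can be filtered out of the log. A minimal sketch, assuming the message format shown above; `disabled_irqs` is a hypothetical helper name, and the log path in the usage comment is the usual CentOS location rather than something confirmed for this server:

```shell
# List the IRQ numbers the kernel has disabled, one per line, de-duplicated.
# Reads kernel log text on stdin.
# disabled_irqs is a hypothetical helper name.
disabled_irqs() {
    sed -n 's/.*Disabling IRQ #\([0-9][0-9]*\).*/\1/p' | sort -un
}

# Usage on a live system:
#   dmesg | disabled_irqs
#   disabled_irqs < /var/log/messages
```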

I don't see any conflict on IRQ 16, and the suggested irqpoll option changes nothing. I don't need USB, so I can disable it, but this is a production system, so I want to know where the problem is before I start fiddling with the BIOS or anything else (and I also need to keep downtime as low as possible).

Can anyone help me diagnose the problem here?

    
by Radek, 16.12.2011 14:17