Problems with the tg3 Ethernet adapter after upgrading to 16.04

After a clean install of 16.04 Server on an HP ProLiant MicroServer Gen8, I'm getting random network disconnects.

The server runs fine for anywhere from a few hours to more than a week. Then, at some point, it drops off the network. When that happens, syslog shows only the message below.

Jul 12 22:46:11 gil kernel: [210256.898076] tg3 0000:03:00.0 eno1: Link is down

Unplugging and replugging the network cable doesn't help. I have also tried a different port on the switch.

The server ran stably on 14.04 before, so I suspect this is a bug in the tg3 driver in the 4.4 kernel.

ethtool:

ole@gil:~$ sudo ethtool eno1
Settings for eno1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                         100baseT/Half 100baseT/Full
                                         1000baseT/Half 1000baseT/Full
    Link partner advertised pause frame use: Symmetric Receive-only
    Link partner advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: off
    Supports Wake-on: g
    Wake-on: g
    Current message level: 0x000000ff (255)
                           drv probe link timer ifdown ifup rx_err tx_err
    Link detected: yes

ip link show

ole@gil:~$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether b0:5a:da:87:43:80 brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether b0:5a:da:87:43:81 brd ff:ff:ff:ff:ff:ff

dmesg

ole@gil:~$ dmesg | grep tg3    
[    5.341202] tg3.c:v3.137 (May 11, 2014)
[    5.441154] tg3 0000:03:00.0 eth0: Tigon3 [partno(N/A) rev 5720000] (PCI Express) MAC address b0:5a:da:87:43:80
[    5.483079] tg3 0000:03:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    5.591514] tg3 0000:03:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    5.634705] tg3 0000:03:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
[    5.685464] tg3 0000:03:00.1 eth1: Tigon3 [partno(N/A) rev 5720000] (PCI Express) MAC address b0:5a:da:87:43:81
[    5.769032] tg3 0000:03:00.1 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    5.809242] tg3 0000:03:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    5.851124] tg3 0000:03:00.1 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
[    5.873733] tg3 0000:03:00.0 eno1: renamed from eth0
[    6.577027] tg3 0000:03:00.1 eno2: renamed from eth1
[   18.700979] tg3 0000:03:00.0 eno1: Link is up at 1000 Mbps, full duplex
[   18.700982] tg3 0000:03:00.0 eno1: Flow control is on for TX and on for RX
[   18.700983] tg3 0000:03:00.0 eno1: EEE is disabled

Any tips on how to fix this? I'd rather not have to downgrade to 14.04.

UPDATE: I noticed the following new entries in kern.log after the most recent failure:

Jul 28 01:46:23 gil kernel: [709412.700133] NMI: PCI system error (SERR) for reason b1 on CPU 0.
Jul 28 01:46:23 gil kernel: [709412.700998] Dazed and confused, but trying to continue
Jul 28 01:46:35 gil kernel: [709424.063839] tg3 0000:03:00.0 eno1: Link is down
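To check whether every disconnect lines up with one of these SERR NMIs (or only some do), kern.log can be scanned for both messages. Below is a minimal Python sketch, assuming the log lines keep the format shown above. The function name link_down_events and the default 30-second window are my own illustrative choices, not part of tg3 or the kernel. Because the bracketed stamps are seconds since boot, an NMI from an earlier boot is ignored rather than paired.

```python
import re
from typing import List, Optional, Tuple

# Kernel messages carry an uptime stamp such as "[709412.700133]".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def link_down_events(lines: List[str], window: float = 30.0) -> List[Tuple[float, Optional[float]]]:
    """For every tg3 "Link is down" message, return (link_down_time, nmi_time).

    nmi_time is the uptime of the most recent "PCI system error (SERR)" NMI
    if it occurred at most `window` seconds earlier, otherwise None.
    The window is a hypothetical tuning value, not something the kernel defines.
    """
    last_nmi: Optional[float] = None
    events: List[Tuple[float, Optional[float]]] = []
    for line in lines:
        m = STAMP.search(line)
        if m is None:
            continue  # not a kernel message with an uptime stamp
        t, msg = float(m.group(1)), m.group(2).rstrip()
        if "PCI system error (SERR)" in msg:
            last_nmi = t
        elif msg.startswith("tg3 ") and msg.endswith("Link is down"):
            # A negative gap means the uptime reset, i.e. the NMI was from a previous boot.
            near = last_nmi is not None and 0 <= t - last_nmi <= window
            events.append((t, last_nmi if near else None))
    return events
```

Usage: pass it the lines of /var/log/kern.log (for example `open("/var/log/kern.log").readlines()`). With the entries above, the July 12 link-down would come back with no nearby NMI, while the July 28 one would be paired with the SERR NMI about 11 seconds earlier.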

Any idea what could be causing this? I never saw anything like it on 14.04.

by wolle 13.07.2016 / 09:17
