After doing a clean install of 16.04 Server on my HP ProLiant MicroServer Gen8, I am having problems with random network disconnects.
The server runs fine for anywhere between a few hours and more than a week. At some point, it drops off the network. The syslog only shows the message below when this happens:
Jul 12 22:46:11 gil kernel: [210256.898076] tg3 0000:03:00.0 eno1: Link is down
Unplugging and replugging the network cable does not help. I have also tried another port on the switch.
The server was running stably on 14.04 before, so I suspect this is a bug in the tg3 driver with the 4.4 kernel.
ethtool:
ole@gil:~$ sudo ethtool eno1
Settings for eno1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Link partner advertised pause frame use: Symmetric Receive-only
Link partner advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: off
Supports Wake-on: g
Wake-on: g
Current message level: 0x000000ff (255)
drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes
ip link show:
ole@gil:~$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether b0:5a:da:87:43:80 brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether b0:5a:da:87:43:81 brd ff:ff:ff:ff:ff:ff
dmesg:
ole@gil:~$ dmesg | grep tg3
[ 5.341202] tg3.c:v3.137 (May 11, 2014)
[ 5.441154] tg3 0000:03:00.0 eth0: Tigon3 [partno(N/A) rev 5720000] (PCI Express) MAC address b0:5a:da:87:43:80
[ 5.483079] tg3 0000:03:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[ 5.591514] tg3 0000:03:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[ 5.634705] tg3 0000:03:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
[ 5.685464] tg3 0000:03:00.1 eth1: Tigon3 [partno(N/A) rev 5720000] (PCI Express) MAC address b0:5a:da:87:43:81
[ 5.769032] tg3 0000:03:00.1 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[ 5.809242] tg3 0000:03:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[ 5.851124] tg3 0000:03:00.1 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
[ 5.873733] tg3 0000:03:00.0 eno1: renamed from eth0
[ 6.577027] tg3 0000:03:00.1 eno2: renamed from eth1
[ 18.700979] tg3 0000:03:00.0 eno1: Link is up at 1000 Mbps, full duplex
[ 18.700982] tg3 0000:03:00.0 eno1: Flow control is on for TX and on for RX
[ 18.700983] tg3 0000:03:00.0 eno1: EEE is disabled
Any tips on how to resolve this? I would prefer not to have to downgrade to 14.04.
UPDATE: I noticed the following new entries in kern.log after the latest failure:
Jul 28 01:46:23 gil kernel: [709412.700133] NMI: PCI system error (SERR) for reason b1 on CPU 0.
Jul 28 01:46:23 gil kernel: [709412.700998] Dazed and confused, but trying to continue
Jul 28 01:46:35 gil kernel: [709424.063839] tg3 0000:03:00.0 eno1: Link is down
Any idea what could be causing this? I never saw anything like it on 14.04.
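To check whether every link drop is preceded by an NMI like the one above, the relevant lines can be pulled out of the kernel log. This is only a minimal sketch, assuming the log lines have the same format as the excerpts in this post; the function name `link_events` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: print NMI reports and tg3 link-state changes
# from a kernel log file, in the order they were logged.
# Assumes lines formatted like the kern.log excerpts in this post.
link_events() {
    grep -E 'NMI: |tg3 .*Link is (up|down)' "$1"
}
```

Running `link_events /var/log/kern.log` would then list the NMI and link up/down lines together, so the timestamps can be compared by eye.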