No network connection after reboot - be2net: FW not responding

I'm running an AMD Threadripper 1920X on a Gigabyte Designare X399 motherboard. The machine also has an Emulex OneConnect 10Gb NIC. The NIC works fine until I reboot the computer: the boot process takes longer than usual, and the interface comes up and gets its static IP address from NetworkManager, but it cannot send or receive any network traffic. When I grep the syslog for be2net, this is the output; the error comes at the end:

May 15 09:55:15 workstation kernel: [    1.057712] be2net 0000:07:00.0: be2net version is 11.4.0.0
May 15 09:55:15 workstation kernel: [    1.058031] be2net 0000:07:00.0: PCIe error reporting enabled
May 15 09:55:15 workstation kernel: [    1.404059] be2net 0000:07:00.0: FW config: function_mode=0x3, function_caps=0x7
May 15 09:55:15 workstation kernel: [    1.544087] be2net 0000:07:00.0: Max: txqs 1, rxqs 5, rss 4, eqs 16, vfs 0
May 15 09:55:15 workstation kernel: [    1.544089] be2net 0000:07:00.0: Max: uc-macs 30, mc-macs 64, vlans 64
May 15 09:55:15 workstation kernel: [    1.544133] be2net 0000:07:00.0: enabled 1 MSI-x vector(s) for NIC
May 15 09:55:15 workstation kernel: [    1.688095] be2net 0000:07:00.0: created 1 TX queue(s)
May 15 09:55:15 workstation kernel: [    1.712087] be2net 0000:07:00.0: created 1 RX queue(s)
May 15 09:55:15 workstation kernel: [    1.799038] be2net 0000:07:00.0: FW version is 4.9.416.2
May 15 09:55:15 workstation kernel: [    1.805191] be2net 0000:07:00.0: HW Flow control - TX:1 RX:1
May 15 09:55:15 workstation kernel: [    1.813392] be2net 0000:07:00.0: Adapter does not support HW error recovery
May 15 09:55:15 workstation kernel: [    1.813522] be2net 0000:07:00.0: Emulex OneConnect: PF  port 1
May 15 09:55:15 workstation kernel: [    1.813595] be2net 0000:07:00.1: be2net version is 11.4.0.0
May 15 09:55:15 workstation kernel: [    1.813855] be2net 0000:07:00.1: PCIe error reporting enabled
May 15 09:55:15 workstation kernel: [    2.064095] be2net 0000:07:00.1: FW config: function_mode=0x3, function_caps=0x7
May 15 09:55:15 workstation kernel: [    2.188096] be2net 0000:07:00.1: Max: txqs 1, rxqs 5, rss 4, eqs 16, vfs 0
May 15 09:55:15 workstation kernel: [    2.188098] be2net 0000:07:00.1: Max: uc-macs 30, mc-macs 64, vlans 64
May 15 09:55:15 workstation kernel: [    2.188148] be2net 0000:07:00.1: enabled 1 MSI-x vector(s) for NIC
May 15 09:55:15 workstation kernel: [    2.296098] be2net 0000:07:00.1: created 1 TX queue(s)
May 15 09:55:15 workstation kernel: [    2.320099] be2net 0000:07:00.1: created 1 RX queue(s)
May 15 09:55:15 workstation kernel: [    2.430614] be2net 0000:07:00.1: FW version is 4.9.416.2
May 15 09:55:15 workstation kernel: [    2.437790] be2net 0000:07:00.1: HW Flow control - TX:1 RX:1
May 15 09:55:15 workstation kernel: [    2.445992] be2net 0000:07:00.1: Adapter does not support HW error recovery
May 15 09:55:15 workstation kernel: [    2.446111] be2net 0000:07:00.1: Emulex OneConnect: PF  port 2
May 15 09:55:15 workstation kernel: [    2.447419] be2net 0000:07:00.0 enp7s0f0: renamed from eth0
May 15 09:55:15 workstation kernel: [    2.464282] be2net 0000:07:00.1 enp7s0f1: renamed from eth1
May 15 09:55:15 workstation sensors[1628]: be2net-pci-0700
May 15 09:55:15 workstation sensors[1628]: be2net-pci-0701
May 15 09:55:16 workstation kernel: [   10.483582] be2net 0000:07:00.0 enp7s0f0: Link is Up
May 15 09:55:16 workstation kernel: [   10.502938] be2net 0000:07:00.0 enp7s0f0: Link is Up
May 15 09:55:16 workstation kernel: [   10.573081] be2net 0000:07:00.1 enp7s0f1: Link is Down
May 15 09:55:16 workstation kernel: [   10.594631] be2net 0000:07:00.1 enp7s0f1: Link is Down
May 15 09:57:04 workstation kernel: [  118.627524] be2net 0000:07:00.1: FW not responding
May 15 09:57:04 workstation kernel: [  118.627530] be2net 0000:07:00.1: enp7s0f1: Link down
May 15 09:57:04 workstation kernel: [  118.678167] be2net 0000:07:00.1: did not receive flush compl
May 15 09:58:01 workstation kernel: [  176.252105] be2net 0000:07:00.0: FW not responding
May 15 09:58:01 workstation kernel: [  176.252113] be2net 0000:07:00.0: enp7s0f0: Link down
May 15 09:58:01 workstation kernel: [  176.262246] be2net 0000:07:00.0: txq0: cleaning 318 pending tx-wrbs
May 15 09:58:01 workstation kernel: [  176.313025] be2net 0000:07:00.0: did not receive flush compl
May 15 09:58:11 workstation kernel: [  186.333781] be2net 0000:07:00.0: be2net version is 11.4.0.0
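
To pull the relevant lines out of a longer syslog and see when each PCI function first reports a firmware failure, a small filter like the following may help. It is a minimal sketch that assumes the classic syslog line format shown above (`kernel: [uptime] be2net <PCI address>...`). The error strings it looks for are simply the ones that appear in this log, not an exhaustive list of be2net errors, and the names `parse_line` and `first_fw_failures` are hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, NamedTuple, Optional

# Matches kernel lines such as:
#   ... kernel: [  118.627524] be2net 0000:07:00.1: FW not responding
#   ... kernel: [    2.447419] be2net 0000:07:00.0 enp7s0f0: renamed from eth0
BE2NET_LINE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] be2net "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"(?: (?P<iface>\S+))?: (?P<msg>.*)$"
)

# Error messages taken from this particular log (not exhaustive).
ERROR_MARKERS = ("FW not responding", "did not receive flush compl")


class Be2netEvent(NamedTuple):
    uptime: float          # seconds since boot, from the kernel timestamp
    pci: str               # PCI function, e.g. 0000:07:00.1
    iface: Optional[str]   # interface name if the kernel printed one
    msg: str               # rest of the message


def parse_line(line: str) -> Optional[Be2netEvent]:
    """Return the parsed be2net kernel message, or None for any other line."""
    m = BE2NET_LINE.search(line)
    if m is None:
        return None
    return Be2netEvent(float(m["uptime"]), m["pci"], m["iface"], m["msg"])


def first_fw_failures(lines: Iterable[str]) -> Dict[str, float]:
    """Map each PCI function to the uptime of its first matching error."""
    failures: Dict[str, float] = {}
    for line in lines:
        ev = parse_line(line)
        if ev and any(marker in ev.msg for marker in ERROR_MARKERS):
            failures.setdefault(ev.pci, ev.uptime)
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for pci, t in first_fw_failures(log).items():
            print(f"{pci}: first firmware error {t:.1f} s after boot")
```

Run it as, for example, `python3 be2net_scan.py /var/log/syslog` (the script name is hypothetical). On the log above it would report 0000:07:00.1 failing at about 118.6 s and 0000:07:00.0 at about 176.3 s after boot.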

I found this page from 2015 where someone claims that be2net breaks when the memory returned by dma_alloc_coherent is not zeroed. This seems to match my observations: after a cold start from power-off the memory contains no data, whereas a reboot does not clear my RAM. So the driver receives memory from the kernel that still holds old data, exposes it to the card, and the card's firmware fails.
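
The reasoning above can be illustrated with a deliberately simplified toy model. It assumes, as the post does, that RAM is blank after a cold start and keeps its contents across a warm reboot (real DRAM is not guaranteed to read as zero after power-on). `ToyRAM` and `toy_firmware_accepts` are hypothetical names; the latter is a stand-in for firmware that misbehaves on a non-zeroed buffer and says nothing about how the OneConnect firmware actually behaves.

```python
class ToyRAM:
    """Toy model: contents survive a warm reboot; a cold boot is assumed
    (as in the post) to leave memory blank."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)  # assumed blank after power-on

    def cold_boot(self) -> None:
        self.cells = bytearray(len(self.cells))  # power loss wipes contents

    def warm_reboot(self) -> None:
        pass  # nothing is cleared: old data stays in place

    def alloc(self, offset: int, length: int, zero: bool) -> bytes:
        """Hand out a region; clear it only when zero=True (like an explicit memset)."""
        if zero:
            self.cells[offset:offset + length] = bytes(length)
        return bytes(self.cells[offset:offset + length])


def toy_firmware_accepts(buf: bytes) -> bool:
    """Hypothetical firmware that only works if a fresh buffer is all zeros."""
    return not any(buf)


ram = ToyRAM(64)
print("first boot:", toy_firmware_accepts(ram.alloc(0, 16, zero=False)))  # prints "first boot: True"
ram.cells[0:16] = b"\xaa" * 16  # driver and NIC fill the buffer during the session
ram.warm_reboot()
print("warm reboot, not zeroed:", toy_firmware_accepts(ram.alloc(0, 16, zero=False)))  # prints "... False"
print("warm reboot, zeroed:", toy_firmware_accepts(ram.alloc(0, 16, zero=True)))  # prints "... True"
ram.cells[0:16] = b"\xaa" * 16
ram.cold_boot()
print("cold boot, not zeroed:", toy_firmware_accepts(ram.alloc(0, 16, zero=False)))  # prints "... True"
```

In this model the only difference between the failing and working cases is whether the buffer is cleared before being handed out, which is the distinction the 2015 report points at.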

It also seems to be related to the IOMMU somehow, so I tried various kernel parameters, without success:

iommu=fullflush amd_iommu=fullflush disable_ddw
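
Since a kernel parameter can only help if it actually reached the running kernel, it may be worth confirming after a reboot that all three are present on the current command line. A minimal sketch, assuming a Linux system where `/proc/cmdline` is readable; `cmdline_params`, `missing_params` and `TRIED_PARAMS` are hypothetical names, and the check only shows that the parameters were passed, not that the kernel honored them.

```python
from pathlib import Path
from typing import Dict, List, Optional

# The parameters tried in this post; None marks a bare flag without "=value".
TRIED_PARAMS: Dict[str, Optional[str]] = {
    "iommu": "fullflush",
    "amd_iommu": "fullflush",
    "disable_ddw": None,
}


def cmdline_params(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.
    Naive: does not handle quoted values that contain spaces."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")  # split on the first "=" only
        params[name] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: Dict[str, Optional[str]]) -> List[str]:
    """Return the wanted parameters that are absent or set to another value."""
    active = cmdline_params(cmdline)
    return [
        name if value is None else f"{name}={value}"
        for name, value in wanted.items()
        if name not in active or active[name] != value
    ]


if __name__ == "__main__":
    path = Path("/proc/cmdline")  # Linux interface exposing the running kernel's command line
    if path.exists():
        missing = missing_params(path.read_text(), TRIED_PARAMS)
        print("all parameters active" if not missing else "not active: " + " ".join(missing))
```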

I hope someone has an idea what I can do to fix this. I tried zeroing my RAM with "sdmem -ll" at shutdown, but the driver's memory is allocated at such a low level that I can't zero it this way.

by Oliver R. 15.05.2018 / 12:24