I am seeing a high rate of packet errors (almost all of them overruns) on both 10Gb NICs attached to my Linux server. The system is handling high volumes of SCTP network traffic (very little TCP), so this is probably a Linux kernel tuning problem.
However, none of the tuning parameters I have tried so far seem to have much effect, and I am still seeing high volumes of packet overruns. Any pointers on other things I could try to get the system handling packets efficiently would be much appreciated!
:~# ifconfig ens4f1
ens4f1 Link encap:Ethernet HWaddr 5c:b9:01:de:0d:4c
UP BROADCAST RUNNING PROMISC MULTICAST MTU:9000 Metric:1
RX packets:22313514162 errors:17598241316 dropped:68
overruns:17598241316 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:31767480894219 (31.7 TB) TX bytes:0 (0.0 B)
Interrupt:17 Memory:c9800000-c9ffffff
System details:
OS: Ubuntu Linux (4.11.0-14-generic #20~16.04.1-Ubuntu SMP x86_64)
CPU cores: 72
NIC model: NetXtreme II BCM57810 10 Gigabit Ethernet
RAM: 240 GiB
Sample NIC statistics showing the packet error rate:
for i in $(seq 1 10); do echo "$i) $(date)" - $(ifconfig ens4f0 | egrep "RX" | egrep overruns; sleep 5); done
1) Thu Oct 12 19:50:40 SGT 2017 - RX packets:8364065830 errors:2594507718 dropped:215 overruns:2594507718 frame:0
2) Thu Oct 12 19:50:45 SGT 2017 - RX packets:8365336060 errors:2596662672 dropped:215 overruns:2596662672 frame:0
3) Thu Oct 12 19:50:50 SGT 2017 - RX packets:8366602087 errors:2598840959 dropped:215 overruns:2598840959 frame:0
4) Thu Oct 12 19:50:55 SGT 2017 - RX packets:8367881271 errors:2600989229 dropped:215 overruns:2600989229 frame:0
5) Thu Oct 12 19:51:01 SGT 2017 - RX packets:8369147536 errors:2603157030 dropped:215 overruns:2603157030 frame:0
6) Thu Oct 12 19:51:06 SGT 2017 - RX packets:8370149567 errors:2604904183 dropped:215 overruns:2604904183 frame:0
7) Thu Oct 12 19:51:11 SGT 2017 - RX packets:8371298018 errors:2607183939 dropped:215 overruns:2607183939 frame:0
8) Thu Oct 12 19:51:16 SGT 2017 - RX packets:8372455587 errors:2609411186 dropped:215 overruns:2609411186 frame:0
9) Thu Oct 12 19:51:21 SGT 2017 - RX packets:8373585102 errors:2611680597 dropped:215 overruns:2611680597 frame:0
10) Thu Oct 12 19:51:26 SGT 2017 - RX packets:8374678508 errors:2614053000 dropped:215 overruns:2614053000 frame:0
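To put the counters above in proportion, the share of arriving frames lost to overruns can be estimated from any two samples. This is only a rough sketch: the helper name `overrun_share` is made up for illustration, and it assumes that "RX packets" counts only frames actually delivered to the stack, which can depend on the driver.

```shell
#!/bin/sh
# overrun_share PREV_PKTS PREV_OVR CUR_PKTS CUR_OVR
# Prints the percentage of arriving frames counted as overruns between two
# samples. Assumption: "RX packets" does not include the overrun frames.
overrun_share() {
    awk -v p0="$1" -v o0="$2" -v p1="$3" -v o1="$4" 'BEGIN {
        dp  = p1 - p0     # frames delivered to the stack in the interval
        dov = o1 - o0     # frames lost to overruns in the interval
        if (dp + dov <= 0) { print "0.0"; exit }
        printf "%.1f\n", 100 * dov / (dp + dov)
    }'
}

# First and last ens4f0 samples from the listing above.
overrun_share 8364065830 2594507718 8374678508 2614053000
```

With those two samples the helper prints 64.8, so under the stated assumption roughly two thirds of the frames reaching ens4f0 in that window were lost.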
However, checking with tc shows no ring buffer overruns on the NIC:
tc -s qdisc show dev ens4f0|egrep drop
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
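A caveat on the check above: `tc -s qdisc` reports queueing-discipline counters, which sit above the driver and mostly describe the transmit side, so zeros there may not say much about the receive ring. The kernel's per-interface counters under sysfs are a more direct source. The sketch below reads a few of them; the function name `rx_counters` and its optional second argument are illustrative only, and the mapping of ifconfig's "overruns" to `rx_fifo_errors` is stated as an expectation rather than something verified on this hardware.

```shell
#!/bin/sh
# rx_counters IFACE [SYSFS_ROOT]
# Prints selected receive counters for IFACE, one "name value" pair per line.
# SYSFS_ROOT defaults to /sys/class/net; the override exists only for testing.
rx_counters() {
    iface=$1
    root=${2:-/sys/class/net}
    for name in rx_packets rx_errors rx_dropped rx_fifo_errors rx_missed_errors; do
        file="$root/$iface/statistics/$name"
        # Skip counters this kernel or driver does not expose.
        if [ -r "$file" ]; then
            printf '%s %s\n' "$name" "$(cat "$file")"
        fi
    done
}

# Example (interface name taken from the question):
# rx_counters ens4f0
```

Running `ethtool -S ens4f0` alongside this would show the driver's own statistics, whose names vary from driver to driver.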
Checking TCP retransmissions, the rate is low:
for i in $(seq 1 10); do echo "$(date)" - $(netstat -s | grep -i retransmited; sleep 2); done
Thu Oct 12 20:04:29 SGT 2017 - 10633 segments retransmited
Thu Oct 12 20:04:31 SGT 2017 - 10634 segments retransmited
Thu Oct 12 20:04:33 SGT 2017 - 10636 segments retransmited
Thu Oct 12 20:04:35 SGT 2017 - 10636 segments retransmited
Thu Oct 12 20:04:37 SGT 2017 - 10638 segments retransmited
Thu Oct 12 20:04:39 SGT 2017 - 10639 segments retransmited
Thu Oct 12 20:04:41 SGT 2017 - 10640 segments retransmited
Thu Oct 12 20:04:43 SGT 2017 - 10640 segments retransmited
Thu Oct 12 20:04:45 SGT 2017 - 10643 segments retransmited
What I have tried so far:
Tuning the NIC parameters (packet coalescing, offloading, increasing the NIC ring buffers, etc.):
ethtool -L ens4f0 combined 30
ethtool -K ens4f0 gso on rx on tx on sg on tso on
ethtool -C ens4f0 rx-usecs 96
ethtool -C ens4f0 adaptive-rx on
ethtool -G ens4f0 rx 4078 tx 4078
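It may be worth confirming that the ring sizes requested with `ethtool -G` were actually applied, since a driver can clamp or reject a value. The sketch below parses the output of `ethtool -g`; the helper name `ring_rx_summary` is hypothetical, and the parsing assumes the usual "Pre-set maximums" / "Current hardware settings" layout.

```shell
#!/bin/sh
# ring_rx_summary: reads `ethtool -g IFACE` output on stdin and prints the
# RX ring size from the "Pre-set maximums" and "Current hardware settings"
# sections as "max N" and "cur N".
ring_rx_summary() {
    awk '
        /^Pre-set maximums/          { section = "max" }
        /^Current hardware settings/ { section = "cur" }
        $1 == "RX:"                  { print section, $2 }
    '
}

# Example (needs ethtool and the interface from the question):
# ethtool -g ens4f0 | ring_rx_summary
```

If the "cur" value is already equal to "max", the ring cannot be enlarged further and the remaining options lie in how quickly the host drains it.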
sysctl kernel tuning (mainly increasing the kernel TCP buffers):
sysctl -w net.ipv4.tcp_low_latency=1
sysctl -w net.ipv4.tcp_max_syn_backlog=16384
sysctl -w net.core.optmem_max=20480000
sysctl -w net.core.netdev_max_backlog=5000000
sysctl -w net.ipv4.tcp_rmem="65536 1747600 83886080"
sysctl -w net.core.somaxconn=1280
sysctl -w kernel.sched_min_granularity_ns=10000000
sysctl -w kernel.sched_wakeup_granularity_ns=15000000
sysctl -w net.ipv4.tcp_wmem="65536 1747600 83886080"
sysctl -w net.core.wmem_max=2147483647
sysctl -w net.core.wmem_default=2147483647
sysctl -w net.core.rmem_max=2147483647
sysctl -w net.core.rmem_default=2147483647
sysctl -w net.ipv4.tcp_congestion_control=cubic
sysctl -w net.ipv4.tcp_rmem="163840 3495200 268754560"
sysctl -w net.ipv4.tcp_wmem="163840 3495200 268754560"
sysctl -w net.ipv4.udp_rmem_min="163840 3495200 268754560"
sysctl -w net.ipv4.udp_wmem_min="163840 3495200 268754560"
sysctl -w net.ipv4.tcp_mem="268754560 268754560 268754560"
sysctl -w net.ipv4.udp_mem="268754560 268754560 268754560"
sysctl -w net.ipv4.tcp_mtu_probing=1
sysctl -w net.ipv4.tcp_slow_start_after_idle=0
Results after this (apparently not much change):
:~# for i in $(seq 1 10); do echo "$i) $(date)" - $(ifconfig ens4f1 | egrep "RX" | egrep overruns; sleep 5); done
1) Thu Oct 12 20:42:56 SGT 2017 - RX packets:16260617113 errors:10964865836 dropped:68 overruns:10964865836 frame:0
2) Thu Oct 12 20:43:01 SGT 2017 - RX packets:16263268608 errors:10969589847 dropped:68 overruns:10969589847 frame:0
3) Thu Oct 12 20:43:06 SGT 2017 - RX packets:16265869693 errors:10974489639 dropped:68 overruns:10974489639 frame:0
4) Thu Oct 12 20:43:11 SGT 2017 - RX packets:16268487078 errors:10979323070 dropped:68 overruns:10979323070 frame:0
5) Thu Oct 12 20:43:16 SGT 2017 - RX packets:16271098501 errors:10984193349 dropped:68 overruns:10984193349 frame:0
6) Thu Oct 12 20:43:21 SGT 2017 - RX packets:16273804004 errors:10988857622 dropped:68 overruns:10988857622 frame:0
7) Thu Oct 12 20:43:26 SGT 2017 - RX packets:16276493470 errors:10993340211 dropped:68 overruns:10993340211 frame:0
8) Thu Oct 12 20:43:31 SGT 2017 - RX packets:16278612090 errors:10997152436 dropped:68 overruns:10997152436 frame:0
9) Thu Oct 12 20:43:36 SGT 2017 - RX packets:16281253727 errors:11001834579 dropped:68 overruns:11001834579 frame:0
10) Thu Oct 12 20:43:41 SGT 2017 - RX packets:16283972622 errors:11006374277 dropped:68 overruns:11006374277 frame:0
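Most of the sysctl settings above size socket-level buffers, whereas overruns are counted before a packet ever reaches a socket, which may be why they changed so little. A related check is `/proc/net/softnet_stat`, where each row is one CPU and the columns are hexadecimal: the second counts packets dropped because the backlog queue was full, and the third counts "time squeeze" events where softirq processing ran out of budget. The sketch below totals both; the function name `softnet_totals` is illustrative.

```shell
#!/bin/sh
# softnet_totals [FILE]
# Sums the second (dropped) and third (time_squeeze) hexadecimal columns of
# /proc/net/softnet_stat across all CPUs and prints "dropped=N time_squeeze=M".
softnet_totals() {
    file=${1:-/proc/net/softnet_stat}
    dropped=0
    squeezed=0
    while read -r _processed drop squeeze _rest; do
        # Ignore blank or truncated rows.
        [ -n "$squeeze" ] || continue
        dropped=$((dropped + 0x$drop))
        squeezed=$((squeezed + 0x$squeeze))
    done < "$file"
    echo "dropped=$dropped time_squeeze=$squeezed"
}

# Example: take two readings a few seconds apart and compare them.
# softnet_totals; sleep 5; softnet_totals
```

A steadily growing time_squeeze total would hint that `net.core.netdev_budget` is a more relevant knob than the buffer sizes, though that remains a hypothesis to test rather than a diagnosis.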
Tweaking the CPU for better performance:
cpufreq-set -r -g performance
Results (nothing significant):
:~# for i in $(seq 1 10); do echo "$i) $(date)" - $(ifconfig ens4f1 | egrep "RX" | egrep overruns; sleep 5); done
1) Thu Oct 12 21:53:07 SGT 2017 - RX packets:18506492788 errors:14622639426 dropped:68 overruns:14622639426 frame:0
2) Thu Oct 12 21:53:12 SGT 2017 - RX packets:18509314581 errors:14626750641 dropped:68 overruns:14626750641 frame:0
3) Thu Oct 12 21:53:17 SGT 2017 - RX packets:18511485458 errors:14630268859 dropped:68 overruns:14630268859 frame:0
4) Thu Oct 12 21:53:22 SGT 2017 - RX packets:18514223562 errors:14634547845 dropped:68 overruns:14634547845 frame:0
5) Thu Oct 12 21:53:27 SGT 2017 - RX packets:18516926578 errors:14638745143 dropped:68 overruns:14638745143 frame:0
6) Thu Oct 12 21:53:32 SGT 2017 - RX packets:18519605412 errors:14642929021 dropped:68 overruns:14642929021 frame:0
7) Thu Oct 12 21:53:37 SGT 2017 - RX packets:18522523560 errors:14647108982 dropped:68 overruns:14647108982 frame:0
8) Thu Oct 12 21:53:42 SGT 2017 - RX packets:18525185869 errors:14651577286 dropped:68 overruns:14651577286 frame:0
9) Thu Oct 12 21:53:47 SGT 2017 - RX packets:18527947266 errors:14655961847 dropped:68 overruns:14655961847 frame:0
10) Thu Oct 12 21:53:52 SGT 2017 - RX packets:18530703288 errors:14659988398 dropped:68 overruns:14659988398 frame:0
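Since raising the clock speed made no visible difference, one thing that could be checked is whether receive processing is spread over the 72 cores or concentrated on a few of them. This is an assumption, not something the outputs above confirm: if the NIC's receive hashing does not distribute SCTP flows across the configured queues, a single core could be doing most of the work. The sketch below ranks CPUs by their NET_RX softirq count from `/proc/softirqs`; the helper name `net_rx_top` is made up for illustration.

```shell
#!/bin/sh
# net_rx_top [N] [FILE]
# Lists the N CPUs with the highest NET_RX softirq counts as "cpuK count",
# busiest first. FILE defaults to /proc/softirqs.
net_rx_top() {
    n=${1:-5}
    file=${2:-/proc/softirqs}
    awk '$1 == "NET_RX:" {
        # Field 1 is the row label; fields 2..NF are CPU0, CPU1, ...
        for (i = 2; i <= NF; i++) print "cpu" (i - 2), $i
    }' "$file" | sort -k2,2nr | head -n "$n"
}

# Example: show the ten busiest CPUs for receive softirqs.
# net_rx_top 10
```

The counts are cumulative since boot, so comparing two readings taken a few seconds apart gives a better picture of the current distribution.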
Results using sar:
:~# sar -n EDEV 5 3| egrep "(ens4f1|IFACE)"
11:17:43 PM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
11:17:48 PM ens4f1 360809.40 0.00 0.00 0.00 0.00 0.00 0.00 360809.40 0.00
11:17:53 PM ens4f1 382500.40 0.00 0.00 0.00 0.00 0.00 0.00 382500.40 0.00
11:17:58 PM ens4f1 353717.00 0.00 0.00 0.00 0.00 0.00 0.00 353717.00 0.00
Average: ens4f1 365675.60 0.00 0.00 0.00 0.00 0.00 0.00 365675.60 0.00
I also tuned some SCTP-specific parameters, but again without results:
sysctl -w net.core.rmem_max=900000000
sysctl -w net.core.wmem_max=900000000
sysctl -w net.sctp.sctp_mem="2100000000 2100000000 2100000000"
sysctl -w net.sctp.sctp_rmem="2100000000 2100000000 2100000000"
sysctl -w net.sctp.sctp_wmem="2100000000 2100000000 2100000000"
sysctl -w net.ipv4.udp_mem="5000000000 5000000000 5000000000"
sysctl -w net.ipv4.udp_mem="10000000000 10000000000 10000000000"
for i in $(seq 1 10); do echo "$i) $(date)" - $(ifconfig ens4f1 | egrep "RX" | egrep overruns; sleep 5); done
1) Sat Oct 14 21:55:55 SGT 2017 - RX packets:84379103241 errors:56372972367 dropped:58 overruns:56372972367 frame:0
2) Sat Oct 14 21:56:00 SGT 2017 - RX packets:84381451420 errors:56377777944 dropped:58 overruns:56377777944 frame:0
3) Sat Oct 14 21:56:05 SGT 2017 - RX packets:84383737427 errors:56382434478 dropped:58 overruns:56382434478 frame:0
4) Sat Oct 14 21:56:10 SGT 2017 - RX packets:84386524128 errors:56386618268 dropped:58 overruns:56386618268 frame:0
5) Sat Oct 14 21:56:15 SGT 2017 - RX packets:84389578203 errors:56390512483 dropped:58 overruns:56390512483 frame:0
6) Sat Oct 14 21:56:20 SGT 2017 - RX packets:84392673120 errors:56394472475 dropped:58 overruns:56394472475 frame:0
7) Sat Oct 14 21:56:25 SGT 2017 - RX packets:84395714973 errors:56398573221 dropped:58 overruns:56398573221 frame:0
8) Sat Oct 14 21:56:30 SGT 2017 - RX packets:84398951451 errors:56402297479 dropped:58 overruns:56402297479 frame:0
9) Sat Oct 14 21:56:35 SGT 2017 - RX packets:84401039177 errors:56406013473 dropped:58 overruns:56406013473 frame:0
10) Sat Oct 14 21:56:40 SGT 2017 - RX packets:84403558097 errors:56410804379 dropped:58 overruns:56410804379 frame:0
Tags networking linux linux-kernel