rsyslog descarta mensagens excessivamente

5

Temos uma máquina CentOS atuando como nosso servidor syslog. Eu notei recentemente que a máquina tem baixado um grande número de mensagens syslog às vezes. Temos mais de 250 roteadores e switches, bem como firewalls registrando todas as conexões abertas / fechadas.

Eu posso ver as mensagens chegarem ao NIC ethernet ... por exemplo, suponha que eu entre no modo de configuração em um dos nossos roteadores e então saia ... isso gera uma mensagem syslog, e eu farejo para demonstrar que ela fica para o servidor syslog ...

[mpenning@uglylogger net]$ sudo tshark -V udp and port 514 and host 214.110.12.246
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
Frame 1 (151 bytes on wire, 151 bytes captured)
    Arrival Time: Apr 26, 2013 06:05:33.490721000
    [Time delta from previous captured frame: 0.000000000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.000000000 seconds]
    Frame Number: 1
    Frame Length: 151 bytes
    Capture Length: 151 bytes
    [Frame is marked: False]
    [Protocols in frame: eth:ip:udp:syslog]
Ethernet II, Src: 50:57:a8:9f:68:3c (50:57:a8:9f:68:3c), Dst: Supermic_30:4a:07 (00:30:48:30:4a:07)
    Destination: Supermic_30:4a:07 (00:30:48:30:4a:07)
        Address: Supermic_30:4a:07 (00:30:48:30:4a:07)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Source: 50:57:a8:9f:68:3c (50:57:a8:9f:68:3c)
        Address: 50:57:a8:9f:68:3c (50:57:a8:9f:68:3c)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Type: IP (0x0800)
Internet Protocol, Src: 214.110.12.246 (214.110.12.246), Dst: 214.110.16.4 (214.110.16.4)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 137
    Identification: 0x0096 (150)
    Flags: 0x00
        0.. = Reserved bit: Not Set
        .0. = Don't fragment: Not Set
        ..0 = More fragments: Not Set
    Fragment offset: 0
    Time to live: 250
    Protocol: UDP (0x11)
    Header checksum: 0xcf52 [correct]
        [Good: True]
        [Bad : False]
    Source: 214.110.12.246 (214.110.12.246)
    Destination: 214.110.16.4 (214.110.16.4)
User Datagram Protocol, Src Port: 63873 (63873), Dst Port: syslog (514)
    Source port: 63873 (63873)
    Destination port: syslog (514)
    Length: 117
    Checksum: 0x1ed8 [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
Syslog message: LOCAL7.NOTICE: 187: .Apr 26 06:05:32.476 CDT: %SYS-5-CONFIG_I: Configured from console by admin on vty0 (214.110.26.29)
    1011 1... = Facility: LOCAL7 - reserved for local use (23)
    .... .101 = Level: NOTICE - normal but significant condition (5)
    Message: 187: .Apr 26 06:05:32.476 CDT: %SYS-5-CONFIG_I: Configured from console by admin on vty0 (214.110.26.29)

^C1 packet captured
[mpenning@uglylogger net]$

No entanto, essa mensagem não aparece no syslog (mesmo que outra mensagem enviada às 05:37:05) ... Apenas uma de sete ou oito dessas mensagens aparece no arquivo de log ...

[mpenning@uglylogger net]$ pwd
/var/log/net
[mpenning@uglylogger net]$
[mpenning@uglylogger net]$ grep CONFIG network.log
Apr 26 05:08:59 somedevice.company.local 779476: Apr 26 05:08:58.604 
CDT: %SYS-5-CONFIG_I: Configured from console by cbutler on vty0 (214.110.31.65)
Apr 26 05:37:05 w9-idf-lab 182: .Apr 26 05:37:04.452 CDT: 
%SYS-5-CONFIG_I: Configured from console by admin on vty0 (214.110.26.29)
[mpenning@uglylogger net]$

Eu tentei muitas coisas para fazer este servidor log corretamente ...

  • Removidos todos os arquivos estranhos da partição
  • Verificou a NIC quanto a descartes / erros
  • Reloaded rsyslogd
  • Recarregou o sistema
  • yum update
  • Iniciou um servidor de nomes de armazenamento em cache
  • Iostat verificado (a carga é bastante razoável, veja abaixo)
  • Checked vmstat (carregue novamente, é bastante razoável)

Estou ficando sem ideias ... como posso parar este servidor de sangrar syslogs?

Saída MISC

Lista de diretórios

[mpenning@uglylogger net]$ ls -al
total 3292356
drwxr-xr-x   3 root root      20480 Apr 26 05:17 .
drwxr-xr-x. 13 root root       4096 Apr 21 03:39 ..
drwxr-xr-x   2 root root       4096 Apr 19 04:22 daily
-rw-r--r--   1 root root  185081020 Apr 26 06:21 fw1.log
-rw-r--r--   1 root root 1936049300 Apr 26 04:15 fw1.log-20130426
-rw-r--r--   1 root root    7629110 Apr 26 06:21 fw2.log
-rw-r--r--   1 root root  160792210 Apr 26 04:17 fw2.log-20130426
-rw-r--r--   1 root root     118327 Apr 26 06:20 network.log
-rw-r--r--   1 root root    1572210 Apr 26 04:17 network.log-20130426
-rw-r--r--   1 root root      73108 Apr 26 06:21 wireless.log
-rw-r--r--   1 root root     686104 Apr 26 04:17 wireless.log-20130426
[mpenning@uglylogger net]$

rsyslog.conf

$ModLoad imudp
$UDPServerRun 514


$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

$IncludeConfig /etc/rsyslog.d/*.conf

if $msg contains 'APF-3-RCV_UNSUPP_MSG' then /dev/null
&~

if $msg contains 'FWSM-2-106007' then /dev/null
&~


if $fromhost-ip=='214.110.225.201' then /var/log/net/fw1.log
&~
if $fromhost-ip=='214.110.225.202' then /var/log/net/fw1.log
&~

if $fromhost-ip=='214.110.19.246' then /var/log/net/fw2.log
&~
if $fromhost-ip=='214.110.19.253' then /var/log/net/fw2.log
&~

if $fromhost-ip=='214.110.5.10' then /var/log/net/fw_other.log
&~

if $fromhost-ip=='214.110.6.254' then /var/log/net/fw_other.log
&~
if $fromhost-ip=='214.110.6.253' then /var/log/net/fw_other.log
&~

if $fromhost-ip=='214.110.46.254' then /var/log/net/fw_other.log
&~
if $fromhost-ip=='214.110.46.253' then /var/log/net/fw_other.log
&~

if $fromhost-ip=='214.110.19.42' then /var/log/net/fw_other.log
&~

if $fromhost-ip=='214.110.3.203' then /var/log/net/fw_other.log
&~

if $fromhost-ip=='214.110.84.2' then @214.110.16.6
if $fromhost-ip=='214.110.84.2' then /var/log/net/wireless.log
&~
if $fromhost-ip=='214.110.84.4' then @214.110.16.6
if $fromhost-ip=='214.110.84.4' then /var/log/net/wireless.log
&~
if $fromhost-ip=='214.110.70.8' then @214.110.16.6
if $fromhost-ip=='214.110.70.8' then /var/log/net/wireless.log
&~
if $fromhost-ip=='214.110.70.12' then @214.110.16.6
if $fromhost-ip=='214.110.70.12' then /var/log/net/wireless.log
&~
if $fromhost-ip=='214.110.70.16' then @214.110.16.6
if $fromhost-ip=='214.110.70.16' then /var/log/net/wireless.log
&~
if $fromhost-ip=='214.110.16.6' then /var/log/net/wireless.log
&~

if $fromhost-ip startswith '214.110' then /var/log/net/network.log
&~
if $fromhost-ip startswith '214.111' then /var/log/net/network.log
&~
if $fromhost-ip startswith '214.112' then /var/log/net/network.log
&~


*.info;mail.none;authpriv.none;cron.none                /var/log/messages

authpriv.*                                              /var/log/secure

mail.*                                                  -/var/log/maillog


cron.*                                                  /var/log/cron

*.emerg                                                 *

uucp,news.crit                                          /var/log/spooler

local7.*                                                /var/log/boot.log

$template SpiceTmpl,"%TIMESTAMP%.%TIMESTAMP:::date-subseconds% %syslogtag% %syslogseverity-text%:%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n"

:programname, startswith, "spice-vdagent"       /var/log/spice-vdagent.log;SpiceTmpl

iostat 5 ... esses números são típicos ... a carga não aumenta muito ...

[mpenning@uglylogger net]$ iostat 5
Linux 2.6.32-358.2.1.el6.i686 (uglylogger.local)  04/26/2013      _i686_  (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.25    0.05    0.80    0.37    0.00   96.52

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              10.67       364.09       536.25  995468936 1466172104
dm-0             33.82        15.35       268.50   41981146  734110616
dm-1              0.00         0.00         0.00       3056          0
dm-2              2.21         0.05        17.60     132282   48114080
dm-3             33.07       348.68       250.14  953334658  683895008

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    0.00    0.15    0.00   99.80

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.00         0.00         8.00          0         40
dm-0              0.80         0.00         6.40          0         32
dm-1              0.00         0.00         0.00          0          0
dm-2              0.00         0.00         0.00          0          0
dm-3              0.20         0.00         1.60          0          8

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.05    0.30    0.00   99.65

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.00         0.00        33.60          0        168
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0
dm-2              0.00         0.00         0.00          0          0
dm-3              4.20         0.00        33.60          0        168

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.05    0.00    0.30    0.20    0.00   98.44

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.20         0.00         8.00          0         40
dm-0              0.80         0.00         6.40          0         32
dm-1              0.00         0.00         0.00          0          0
dm-2              0.00         0.00         0.00          0          0
dm-3              0.20         0.00         1.60          0          8

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.40    0.00    0.30    0.15    0.00   98.14

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.80         0.00         6.40          0         32
dm-0              0.80         0.00         6.40          0         32
dm-1              0.00         0.00         0.00          0          0
dm-2              0.00         0.00         0.00          0          0
dm-3              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    0.05    0.25    0.00   99.65

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               5.40         0.00        46.40          0        232
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0
dm-2              0.00         0.00         0.00          0          0
dm-3              5.80         0.00        46.40          0        232

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.05    0.15    0.00   99.80

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.80         0.00         4.80          0         24
dm-0              0.60         0.00         4.80          0         24
dm-1              0.00         0.00         0.00          0          0
dm-2              0.00         0.00         0.00          0          0
dm-3              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.71    0.00    1.05    0.10    0.00   95.14

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               1.20         0.00        19.20          0         96
dm-0              1.40         0.00        11.20          0         56
dm-1              0.00         0.00         0.00          0          0
dm-2              0.00         0.00         0.00          0          0
dm-3              1.00         0.00         8.00          0         40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    0.00    0.35    0.00   99.60

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               3.40         0.00       102.40          0        512
dm-0              1.40         0.00        11.20          0         56
dm-1              0.00         0.00         0.00          0          0
dm-2              0.00         0.00         0.00          0          0
dm-3             11.40         0.00        91.20          0        456

^C
[mpenning@uglylogger net]$

vmstat 5 ...

[mpenning@uglylogger net]$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1      0 240408  50068 3506120    0    0    46    67    2    2  2  1 97  0  0
 0  0      0 240400  50068 3506152    0    0     0    19 1312   37  0  0 100  0  0
 1  0      0 239160  50084 3508648    0    0     0    18 3943 2320 20  6 74  0  0
 1  0      0 242632  50100 3510084    0    0     0   212 3561 2718 14  5 81  1  0
 0  0      0 238168  50116 3510336    0    0     0    58 1612 2537  1  1 98  0  0
 0  0      0 235440  50124 3510340    0    0     0   510 1340 1333  0  0 100  0  0
 0  0      0 235564  50132 3510340    0    0     0    19 1470   37  0  0 100  0  0
 0  0      0 235564  50140 3510344    0    0     0    28 1348   40  0  0 100  0  0
 0  0      0 235440  50156 3510340    0    0     0    17 1497   37  0  0 100  0  0
 0  0      0 235564  50156 3510348    0    0     0    70 1522   38  0  0 100  0  0

df -h ...

[mpenning@sasmars net]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_uglylogger-LogVol01
                       73G   17G   56G  23% /
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda1             985M  126M  809M  14% /boot
/dev/mapper/vg_uglylogger-LogVol00
                       20G  655M   18G   4% /home
/dev/mapper/vg_uglylogger-LogVol03
                      592G  4.9G  557G   1% /var
[mpenning@uglylogger net]$

netstat -s ...

[mpenning@uglylogger net]$ netstat -s | head
Ip:
    452407434 total packets received
    0 forwarded
    0 incoming packets discarded
    449256489 incoming packets delivered
    156718754 requests sent out
    150 reassemblies required
    75 packets reassembled ok
Icmp:
    2423436 ICMP messages received
[mpenning@uglylogger net]$

ifconfig eth0 ...

[mpenning@uglylogger net]$ ifconfig eth0 | grep -E "dropped|collisions"
          RX packets:466065257 errors:0 dropped:0 overruns:0 frame:0
          TX packets:252751047 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
[mpenning@uglylogger net]$

ethtool -S eth0|grep -vw 0 ...

[mpenning@uglylogger net]$ sudo ethtool -S eth0|grep -vw 0
NIC statistics:
     rx_packets: 5206242266
     tx_packets: 255859804
     rx_bytes: 1205016281115
     tx_bytes: 180592917566
     rx_broadcast: 15001736
     tx_broadcast: 2648
     rx_multicast: 7518499
     tx_multicast: 50236
     multicast: 7518499
     rx_no_buffer_count: 103
     tx_restart_queue: 1
     tx_tcp_seg_good: 10107428
     rx_long_byte_count: 1205016281115
     rx_csum_offload_good: 5189768707
     rx_csum_offload_errors: 5
[mpenning@uglylogger net]$

lspci ...

[mpenning@uglylogger net]$ sudo lspci
00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01)
00:00.1 Unassigned class [ff00]: Intel Corporation E7500/E7501 Host RASUM Controller (rev 01)
00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 01)
00:03.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface C PCI-to-PCI Bridge (rev 01)
00:1d.0 USB controller: Intel Corporation 82801CA/CAM USB Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801CA/CAM USB Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801CA/CAM USB Controller #3 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02)
01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
02:01.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
03:02.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
03:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
04:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
04:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
04:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
04:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
05:02.0 Serial controller: 3Com Corp, Modem Division 56K FaxModem Model 5610 (rev 01)
07:01.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Rage XL (rev 27)
[mpenning@uglylogger net]$

uname -a ...

[mpenning@uglylogger net]$ uname -a
Linux uglylogger.local 2.6.32-358.2.1.el6.i686 #1 SMP Tue Mar 12 21:42:46 UTC 2013 i686 i686 i386 GNU/Linux
[mpenning@uglylogger net]$
    
por Mike Pennington 26.04.2013 / 14:04

2 respostas

6

Atualize para a versão mais recente do rsyslog. Nós tivemos esse problema exato no trabalho, e essa é a única coisa que resolveu isso. A (s) versão (ões) anterior (es) tinham problemas com a resolução de nomes, e até mesmo desativá-la não resolveu. A ramificação 7.x resolve o problema. Vou ver se consigo encontrar o link específico.

    
por 29.04.2013 / 17:59
1

Eu suponho que o RedHat fez backport todos os problemas relacionados ao desempenho para a versão rsyslog contida no sistema operacional.

rx_no_buffer_count: 103 parece ser o problema principal aqui.

Isso diz que houve 103 pacotes TCP que foram descartados ANTES de serem passados para o sistema operacional.

A queda foi registrada na NIC (e não passou para o sistema operacional, portanto, netstat não mostrou isso). Para resolver o problema, você provavelmente terá que aumentar o tamanho do buffer do anel de recepção em suas configurações de NIC.

Faça um ethtool -g eth0 para ver suas configurações atuais e possíveis para o RX.

Da minha experiência, configurar o RX-ring-buffer para 2048 ou 3172 é muito bom.

Isso dará ao NIC tempo para armazenar em buffer os pacotes TCP recebidos até o hardware (isso requer uma interceptação de PCI) e o sistema operacional tem tempo para processar os pacotes.

Aqui está uma explicação de RedHat , o que está acontecendo aqui.

Para tornar essa alteração persistente, altere ETHTOOL_OPS como descrito aqui .

    
por 30.04.2013 / 09:55