systemd Usando 4 GB de RAM após 18 dias de tempo de atividade

13

Eu tenho um servidor web rodando o CentOS 7, no qual o processo do systemd usa quase 4 GB de RAM após algumas semanas de tempo de atividade. O uso de RAM está aumentando constantemente em cerca de 200MB por dia. Este e outros processos relacionados, como o systemd-logind e o dbus-daemon, também usam uma quantidade considerável de CPU na maior parte do tempo. Meu outro servidor do CentOS 6 usando "init" em vez de systemd não tem esse uso de recursos.

No exemplo acima, durante a veiculação web normal sem outros processos em execução, systemd, systemd-logind, systemd-journal e dbus-daemon usam um total combinado de 10,7% de uma CPU quad-core, e o systemd está consumindo 19 % dos 16 GB de RAM do sistema. Isso não é um comportamento normal e, depois de pesquisar por aí, não encontrei mais ninguém com esse problema. O que poderia causar esse recurso monopolizando? Qualquer sugestão seria apreciada.

Saída da parte superior durante um período inativo (exceto para veiculação da web):

top - 08:51:31 up 16 days, 13:43,  2 users,  load average: 1.84, 1.39, 1.07
Tasks: 297 total,   2 running, 295 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.6 us,  3.6 sy,  0.0 ni, 90.6 id,  0.1 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 16212992 total,  2466564 free,  4275764 used,  9470664 buff/cache
KiB Swap:  4194300 total,  4070740 free,   123560 used. 10707392 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                          
  743 dbus      20   0   27104   1856   1152 S   3.3  0.0 304:27.19 dbus-daemon                                      
    1 root      20   0 3247784 2.920g   1800 S   3.0 18.9 287:41.35 systemd                                          
  737 root      20   0   27416   2524   1304 S   2.7  0.0 225:32.66 systemd-logind                                   
  736 root      20   0  434760   3756   3076 S   2.0  0.0 172:26.53 NetworkManager                                   
  548 root      20   0   82276  34652  34516 S   1.7  0.2 160:20.16 systemd-journal                                  
  770 polkitd   20   0  522920   2956   2248 S   1.7  0.0 120:06.11 polkitd                                          
  716 root      16  -4  116744   1368   1312 S   1.3  0.0  93:26.54 auditd                                           
 3778 nginx     20   0  446488  14688   6564 S   1.3  0.1   2:18.80 php-fpm                                          
 3847 nginx     20   0  446316  14588   6548 S   1.3  0.1   2:19.29 php-fpm                                          
 7000 nginx     20   0  446132  14400   6544 S   1.3  0.1   1:22.77 php-fpm                                          
14862 nginx     20   0  446304  14600   6580 S   1.3  0.1   1:32.25 php-fpm                                          
30333 nginx     20   0  446292  14468   6528 S   1.3  0.1   1:40.78 php-fpm                                          
  740 root      20   0  784980  20112  19696 S   1.0  0.1  76:12.69 rsyslogd                                         
 3521 nginx     20   0  446188  14848   6748 S   1.0  0.1   2:20.00 php-fpm                                          
 3687 nginx     20   0  446036  14688   6764 S   1.0  0.1   2:20.45 php-fpm                                          
 3689 nginx     20   0  446408  14604   6552 S   1.0  0.1   2:19.75 php-fpm                                          
 3774 nginx     20   0  446288  14568   6552 S   1.0  0.1   2:19.68 php-fpm                                          
 3836 nginx     20   0  447416  15572   6564 S   1.0  0.1   2:21.06 php-fpm                                          
 4861 nginx     20   0  446260  14576   6540 S   1.0  0.1   2:18.94 php-fpm                                          
 4862 nginx     20   0  446508  15084   6764 S   1.0  0.1   2:20.71 php-fpm                                          
13538 nginx     20   0  447204  15452   6572 S   1.0  0.1   1:32.33 php-fpm                                          
15530 nginx     20   0  446292  14520   6528 S   1.0  0.1   1:32.55 php-fpm                                          
28468 nginx     20   0  446356  14672   6568 S   1.0  0.1   1:42.21 php-fpm                                          
29564 nginx     20   0  446292  14536   6548 S   1.0  0.1   1:41.11 php-fpm                                          
30851 nginx     20   0  445956  14568   6748 S   1.0  0.1   1:49.66 php-fpm 

Edite 2-14-16

Eu posso ter encontrado algo relevante na saída do "sudo journalctl" (veja abaixo). Há muitas linhas ocorrendo a cada segundo por horas a cada vez em relação às conexões SSH de um dos meus outros servidores de produção. Estes são processos de rsync que transferem arquivos do servidor remoto para o servidor em questão. Isso explica o uso da CPU de systemd, systemd-logind, NetworkManager e systemd-journal.

No entanto, isso não explica o vazamento de memória, que é o maior problema. Desde a escrita original deste post há alguns dias, o systemd aumentou de 18.9% para 21.4% de uso da memória do sistema.

O log abaixo foi modificado para substituir o nome de domínio real e o endereço IP dos servidores.

Feb 14 10:02:13 hostname.domain.com systemd-logind[737]: New session 6467482 of user tropicg9.
Feb 14 10:02:13 hostname.domain.com systemd[1]: Started Session 6467482 of user tropicg9.
Feb 14 10:02:13 hostname.domain.com systemd[1]: Starting Session 6467482 of user tropicg9.
Feb 14 10:02:13 hostname.domain.com sshd[9665]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:13 hostname.domain.com sshd[9667]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:13 hostname.domain.com sshd[9665]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:13 hostname.domain.com systemd-logind[737]: Removed session 6467482.
Feb 14 10:02:14 hostname.domain.com sshd[9728]: Accepted publickey for tropicg9 from 1.2.3.4 port 45289 ssh2: RSA 0b:
Feb 14 10:02:14 hostname.domain.com systemd-logind[737]: New session 6467483 of user tropicg9.
Feb 14 10:02:14 hostname.domain.com systemd[1]: Started Session 6467483 of user tropicg9.
Feb 14 10:02:14 hostname.domain.com systemd[1]: Starting Session 6467483 of user tropicg9.
Feb 14 10:02:14 hostname.domain.com sshd[9728]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:14 hostname.domain.com sshd[9735]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:14 hostname.domain.com sshd[9728]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:14 hostname.domain.com systemd-logind[737]: Removed session 6467483.
Feb 14 10:02:15 hostname.domain.com sshd[9876]: Accepted publickey for tropicg9 from 1.2.3.4 port 45290 ssh2: RSA 0b:
Feb 14 10:02:15 hostname.domain.com systemd-logind[737]: New session 6467484 of user tropicg9.
Feb 14 10:02:15 hostname.domain.com systemd[1]: Started Session 6467484 of user tropicg9.
Feb 14 10:02:15 hostname.domain.com systemd[1]: Starting Session 6467484 of user tropicg9.
Feb 14 10:02:15 hostname.domain.com sshd[9876]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:15 hostname.domain.com sshd[9883]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:15 hostname.domain.com sshd[9876]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:15 hostname.domain.com systemd-logind[737]: Removed session 6467484.
Feb 14 10:02:20 hostname.domain.com sshd[10333]: Accepted publickey for tropicg9 from 1.2.3.4 port 45291 ssh2: RSA 0b
Feb 14 10:02:20 hostname.domain.com systemd-logind[737]: New session 6467485 of user tropicg9.
Feb 14 10:02:20 hostname.domain.com systemd[1]: Started Session 6467485 of user tropicg9.
Feb 14 10:02:20 hostname.domain.com systemd[1]: Starting Session 6467485 of user tropicg9.
Feb 14 10:02:20 hostname.domain.com sshd[10333]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:20 hostname.domain.com sshd[10342]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:20 hostname.domain.com sshd[10333]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:20 hostname.domain.com systemd-logind[737]: Removed session 6467485.
Feb 14 10:02:21 hostname.domain.com sshd[10450]: Accepted publickey for tropicg9 from 1.2.3.4 port 45292 ssh2: RSA 0b
Feb 14 10:02:21 hostname.domain.com systemd-logind[737]: New session 6467486 of user tropicg9.
Feb 14 10:02:21 hostname.domain.com systemd[1]: Started Session 6467486 of user tropicg9.
Feb 14 10:02:21 hostname.domain.com systemd[1]: Starting Session 6467486 of user tropicg9.
Feb 14 10:02:21 hostname.domain.com sshd[10450]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:21 hostname.domain.com sshd[10457]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:21 hostname.domain.com sshd[10450]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:21 hostname.domain.com systemd-logind[737]: Removed session 6467486.
Feb 14 10:02:22 hostname.domain.com sshd[10473]: Accepted publickey for tropicg9 from 1.2.3.4 port 45293 ssh2: RSA 0b
Feb 14 10:02:22 hostname.domain.com systemd-logind[737]: New session 6467487 of user tropicg9.
Feb 14 10:02:22 hostname.domain.com systemd[1]: Started Session 6467487 of user tropicg9.
Feb 14 10:02:22 hostname.domain.com systemd[1]: Starting Session 6467487 of user tropicg9.
Feb 14 10:02:22 hostname.domain.com sshd[10473]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:22 hostname.domain.com sshd[10475]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:22 hostname.domain.com sshd[10473]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:22 hostname.domain.com systemd-logind[737]: Removed session 6467487.
Feb 14 10:02:23 hostname.domain.com sshd[10484]: Accepted publickey for tropicg9 from 1.2.3.4 port 45294 ssh2: RSA 0b
Feb 14 10:02:23 hostname.domain.com systemd-logind[737]: New session 6467488 of user tropicg9.
Feb 14 10:02:23 hostname.domain.com systemd[1]: Started Session 6467488 of user tropicg9.
Feb 14 10:02:23 hostname.domain.com systemd[1]: Starting Session 6467488 of user tropicg9.
Feb 14 10:02:23 hostname.domain.com sshd[10484]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:23 hostname.domain.com sshd[10486]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:23 hostname.domain.com sshd[10484]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:23 hostname.domain.com systemd-logind[737]: Removed session 6467488.
Feb 14 10:02:39 hostname.domain.com sshd[10654]: Accepted publickey for tropicg9 from 1.2.3.4 port 45295 ssh2: RSA 0b
Feb 14 10:02:39 hostname.domain.com systemd[1]: Started Session 6467489 of user tropicg9.
Feb 14 10:02:39 hostname.domain.com systemd-logind[737]: New session 6467489 of user tropicg9.
Feb 14 10:02:39 hostname.domain.com systemd[1]: Starting Session 6467489 of user tropicg9.
Feb 14 10:02:39 hostname.domain.com sshd[10654]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:39 hostname.domain.com sshd[10656]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:39 hostname.domain.com sshd[10654]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:39 hostname.domain.com systemd-logind[737]: Removed session 6467489.session 6467489.

Atualizar 2-16-16

Aqui está a saída do systemd-cgtop mostrando o uso de recursos para grupos de controle ativos (role para a direita). Isso mostra todo o uso pesado de recursos sob o caminho "raiz". Isso não parece reduzi-lo, mas talvez essa informação possa ser útil.

Existem apenas 86 arquivos de escopo e diretórios associados em / run / systemd / system /, até 6 dias de idade. Houve um problema em que esses arquivos ficaram órfãos durante as conexões SSH, resultando em milhares de entradas e alta carga de CPU, mas isso não está acontecendo aqui.

Path                                                                          Tasks   %CPU   Memory  Input/s Output/s

/                                                                               296   30.5    11.3G   657.8K   893.0K
/system.slice/NetworkManager.service                                              1      -        -        -        -
/system.slice/auditd.service                                                      1      -        -        -        -
/system.slice/crond.service                                                       1      -        -        -        -
/system.slice/dbus.service                                                        1      -        -        -        -
/system.slice/irqbalance.service                                                  1      -        -        -        -
/system.slice/lvm2-lvmetad.service                                                1      -        -        -        -
/system.slice/mariadb.service                                                     2      -        -        -        -
/system.slice/nginx.service                                                      10      -        -        -        -
/system.slice/php-fpm.service                                                   101      -        -        -        -
/system.slice/polkit.service                                                      1      -        -        -        -
/system.slice/postfix.service                                                     3      -        -        -        -
/system.slice/rsyslog.service                                                     1      -        -        -        -
/system.slice/smartd.service                                                      1      -        -        -        -
/system.slice/sshd.service                                                        2      -        -        -        -
/system.slice/system-getty.slice/[email protected]                               1      -        -        -        -
/system.slice/systemd-journald.service                                            1      -        -        -        -
/system.slice/systemd-logind.service                                              1      -        -        -        -
/system.slice/systemd-udevd.service                                               1      -        -        -        -
/system.slice/tuned.service                                                       1      -        -        -        -
/system.slice/wpa_supplicant.service                                              1      -        -        -        -
/user.slice/user-1000.slice/session-7170741.scope                                 4      -        -        -        -

Compensação temporária da memória systemd

Parece que a execução de systemctl daemon-reexec liberará toda a memória alocada para o processo do PID 1. No entanto, o vazamento continua. Uma solução para esse problema é definir um cron diário para limpar a memória, mas isso não corrige o vazamento. Eu enviei um bug para o Redhat, já que esta é a versão estável do systemd para o CentOS 7.x . Espero que o vazamento possa ser encontrado e conectado.

    
por meridionaljet 12.02.2016 / 14:58

1 resposta

2

Verifique o rastreio do processo systemd para chamadas mmap / mmunmap. Deve revelar o problema:

yum install strace
strace -ff -p 1

É uma maneira rápida e suja de diagnosticar vazamentos de memória. Strace do processo systemd deve parecer semelhante:

recvmsg(23, {msg_name(0)=NULL, msg_iov(1)=[{"WATCHDOG=1", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=620, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 10
open("/proc/620/cgroup", O_RDONLY|O_CLOEXEC) = 20
fstat(20, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcfd734e000
read(20, "10:cpuset:/\n9:perf_event:/\n8:hug"..., 1024) = 164
close(20)                               = 0
munmap(0x7fcfd734e000, 4096)            = 0

Aloca memória, faz alguma coisa, do que libera memória.
Verificando o rastreio de chamadas do sistema, você deve descobrir onde ele não pode terminar chamadas e liberar memória alocada.
Eu suponho que há um problema com pseudo-sistemas de arquivos montados indevidamente ou selinux, então systemd não pode terminar suas chamadas.

    
por 16.02.2016 / 17:23