causa desconhecida para o estado de falha do systemd - systemd-logind não pode iniciar org.freedesktop.systemd1 após a reinicialização do servidor

1

Informação básica

Red Hat Enterprise Linux Server release 7.4 (Maipo)
component: systemd
Hardware: x86_64 Linux    
[root@scvberpat01 log]# uname -r
3.10.0-693.21.1.el7.x86_64

Descrição geral

Olá a todos, eu tive um grande problema neste fim de semana, resultando em um servidor necessário reiniciar o servidor de produção da nossa empresa.

Tudo correspondia em torno do serviço systemd entrando no status FAILURE e não podendo iniciar o loginmanager org.freedesktop.login1 e org.freedesktop.systemd1 via dbus / systemd-logind novamente.  Parece que algo que eu não sou capaz de reconhecer conseguiu matar o serviço systemd-logind. Estamos executando um sistema de produção pesado com cerca de 1700 tarefas / processos em um dia de produção médio simultaneamente. Crons e incrons estão em uso.  Além disso, desde a reinicialização, o módulo org.freedesktop.systemd1 não pode ser iniciado pelo systemd, causando um atraso significativo no logintime e no tempo de alteração do usuário via su.

systemd version
[root@scvberpat01 tmp]# systemctl --version
 systemd 219
 +PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Entradas no diário do incidente iniciando o estado de FALHA do systemd

Jul 14 05:09:40 scvberpat01 sshd[31966]: Accepted password for erp_monitoring from 172.20.0.63 port 58768 ssh2
 Jul 14 05:09:40 scvberpat01 systemd-logind: New session 1135476 of user erp_monitoring.
 Jul 14 05:09:40 scvberpat01 systemd: Started Session 1135476 of user erp_monitoring.
 Jul 14 05:09:40 scvberpat01 systemd: Starting Session 1135476 of user erp_monitoring.
 Jul 14 05:09:45 scvberpat01 sshd[32537]: Accepted password for erp_monitoring from 172.20.0.63 port 58801 ssh2
 Jul 14 05:09:45 scvberpat01 systemd-logind: New session 1135477 of user erp_monitoring.
 Jul 14 05:09:45 scvberpat01 systemd: Started Session 1135477 of user erp_monitoring.
 Jul 14 05:09:45 scvberpat01 systemd: Starting Session 1135477 of user erp_monitoring.
 Jul 14 05:10:01 scvberpat01 systemd: Started Session 1135478 of user root.
 Jul 14 05:10:01 scvberpat01 systemd: Starting Session 1135478 of user root.
 Jul 14 05:10:02 scvberpat01 systemd: Started Session 1135479 of user paseb.
 Jul 14 05:10:02 scvberpat01 systemd: Starting Session 1135479 of user paseb.
 Jul 14 05:10:02 scvberpat01 systemd: Started Session 1135480 of user root.
 Jul 14 05:10:02 scvberpat01 systemd: Starting Session 1135480 of user root.
 Jul 14 05:10:03 scvberpat01 systemd: Started Session 1135482 of user paseb.
 Jul 14 05:10:03 scvberpat01 systemd: Starting Session 1135482 of user paseb.
 Jul 14 05:10:03 scvberpat01 systemd: Started Session 1135481 of user paseb.
 Jul 14 05:10:03 scvberpat01 systemd: Starting Session 1135481 of user paseb.
 Jul 14 05:10:03 scvberpat01 systemd: Started Session 1135484 of user batchparg.
 Jul 14 05:10:03 scvberpat01 systemd: Starting Session 1135484 of user batchparg.
 Jul 14 05:10:04 scvberpat01 systemd: Started Session 1135485 of user batchparg.
 Jul 14 05:10:04 scvberpat01 systemd: Starting Session 1135485 of user batchparg.
 Jul 14 05:10:05 scvberpat01 systemd: Started Session 1135483 of user batchparg.
 Jul 14 05:10:05 scvberpat01 systemd: Starting Session 1135483 of user batchparg.
 Jul 14 05:10:06 scvberpat01 systemd: Started Session 1135486 of user batchparg.
 Jul 14 05:10:06 scvberpat01 systemd: Starting Session 1135486 of user batchparg.
 Jul 14 05:10:07 scvberpat01 systemd: Started Session 1135487 of user batchparg.
 Jul 14 05:10:07 scvberpat01 systemd: Starting Session 1135487 of user batchparg.
 Jul 14 05:10:08 scvberpat01 systemd: Started Session 1135488 of user batchparg.
 Jul 14 05:10:08 scvberpat01 systemd: Starting Session 1135488 of user batchparg.
 Jul 14 05:10:10 scvberpat01 systemd: Started Session 1135489 of user batchparg.
 Jul 14 05:10:10 scvberpat01 systemd: Starting Session 1135489 of user batchparg.
 Jul 14 05:10:11 scvberpat01 systemd: Started Session 1135490 of user batchparg.
 Jul 14 05:10:11 scvberpat01 systemd: Starting Session 1135490 of user batchparg.
 Jul 14 05:10:12 scvberpat01 systemd: Started Session 1135491 of user batchparg.
 Jul 14 05:10:12 scvberpat01 systemd: Starting Session 1135491 of user batchparg.
 Jul 14 05:10:13 scvberpat01 systemd: systemd-logind.service has no holdoff time, scheduling restart.
 Jul 14 05:10:13 scvberpat01 systemd: Starting Login Service...
 Jul 14 05:10:13 scvberpat01 systemd: Started Login Service.
 Jul 14 05:10:13 scvberpat01 systemd-logind: New seat seat0.
 Jul 14 05:10:13 scvberpat01 systemd-logind: Failed to read /run/systemd/users/11469: Argument list too long
 Jul 14 05:10:13 scvberpat01 systemd-logind: Failed to read /run/systemd/users/0: Argument list too long
 Jul 14 05:10:13 scvberpat01 systemd-logind: User enumeration failed: Argument list too long
 Jul 14 05:11:58 scvberpat01 systemd-logind: Failed to stop user slice: No buffer space available
 Jul 14 05:11:58 scvberpat01 systemd-logind: Failed to stop user slice: No buffer space available
 Jul 14 05:11:58 scvberpat01 systemd-logind: Failed to stop user slice: No buffer space available
 Jul 14 05:11:58 scvberpat01 systemd-logind: Failed to stop user slice: No buffer space available
 Jul 14 05:11:58 scvberpat01 systemd-logind: Failed to start user slice user-11469.slice, ignoring: No buffer space available ((null))
 Jul 14 05:11:59 scvberpat01 systemd-logind: Failed to start user slice user-0.slice, ignoring: No buffer space available ((null))
 Jul 14 05:13:13 scvberpat01 systemd: systemd-logind.service watchdog timeout (limit 3min)!
 Jul 14 05:13:13 scvberpat01 abrt-hook-ccpp: Process 2577 (systemd-logind) of user 0 killed by SIGABRT - dumping core
 Jul 14 05:13:16 scvberpat01 ModemManager[1041]: [sleep-monitor] inhibit failed: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message did not receive a reply (timeout by message b
 us)
 Jul 14 05:13:16 scvberpat01 systemd: systemd-logind.service: main process exited, code=dumped, status=6/ABRT
 Jul 14 05:13:16 scvberpat01 systemd: Unit systemd-logind.service entered failed state.
 Jul 14 05:13:16 scvberpat01 systemd: systemd-logind.service failed.
 Jul 14 05:13:21 scvberpat01 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
 Jul 14 05:13:40 scvberpat01 dbus[1028]: [system] Activating systemd to hand-off: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
 Jul 14 05:13:41 scvberpat01 systemd-logind: Failed to enable subscription: Connection timed out
 Jul 14 05:13:41 scvberpat01 systemd-logind: Failed to fully start up daemon: Connection timed out
 Jul 14 05:13:41 scvberpat01 dbus[1028]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
 Jul 14 05:13:41 scvberpat01 systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
 Jul 14 05:13:41 scvberpat01 systemd: Failed to start Login Service.
 Jul 14 05:13:41 scvberpat01 systemd: Unit systemd-logind.service entered failed state.
 Jul 14 05:13:41 scvberpat01 systemd: systemd-logind.service failed.
 Jul 14 05:14:05 scvberpat01 dbus[1028]: [system] Failed to activate service 'org.freedesktop.login1': timed out
 Jul 14 05:14:06 scvberpat01 systemd-logind: Failed to enable subscription: Connection timed out
 Jul 14 05:14:06 scvberpat01 systemd-logind: Failed to fully start up daemon: Connection timed out
 Jul 14 05:14:06 scvberpat01 dbus[1028]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
 Jul 14 05:14:06 scvberpat01 systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
 Jul 14 05:14:06 scvberpat01 systemd: Failed to start Login Service.
 Jul 14 05:14:06 scvberpat01 systemd: Unit systemd-logind.service entered failed state.
 Jul 14 05:14:06 scvberpat01 systemd: systemd-logind.service failed.
 Jul 14 05:14:31 scvberpat01 systemd-logind: Failed to enable subscription: Connection timed out
 Jul 14 05:14:31 scvberpat01 dbus[1028]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
 Jul 14 05:14:31 scvberpat01 systemd-logind: Failed to fully start up daemon: Connection timed out
 Jul 14 05:14:31 scvberpat01 systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
 ...
 ...

Isso durou 1 dia e meio até impactar strongmente o servidor. No final ssh, o samba e o terminal não responderam mais. Infelizmente, não foi reconhecido de antemão, então foi feita uma reinicialização a frio.

Após a reinicialização, parece que o systemd não pode iniciar o org.freedesktop.systemd1:

Jul 18 13:21:05 scvberpat01.pankl.local systemd-logind[1023]: Failed to start user slice user-11511.slice, ignoring: Activation of org.freedesktop.systemd1 timed out (org.freedesktop.DBus.Error.TimedOut)
 Jul 18 13:21:30 scvberpat01.pankl.local systemd-logind[1023]: Failed to start session scope session-33081.scope: Connection timed out
 Jul 18 13:22:05 scvberpat01.pankl.local systemd-logind[1023]: Failed to start user slice user-11511.slice, ignoring: Activation of org.freedesktop.systemd1 timed out (org.freedesktop.DBus.Error.TimedOut)
 Jul 18 13:22:30 scvberpat01.pankl.local systemd-logind[1023]: Failed to start session scope session-33086.scope: Activation of org.freedesktop.systemd1 timed out
 Jul 18 13:23:05 scvberpat01.pankl.local systemd-logind[1023]: Failed to start user slice user-11511.slice, ignoring: Activation of org.freedesktop.systemd1 timed out (org.freedesktop.DBus.Error.TimedOut)
 Jul 18 13:23:30 scvberpat01.pankl.local systemd-logind[1023]: Failed to start session scope session-33092.scope: Activation of org.freedesktop.systemd1 timed out
 Jul 18 13:24:05 scvberpat01.pankl.local systemd-logind[1023]: Failed to start user slice user-11511.slice, ignoring: Connection timed out ((null))
 Jul 18 13:24:30 scvberpat01.pankl.local systemd-logind[1023]: Failed to start session scope session-33097.scope: Activation of org.freedesktop.systemd1 timed out
 Jul 18 13:25:05 scvberpat01.pankl.local systemd-logind[1023]: Failed to start user slice user-11511.slice, ignoring: Connection timed out ((null))
 Jul 18 13:25:30 scvberpat01.pankl.local systemd-logind[1023]: Failed to start session scope session-33102.scope: Connection timed out

"busctl --list" está mostrando que org.freedesktop.systemd1 não é ativo

[root@scvberpat01 log]# busctl --list
 NAME PID PROCESS USER CONNECTION UNIT SESSION DESCRIPTION
 :1.0 1023 systemd-logind root :1.0 systemd-logind.service - -
 :1.10 1076 NetworkManager root :1.10 NetworkManager.service - -
 :1.2 1019 avahi-daemon avahi :1.2 avahi-daemon.service - -
 :1.24 1348 cupsd root :1.24 cups.service - -
 :1.25 1342 tuned root :1.25 tuned.service - -
 :1.26 1583 colord colord :1.26 colord.service - -
 :1.27 1348 cupsd root :1.27 cups.service - -
 :1.28 2268 libvirtd root :1.28 libvirtd.service - -
 :1.3 1030 rtkit-daemon root :1.3 rtkit-daemon.service - -
 :1.34123 15157 abrt-dbus root :1.34123 dbus.service - -
 :1.34132 9585 busctl root :1.34132 sshd.service - -
 :1.384 4756 packagekitd root :1.384 packagekit.service - -
 :1.4 1029 ModemManager root :1.4 ModemManager.service - -
 :1.5 1020 polkitd polkitd :1.5 polkit.service - -
 :1.6 1088 accounts-daemon root :1.6 accounts-daemon.service - -
 :1.7 1076 NetworkManager root :1.7 NetworkManager.service - -
 com.redhat.RHSM1 - - - (activatable) - -
 com.redhat.RHSM1.Facts - - - (activatable) - -
 com.redhat.SubscriptionManager - - - (activatable) - -
 com.redhat.ifcfgrh1 1076 NetworkManager root :1.10 NetworkManager.service     - -
 com.redhat.problems.configuration - - - (activatable) - -
 com.redhat.tuned 1342 tuned root :1.25 tuned.service - -
 fi.epitest.hostap.WPASupplicant - - - (activatable) - -
 fi.w1.wpa_supplicant1 - - - (activatable) - -
 net.reactivated.Fprint - - - (activatable) - -
 org.bluez - - - (activatable) - -
 org.fedoraproject.SetroubleshootFixit - - - (activatable) - -
 org.fedoraproject.Setroubleshootd - - - (activatable) - -
 org.freedesktop.Accounts 1088 accounts-daemon root :1.6 accounts-    daemon.service - -
 org.freedesktop.Avahi 1019 avahi-daemon avahi :1.2 avahi-daemon.service - -
 org.freedesktop.ColorManager 1583 colord colord :1.26 colord.service - -
 org.freedesktop.DBus - - - - - - -
 org.freedesktop.Flatpak.SystemHelper - - - (activatable) - -
 org.freedesktop.GeoClue2 - - - (activatable) - -
 org.freedesktop.ModemManager1 1029 ModemManager root :1.4     ModemManager.service - -
 org.freedesktop.NetworkManager 1076 NetworkManager root :1.7     NetworkManager.service - -
 org.freedesktop.PackageKit 4756 packagekitd root :1.384 packagekit.service - -
 org.freedesktop.PolicyKit1 1020 polkitd polkitd :1.5 polkit.service - -
 org.freedesktop.RealtimeKit1 1030 rtkit-daemon root :1.3 rtkit-daemon.service - -
 org.freedesktop.UDisks2 - - - (activatable) - -
 org.freedesktop.UPower - - - (activatable) - -
 org.freedesktop.hostname1 - - - (activatable) - -
 org.freedesktop.import1 - - - (activatable) - -
 org.freedesktop.locale1 - - - (activatable) - -
 org.freedesktop.login1 1023 systemd-logind root :1.0 systemd-    logind.service - -
 org.freedesktop.machine1 - - - (activatable) - -
 org.freedesktop.nm_dispatcher - - - (activatable) - -
 org.freedesktop.problems 15157 abrt-dbus root :1.34123 dbus.service - -
 org.freedesktop.realmd - - - (activatable) - -
 org.freedesktop.systemd1 - - - (activatable) - -
 org.freedesktop.timedate1 - - - (activatable) - -
 org.gnome.GConf.Defaults - - - (activatable) - -
 org.opensuse.CupsPkHelper.Mechanism - - - (activatable) - -

rastrear um usuário Mudança Mostra exatamente o tempo limite de 30 segundos da linha 18:34:56. Isso realmente não me diz muito, mas talvez alguém possa me ajudar aqui. Comparando isso com o nosso sistema de testes que não enfrentou este incidente Major não está mostrando nenhum timeout aqui.

18:34:56 close(3) = 0
 18:34:56 munmap(0x7fee1868b000, 4096) = 0
 18:34:56 socket(AF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
 18:34:56 setsockopt(3, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0
 18:34:56 setsockopt(3, SOL_SOCKET, SO_PASSSEC, [0], 4) = 0
 18:34:56 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
 18:34:56 setsockopt(3, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = 0
 18:34:56 getsockopt(3, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
 18:34:56 setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = 0
 18:34:56 connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/dbus/system_bus_socket"}, 33) = 0
 18:34:56 getsockopt(3, SOL_SOCKET, SO_PEERCRED, {pid=1, uid=0, gid=0}, [12]) = 0
 18:34:56 getsockopt(3, SOL_SOCKET, SO_PEERSEC, 0x5645d834d4f0, 0x7ffdc6169a50) = -1 ENOPROTOOPT (Protocol not available)
 18:34:56 fstat(3, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
 18:34:56 getsockopt(3, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
 18:34:56 getsockname(3, {sa_family=AF_LOCAL, NULL}, [2]) = 0
 18:34:56 geteuid() = 0
 18:34:56 sendmsg(3, {msg_name(0)=NULL, msg_iov(3)=[{"
Red Hat Enterprise Linux Server release 7.4 (Maipo)
component: systemd
Hardware: x86_64 Linux    
[root@scvberpat01 log]# uname -r
3.10.0-693.21.1.el7.x86_64
AUTH EXTERNAL ", 15}, {"30", 2}, {"\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 28}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 45 18:34:56 gettid() = 9342 18:34:56 getrandom("56f77UZ27T2047}", 16, GRND_NONBLOCK) = 16 18:34:56 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"OK 0345052a3855cdc590c849e45b4b4"..., 256}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 52 18:34:56 sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l
systemd version
[root@scvberpat01 tmp]# systemctl --version
 systemd 219
 +PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN
Jul 14 05:09:40 scvberpat01 sshd[31966]: Accepted password for erp_monitoring from 172.20.0.63 port 58768 ssh2
 Jul 14 05:09:40 scvberpat01 systemd-logind: New session 1135476 of user erp_monitoring.
 Jul 14 05:09:40 scvberpat01 systemd: Started Session 1135476 of user erp_monitoring.
 Jul 14 05:09:40 scvberpat01 systemd: Starting Session 1135476 of user erp_monitoring.
 Jul 14 05:09:45 scvberpat01 sshd[32537]: Accepted password for erp_monitoring from 172.20.0.63 port 58801 ssh2
 Jul 14 05:09:45 scvberpat01 systemd-logind: New session 1135477 of user erp_monitoring.
 Jul 14 05:09:45 scvberpat01 systemd: Started Session 1135477 of user erp_monitoring.
 Jul 14 05:09:45 scvberpat01 systemd: Starting Session 1135477 of user erp_monitoring.
 Jul 14 05:10:01 scvberpat01 systemd: Started Session 1135478 of user root.
 Jul 14 05:10:01 scvberpat01 systemd: Starting Session 1135478 of user root.
 Jul 14 05:10:02 scvberpat01 systemd: Started Session 1135479 of user paseb.
 Jul 14 05:10:02 scvberpat01 systemd: Starting Session 1135479 of user paseb.
 Jul 14 05:10:02 scvberpat01 systemd: Started Session 1135480 of user root.
 Jul 14 05:10:02 scvberpat01 systemd: Starting Session 1135480 of user root.
 Jul 14 05:10:03 scvberpat01 systemd: Started Session 1135482 of user paseb.
 Jul 14 05:10:03 scvberpat01 systemd: Starting Session 1135482 of user paseb.
 Jul 14 05:10:03 scvberpat01 systemd: Started Session 1135481 of user paseb.
 Jul 14 05:10:03 scvberpat01 systemd: Starting Session 1135481 of user paseb.
 Jul 14 05:10:03 scvberpat01 systemd: Started Session 1135484 of user batchparg.
 Jul 14 05:10:03 scvberpat01 systemd: Starting Session 1135484 of user batchparg.
 Jul 14 05:10:04 scvberpat01 systemd: Started Session 1135485 of user batchparg.
 Jul 14 05:10:04 scvberpat01 systemd: Starting Session 1135485 of user batchparg.
 Jul 14 05:10:05 scvberpat01 systemd: Started Session 1135483 of user batchparg.
 Jul 14 05:10:05 scvberpat01 systemd: Starting Session 1135483 of user batchparg.
 Jul 14 05:10:06 scvberpat01 systemd: Started Session 1135486 of user batchparg.
 Jul 14 05:10:06 scvberpat01 systemd: Starting Session 1135486 of user batchparg.
 Jul 14 05:10:07 scvberpat01 systemd: Started Session 1135487 of user batchparg.
 Jul 14 05:10:07 scvberpat01 systemd: Starting Session 1135487 of user batchparg.
 Jul 14 05:10:08 scvberpat01 systemd: Started Session 1135488 of user batchparg.
 Jul 14 05:10:08 scvberpat01 systemd: Starting Session 1135488 of user batchparg.
 Jul 14 05:10:10 scvberpat01 systemd: Started Session 1135489 of user batchparg.
 Jul 14 05:10:10 scvberpat01 systemd: Starting Session 1135489 of user batchparg.
 Jul 14 05:10:11 scvberpat01 systemd: Started Session 1135490 of user batchparg.
 Jul 14 05:10:11 scvberpat01 systemd: Starting Session 1135490 of user batchparg.
 Jul 14 05:10:12 scvberpat01 systemd: Started Session 1135491 of user batchparg.
 Jul 14 05:10:12 scvberpat01 systemd: Starting Session 1135491 of user batchparg.
 Jul 14 05:10:13 scvberpat01 systemd: systemd-logind.service has no holdoff time, scheduling restart.
 Jul 14 05:10:13 scvberpat01 systemd: Starting Login Service...
 Jul 14 05:10:13 scvberpat01 systemd: Started Login Service.
 Jul 14 05:10:13 scvberpat01 systemd-logind: New seat seat0.
 Jul 14 05:10:13 scvberpat01 systemd-logind: Failed to read /run/systemd/users/11469: Argument list too long
 Jul 14 05:10:13 scvberpat01 systemd-logind: Failed to read /run/systemd/users/0: Argument list too long
 Jul 14 05:10:13 scvberpat01 systemd-logind: User enumeration failed: Argument list too long
 Jul 14 05:11:58 scvberpat01 systemd-logind: Failed to stop user slice: No buffer space available
 Jul 14 05:11:58 scvberpat01 systemd-logind: Failed to stop user slice: No buffer space available
 Jul 14 05:11:58 scvberpat01 systemd-logind: Failed to stop user slice: No buffer space available
 Jul 14 05:11:58 scvberpat01 systemd-logind: Failed to stop user slice: No buffer space available
 Jul 14 05:11:58 scvberpat01 systemd-logind: Failed to start user slice user-11469.slice, ignoring: No buffer space available ((null))
 Jul 14 05:11:59 scvberpat01 systemd-logind: Failed to start user slice user-0.slice, ignoring: No buffer space available ((null))
 Jul 14 05:13:13 scvberpat01 systemd: systemd-logind.service watchdog timeout (limit 3min)!
 Jul 14 05:13:13 scvberpat01 abrt-hook-ccpp: Process 2577 (systemd-logind) of user 0 killed by SIGABRT - dumping core
 Jul 14 05:13:16 scvberpat01 ModemManager[1041]: [sleep-monitor] inhibit failed: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message did not receive a reply (timeout by message b
 us)
 Jul 14 05:13:16 scvberpat01 systemd: systemd-logind.service: main process exited, code=dumped, status=6/ABRT
 Jul 14 05:13:16 scvberpat01 systemd: Unit systemd-logind.service entered failed state.
 Jul 14 05:13:16 scvberpat01 systemd: systemd-logind.service failed.
 Jul 14 05:13:21 scvberpat01 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
 Jul 14 05:13:40 scvberpat01 dbus[1028]: [system] Activating systemd to hand-off: service name='org.freedesktop.login1' unit='dbus-org.freedesktop.login1.service'
 Jul 14 05:13:41 scvberpat01 systemd-logind: Failed to enable subscription: Connection timed out
 Jul 14 05:13:41 scvberpat01 systemd-logind: Failed to fully start up daemon: Connection timed out
 Jul 14 05:13:41 scvberpat01 dbus[1028]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
 Jul 14 05:13:41 scvberpat01 systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
 Jul 14 05:13:41 scvberpat01 systemd: Failed to start Login Service.
 Jul 14 05:13:41 scvberpat01 systemd: Unit systemd-logind.service entered failed state.
 Jul 14 05:13:41 scvberpat01 systemd: systemd-logind.service failed.
 Jul 14 05:14:05 scvberpat01 dbus[1028]: [system] Failed to activate service 'org.freedesktop.login1': timed out
 Jul 14 05:14:06 scvberpat01 systemd-logind: Failed to enable subscription: Connection timed out
 Jul 14 05:14:06 scvberpat01 systemd-logind: Failed to fully start up daemon: Connection timed out
 Jul 14 05:14:06 scvberpat01 dbus[1028]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
 Jul 14 05:14:06 scvberpat01 systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
 Jul 14 05:14:06 scvberpat01 systemd: Failed to start Login Service.
 Jul 14 05:14:06 scvberpat01 systemd: Unit systemd-logind.service entered failed state.
 Jul 14 05:14:06 scvberpat01 systemd: systemd-logind.service failed.
 Jul 14 05:14:31 scvberpat01 systemd-logind: Failed to enable subscription: Connection timed out
 Jul 14 05:14:31 scvberpat01 dbus[1028]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
 Jul 14 05:14:31 scvberpat01 systemd-logind: Failed to fully start up daemon: Connection timed out
 Jul 14 05:14:31 scvberpat01 systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
 ...
 ...
Jul 18 13:21:05 scvberpat01.pankl.local systemd-logind[1023]: Failed to start user slice user-11511.slice, ignoring: Activation of org.freedesktop.systemd1 timed out (org.freedesktop.DBus.Error.TimedOut)
 Jul 18 13:21:30 scvberpat01.pankl.local systemd-logind[1023]: Failed to start session scope session-33081.scope: Connection timed out
 Jul 18 13:22:05 scvberpat01.pankl.local systemd-logind[1023]: Failed to start user slice user-11511.slice, ignoring: Activation of org.freedesktop.systemd1 timed out (org.freedesktop.DBus.Error.TimedOut)
 Jul 18 13:22:30 scvberpat01.pankl.local systemd-logind[1023]: Failed to start session scope session-33086.scope: Activation of org.freedesktop.systemd1 timed out
 Jul 18 13:23:05 scvberpat01.pankl.local systemd-logind[1023]: Failed to start user slice user-11511.slice, ignoring: Activation of org.freedesktop.systemd1 timed out (org.freedesktop.DBus.Error.TimedOut)
 Jul 18 13:23:30 scvberpat01.pankl.local systemd-logind[1023]: Failed to start session scope session-33092.scope: Activation of org.freedesktop.systemd1 timed out
 Jul 18 13:24:05 scvberpat01.pankl.local systemd-logind[1023]: Failed to start user slice user-11511.slice, ignoring: Connection timed out ((null))
 Jul 18 13:24:30 scvberpat01.pankl.local systemd-logind[1023]: Failed to start session scope session-33097.scope: Activation of org.freedesktop.systemd1 timed out
 Jul 18 13:25:05 scvberpat01.pankl.local systemd-logind[1023]: Failed to start user slice user-11511.slice, ignoring: Connection timed out ((null))
 Jul 18 13:25:30 scvberpat01.pankl.local systemd-logind[1023]: Failed to start session scope session-33102.scope: Connection timed out
[root@scvberpat01 log]# busctl --list
 NAME PID PROCESS USER CONNECTION UNIT SESSION DESCRIPTION
 :1.0 1023 systemd-logind root :1.0 systemd-logind.service - -
 :1.10 1076 NetworkManager root :1.10 NetworkManager.service - -
 :1.2 1019 avahi-daemon avahi :1.2 avahi-daemon.service - -
 :1.24 1348 cupsd root :1.24 cups.service - -
 :1.25 1342 tuned root :1.25 tuned.service - -
 :1.26 1583 colord colord :1.26 colord.service - -
 :1.27 1348 cupsd root :1.27 cups.service - -
 :1.28 2268 libvirtd root :1.28 libvirtd.service - -
 :1.3 1030 rtkit-daemon root :1.3 rtkit-daemon.service - -
 :1.34123 15157 abrt-dbus root :1.34123 dbus.service - -
 :1.34132 9585 busctl root :1.34132 sshd.service - -
 :1.384 4756 packagekitd root :1.384 packagekit.service - -
 :1.4 1029 ModemManager root :1.4 ModemManager.service - -
 :1.5 1020 polkitd polkitd :1.5 polkit.service - -
 :1.6 1088 accounts-daemon root :1.6 accounts-daemon.service - -
 :1.7 1076 NetworkManager root :1.7 NetworkManager.service - -
 com.redhat.RHSM1 - - - (activatable) - -
 com.redhat.RHSM1.Facts - - - (activatable) - -
 com.redhat.SubscriptionManager - - - (activatable) - -
 com.redhat.ifcfgrh1 1076 NetworkManager root :1.10 NetworkManager.service     - -
 com.redhat.problems.configuration - - - (activatable) - -
 com.redhat.tuned 1342 tuned root :1.25 tuned.service - -
 fi.epitest.hostap.WPASupplicant - - - (activatable) - -
 fi.w1.wpa_supplicant1 - - - (activatable) - -
 net.reactivated.Fprint - - - (activatable) - -
 org.bluez - - - (activatable) - -
 org.fedoraproject.SetroubleshootFixit - - - (activatable) - -
 org.fedoraproject.Setroubleshootd - - - (activatable) - -
 org.freedesktop.Accounts 1088 accounts-daemon root :1.6 accounts-    daemon.service - -
 org.freedesktop.Avahi 1019 avahi-daemon avahi :1.2 avahi-daemon.service - -
 org.freedesktop.ColorManager 1583 colord colord :1.26 colord.service - -
 org.freedesktop.DBus - - - - - - -
 org.freedesktop.Flatpak.SystemHelper - - - (activatable) - -
 org.freedesktop.GeoClue2 - - - (activatable) - -
 org.freedesktop.ModemManager1 1029 ModemManager root :1.4     ModemManager.service - -
 org.freedesktop.NetworkManager 1076 NetworkManager root :1.7     NetworkManager.service - -
 org.freedesktop.PackageKit 4756 packagekitd root :1.384 packagekit.service - -
 org.freedesktop.PolicyKit1 1020 polkitd polkitd :1.5 polkit.service - -
 org.freedesktop.RealtimeKit1 1030 rtkit-daemon root :1.3 rtkit-daemon.service - -
 org.freedesktop.UDisks2 - - - (activatable) - -
 org.freedesktop.UPower - - - (activatable) - -
 org.freedesktop.hostname1 - - - (activatable) - -
 org.freedesktop.import1 - - - (activatable) - -
 org.freedesktop.locale1 - - - (activatable) - -
 org.freedesktop.login1 1023 systemd-logind root :1.0 systemd-    logind.service - -
 org.freedesktop.machine1 - - - (activatable) - -
 org.freedesktop.nm_dispatcher - - - (activatable) - -
 org.freedesktop.problems 15157 abrt-dbus root :1.34123 dbus.service - -
 org.freedesktop.realmd - - - (activatable) - -
 org.freedesktop.systemd1 - - - (activatable) - -
 org.freedesktop.timedate1 - - - (activatable) - -
 org.gnome.GConf.Defaults - - - (activatable) - -
 org.opensuse.CupsPkHelper.Mechanism - - - (activatable) - -
18:34:56 close(3) = 0
 18:34:56 munmap(0x7fee1868b000, 4096) = 0
 18:34:56 socket(AF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
 18:34:56 setsockopt(3, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0
 18:34:56 setsockopt(3, SOL_SOCKET, SO_PASSSEC, [0], 4) = 0
 18:34:56 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
 18:34:56 setsockopt(3, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = 0
 18:34:56 getsockopt(3, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
 18:34:56 setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = 0
 18:34:56 connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/dbus/system_bus_socket"}, 33) = 0
 18:34:56 getsockopt(3, SOL_SOCKET, SO_PEERCRED, {pid=1, uid=0, gid=0}, [12]) = 0
 18:34:56 getsockopt(3, SOL_SOCKET, SO_PEERSEC, 0x5645d834d4f0, 0x7ffdc6169a50) = -1 ENOPROTOOPT (Protocol not available)
 18:34:56 fstat(3, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
 18:34:56 getsockopt(3, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
 18:34:56 getsockname(3, {sa_family=AF_LOCAL, NULL}, [2]) = 0
 18:34:56 geteuid() = 0
 18:34:56 sendmsg(3, {msg_name(0)=NULL, msg_iov(3)=[{"%pre%AUTH EXTERNAL ", 15}, {"30", 2}, {"\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 28}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 45
 18:34:56 gettid() = 9342
 18:34:56 getrandom("56f77UZ27T2047}", 16, GRND_NONBLOCK) = 16
 18:34:56 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"OK 0345052a3855cdc590c849e45b4b4"..., 256}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 52
 18:34:56 sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l%pre%%pre%%pre%%pre%%pre%%pre%%pre%%pre%m%pre%%pre%%pre%o%pre%%pre%%pre%%pre%/org/fre"..., 128}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 128
 18:34:56 recvmsg(3, 0x7ffdc6168960, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
 18:34:56 ppoll([{fd=3, events=POLLIN}], 1, {24, 999810000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {24, 999633311})
 18:34:56 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\r%pre%%pre%%pre%%pre%%pre%%pre%E%pre%%pre%%pre%s%pre%%pre%%pre%%pre%", 24}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
 18:34:56 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{":1.25140%pre%%pre%%pre%%pre%%pre%%pre%%pre%%pre%u%pre%%pre%%pre%%pre%g%pre%s%pre%%pre%"..., 77}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 77
 18:34:56 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"l%pre%p%pre%%pre%%pre%%pre%%pre%%pre%0%pre%%pre%%pre%o%pre%%pre%%pre%%pre%/org/fre"..., 168}, {"2,%pre%%pre%~$%pre%%pre%%pre%%pre%%pre%su-l%pre%%pre%%pre%%pre%%pre%%pre%%pre%tty%pre%%pre%%pre%%pre%"..., 112}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 280
 18:34:56 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\r%pre%%pre%%pre%%pre%%pre%%pre%5%pre%%pre%%pre%o%pre%%pre%%pre%%pre%", 24}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
 18:34:56 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"/org/freedesktop/DBus%pre%%pre%%pre%s%pre%%pre%%pre%%pre%"..., 157}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 157
 18:34:56 recvmsg(3, 0x7ffdc6168b10, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
 18:34:56 ppoll([{fd=3, events=POLLIN}], 1, {24, 999829000}, NULL, 8) = 0 (Timeout)
 18:35:21 open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 4
 18:35:21 fstat(4, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0
 18:35:21 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fee1868b000
 18:35:21 read(4, "# Locale name alias data base.\n#"..., 4096) = 2502
 18:35:21 read(4, "", 4096) = 0
 18:35:21 close(4) = 0
 18:35:21 munmap(0x7fee1868b000, 4096) = 0
%pre%%pre%%pre%m%pre%%pre%%pre%o%pre%%pre%%pre%%pre%/org/fre"..., 128}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 128 18:34:56 recvmsg(3, 0x7ffdc6168960, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable) 18:34:56 ppoll([{fd=3, events=POLLIN}], 1, {24, 999810000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {24, 999633311}) 18:34:56 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\r%pre%%pre%%pre%%pre%%pre%%pre%E%pre%%pre%%pre%s%pre%%pre%%pre%%pre%", 24}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24 18:34:56 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{":1.25140%pre%%pre%%pre%%pre%%pre%%pre%%pre%%pre%u%pre%%pre%%pre%%pre%g%pre%s%pre%%pre%"..., 77}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 77 18:34:56 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"l%pre%p%pre%%pre%%pre%%pre%%pre%%pre%0%pre%%pre%%pre%o%pre%%pre%%pre%%pre%/org/fre"..., 168}, {"2,%pre%%pre%~$%pre%%pre%%pre%%pre%%pre%su-l%pre%%pre%%pre%%pre%%pre%%pre%%pre%tty%pre%%pre%%pre%%pre%"..., 112}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 280 18:34:56 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\r%pre%%pre%%pre%%pre%%pre%%pre%5%pre%%pre%%pre%o%pre%%pre%%pre%%pre%", 24}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24 18:34:56 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"/org/freedesktop/DBus%pre%%pre%%pre%s%pre%%pre%%pre%%pre%"..., 157}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 157 18:34:56 recvmsg(3, 0x7ffdc6168b10, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable) 18:34:56 ppoll([{fd=3, events=POLLIN}], 1, {24, 999829000}, NULL, 8) = 0 (Timeout) 18:35:21 open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 4 18:35:21 fstat(4, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0 18:35:21 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fee1868b000 18:35:21 read(4, "# Locale name alias data base.\n#"..., 4096) = 2502 18:35:21 read(4, "", 4096) = 0 18:35:21 close(4) = 0 18:35:21 munmap(0x7fee1868b000, 4096) = 0

#

Eu precisaria de ajuda na possível causa raiz que fez com que o systemd entrasse no estado FAILURE, bem como uma solução para o org.freedesktop.systemd1 não iniciar. O que, na minha opinião, está causando um atraso de alteração de login / usuário de 30 segundos.  Por favor, considere o fato de que estamos falando de um sistema de produção 24/7.

Muito obrigado antecipadamente  Mario

    
por Mario.P 18.07.2018 / 14:25

1 resposta

1

[RESOLVIDO] - A versão 42.17e do systemd não pode lidar com muitas sessões de uma só vez, de acordo com o suporte global da Red Hat. Ele ficou sem memorey e, além disso, travou os módulos org.freedesktop.logind e org.freedesktop.systemd.

Este problema foi corrigido e uma atualização para o systemd 52.17e é recomendada.

  • Esse problema pode acontecer por causa da sessão de usuário "abandonada". Você pode verificar com o comando - $ systemctl | grep 'do usuário' | grep 'abandoned'

  • Você pode excluir diretórios de sessão $ rm -rf / run / systemd / system / session escopo

Quaisquer problemas de tempo de execução podem ser corrigidos pelos comandos abaixo, isso não prejudicará nada no sistema -

$ systemctl daemon-reexec

Se o daemon-reexec falhar com o tempo limite, você deve tentar

$ kill 1

Atenciosamente Mario.P

    
por 30.07.2018 / 09:09