CentOS reboots abruptly - the same main processes are killed by the TERM signal every time

0

I have a CentOS 6.9 system running in a VM that went crazy this morning. It abruptly started rebooting. At first the interval between reboots was exactly 10 minutes, then it dropped to 5, then to 3, and now it sometimes varies. Below are the messages from /var/log/messages.

May 10 18:40:01 hwmaster01 init: tty (/dev/tty1) main process (2126) killed by TERM signal
May 10 18:40:01 hwmaster01 init: tty (/dev/tty2) main process (2128) killed by TERM signal
May 10 18:40:01 hwmaster01 init: tty (/dev/tty3) main process (2130) killed by TERM signal
May 10 18:40:01 hwmaster01 init: tty (/dev/tty4) main process (2132) killed by TERM signal
May 10 18:40:01 hwmaster01 init: tty (/dev/tty5) main process (2134) killed by TERM signal
May 10 18:40:01 hwmaster01 init: tty (/dev/tty6) main process (2136) killed by TERM signal
May 10 18:40:07 hwmaster01 ntpd[1767]: ntpd exiting on signal 15
May 10 18:40:08 hwmaster01 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"

* after some time

May 10 18:45:02 hwmaster01 init: tty (/dev/tty1) main process (2137) killed by TERM signal
May 10 18:45:02 hwmaster01 init: tty (/dev/tty2) main process (2139) killed by TERM signal
May 10 18:45:02 hwmaster01 init: tty (/dev/tty3) main process (2141) killed by TERM signal
May 10 18:45:02 hwmaster01 init: tty (/dev/tty4) main process (2143) killed by TERM signal
May 10 18:45:02 hwmaster01 init: tty (/dev/tty5) main process (2146) killed by TERM signal
May 10 18:45:02 hwmaster01 init: tty (/dev/tty6) main process (2148) killed by TERM signal
May 10 18:45:08 hwmaster01 ntpd[1772]: ntpd exiting on signal 15
May 10 18:45:08 hwmaster01 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"

* after some time

May 10 18:52:01 hwmaster01 init: tty (/dev/tty1) main process (2124) killed by TERM signal
May 10 18:52:01 hwmaster01 init: tty (/dev/tty2) main process (2126) killed by TERM signal
May 10 18:52:01 hwmaster01 init: tty (/dev/tty3) main process (2128) killed by TERM signal
May 10 18:52:01 hwmaster01 init: tty (/dev/tty4) main process (2131) killed by TERM signal
May 10 18:52:01 hwmaster01 init: tty (/dev/tty5) main process (2133) killed by TERM signal
May 10 18:52:01 hwmaster01 init: tty (/dev/tty6) main process (2135) killed by TERM signal
May 10 18:52:09 hwmaster01 ntpd[1767]: ntpd exiting on signal 15
May 10 18:52:10 hwmaster01 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"

No new stress tools are running. This is the master node of a Hadoop cluster with 4 nodes, each in a separate VM but all on the same hardware. All the VMs seem to work fine at the hardware level, but this master node goes down and stops all its services. Is anyone familiar with this problem?
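To check how the interval between reboots really changes, the shutdown timestamps can be pulled out of the log and subtracted. The following is only a minimal sketch: it assumes the tty1 line appears once per shutdown (as in the excerpts above) and assumes the year 2018, since syslog timestamps do not carry one. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches the init line for tty1; in the excerpts above it shows up once per shutdown.
SHUTDOWN_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ init: tty \(/dev/tty1\) "
    r"main process \(\d+\) killed by TERM signal"
)

def shutdown_times(lines, year=2018):
    """Return one datetime per shutdown found in syslog-style lines."""
    times = []
    for line in lines:
        m = SHUTDOWN_RE.match(line)
        if m:
            # The log has no year field, so the caller-supplied year is an assumption.
            times.append(datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S"))
    return times

def intervals_seconds(times):
    """Seconds elapsed between each pair of consecutive shutdowns."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

if __name__ == "__main__":
    # tty1 lines copied from the three excerpts above
    sample = [
        "May 10 18:40:01 hwmaster01 init: tty (/dev/tty1) main process (2126) killed by TERM signal",
        "May 10 18:45:02 hwmaster01 init: tty (/dev/tty1) main process (2137) killed by TERM signal",
        "May 10 18:52:01 hwmaster01 init: tty (/dev/tty1) main process (2124) killed by TERM signal",
    ]
    print(intervals_seconds(shutdown_times(sample)))  # [301, 419]
```

For the three excerpts this gives gaps of 301 and 419 seconds. To run it over the whole log, pass an open file object for /var/log/messages instead of the sample list.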

    
by Samhash 11.05.2018 / 10:01

1 answer

0

You can attach strace to that main process. It will tell you which process is killing it.
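As a rough sketch of how that could look: attach with something like `strace -p <PID> -o /tmp/trace.txt` (the output path is just an example) to one of the processes that gets killed, wait for the next shutdown, then look for the signal line in the output. Recent strace versions print the sender in a siginfo block (`si_pid`, `si_uid`); older releases may print a shorter form without it, in which case the parser below finds nothing. The helper names and the sample PID are made up for illustration.

```python
import re

# Signal-delivery line as printed by strace versions that decode siginfo, e.g.
# --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4321, si_uid=0} ---
SIGINFO_RE = re.compile(r"--- (SIG\w+) \{.*?si_pid=(\d+), si_uid=(\d+).*?\} ---")

def find_signal_senders(lines, signal_name="SIGTERM"):
    """Return (sender_pid, sender_uid) for each delivery of signal_name."""
    senders = []
    for line in lines:
        m = SIGINFO_RE.search(line)
        if m and m.group(1) == signal_name:
            senders.append((int(m.group(2)), int(m.group(3))))
    return senders

def describe_pid(pid):
    """Best-effort command line of a still-running process, read from /proc."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as fh:
            raw = fh.read()
    except OSError:
        return None  # process already exited, or /proc is not available
    return raw.replace(b"\0", b" ").decode(errors="replace").strip() or None

if __name__ == "__main__":
    # Hypothetical strace output; the PID 4321 is invented for this example.
    sample = [
        "select(1, [0], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted if no handler)",
        "--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4321, si_uid=0} ---",
        "+++ killed by SIGTERM +++",
    ]
    for pid, uid in find_signal_senders(sample):
        print(pid, uid, describe_pid(pid))
```

Because the sender may already be gone by the time the trace is read, `describe_pid` is best effort; running the lookup right after the signal arrives gives the best chance of catching it. Note also that the tty processes get new PIDs after every reboot, so strace has to be attached again each time.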

    
by 19.05.2018 / 01:11