Após a depuração por 6 horas - estou desistindo disso: |
Nós temos um nginx + php-fpm + mysql na LAN com quase 100 wordpress (criado e usado por diferentes designers / desenvolvedores, todos trabalhando em configuração wordpres de teste)
Estamos usando o nginx sem problemas por muito tempo.
Hoje, de repente, o nginx começou a retornar "504 Gateway Time-out" do nada ...
Eu verifiquei o log de erros do nginx para um host virtual ...
2010/09/06 21:24:24 [error] 12909#0: *349 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 21:25:11 [error] 12909#0: *349 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 21:25:11 [error] 12909#0: *443 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /info.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 21:25:12 [error] 12909#0: *443 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:08:32 [error] 12909#0: *1025 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:09:33 [error] 12909#0: *1025 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:09:40 [error] 12909#0: *1064 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /info.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:09:40 [error] 12909#0: *1064 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:24:44 [error] 12909#0: *1313 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
2010/09/06 22:24:53 [error] 12909#0: *1313 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.0.1, server: rahul286.rtcamp.info, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "rahul286.rtcamp.info"
Enquanto executo o php-fpm na porta 9000 via TCP, executei "netstat | grep 9000" e notei algo incomum ...
(colando saída parcial aqui para facilitar a leitura)
tcp 9 0 localhost:9000 localhost:36094 CLOSE_WAIT 14269/php5-fpm
tcp 0 0 localhost:46664 localhost:9000 FIN_WAIT2 -
tcp 1257 0 localhost:9000 localhost:36135 CLOSE_WAIT -
tcp 1257 0 localhost:9000 localhost:36125 CLOSE_WAIT -
tcp 9 0 localhost:9000 localhost:36102 CLOSE_WAIT 14268/php5-fpm
tcp 0 0 localhost:46662 localhost:9000 FIN_WAIT2 -
tcp 745 0 localhost:9000 localhost:46644 CLOSE_WAIT -
tcp 0 0 localhost:46658 localhost:9000 FIN_WAIT2 -
tcp 1265 0 localhost:9000 localhost:46607 CLOSE_WAIT -
tcp 0 0 localhost:46672 localhost:9000 ESTABLISHED 12909/nginx: worker
tcp 1257 0 localhost:9000 localhost:36119 CLOSE_WAIT -
tcp 1265 0 localhost:9000 localhost:46613 CLOSE_WAIT -
tcp 0 0 localhost:46646 localhost:9000 FIN_WAIT2 -
tcp 1257 0 localhost:9000 localhost:36137 CLOSE_WAIT -
tcp 0 0 localhost:46670 localhost:9000 ESTABLISHED 12909/nginx: worker
tcp 1265 0 localhost:9000 localhost:46619 CLOSE_WAIT -
tcp 1336 0 localhost:9000 localhost:46668 ESTABLISHED -
tcp 0 0 localhost:46648 localhost:9000 FIN_WAIT2 -
tcp 1336 0 localhost:9000 localhost:46670 ESTABLISHED -
tcp 9 0 localhost:9000 localhost:36108 CLOSE_WAIT 14274/php5-fpm
tcp 1336 0 localhost:9000 localhost:46684 ESTABLISHED -
tcp 0 0 localhost:46674 localhost:9000 ESTABLISHED 12909/nginx: worker
tcp 1336 0 localhost:9000 localhost:46666 ESTABLISHED -
tcp 1257 0 localhost:9000 localhost:46648 CLOSE_WAIT -
tcp 1336 0 localhost:9000 localhost:46678 ESTABLISHED -
tcp 0 0 localhost:46668 localhost:9000 ESTABLISHED 12909/nginx: wo
Existem muitos "CLOSE_WAIT" & "FIN_WAIT2" pares como destacado abaixo (na saída acima):
tcp 1337 0 localhost:9000 localhost:46680 CLOSE_WAIT -
tcp 0 0 localhost:46680 localhost:9000 FIN_WAIT2 -
Por favor, observe a porta 46680 acima.
Eu ativei o log de erro de consultas lentas do mysql, mas não funcionou.
A partir de agora, reiniciar o php5-fpm a cada minuto através de um cronjob (veja o comando abaixo), mantendo tudo funcionando "suavemente", mas eu odeio patchwork e quero resolver isso ...
1 * * * * service php5-fpm restart > /dev/null
Eu pesquisei extensivamente no Google - não tive ajuda.
Como mencionado, este um servidor de teste na LAN, a carga da CPU nunca é ultrapassada 0,10 e o uso da memória também é inferior a 25% (o sistema tem 2 GB de RAM e o servidor Ubuntu instalado)
Então, se você achar confuso o tempo para me ajudar, por favor, dê uma dica ao menos.
Agradecemos antecipadamente pela ajuda.
-Rahul
(nota - esta é reposting de - link )
Atualização: encontrei a resposta, que é postada abaixo.