Hi everyone, I hope you can help me out here.
I have Nginx passing HTTP and HTTPS to a Varnish cache (3.0.2), and Varnish passes the requests on to Apache2. For a while now I have been chasing some strange 503 errors, but I can't find the silver bullet.
Currently I log the 503 errors from Varnish like this:

```
sudo varnishlog -c -m TxStatus:503 >> /home/rj/varnishlog503.log
```

and then check Apache's access log to see whether any of those 503 requests were handled there.
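Cross-checking each 503 against Apache's access log by hand is tedious. The Python sketch below groups the client-side records of a varnishlog dump by file descriptor, closes each request at its ReqEnd record, and keeps the requests whose TxStatus is 503 together with their URL, backend, FetchError and start time. The record layout (`<fd> <Tag> <c|b|-> <data>`) and the position of the start time in ReqEnd are assumptions based on Varnish 3 output like the one in this post; `parse_503s` is a hypothetical helper, not part of Varnish.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# One varnishlog record: "<fd> <Tag> <c|b|-> <data>" (Varnish 3 layout, assumed).
LINE_RE = re.compile(r"^\s*(\d+)\s+(\w+)\s+([cb-])\s?(.*)$")
# Tags worth keeping when matching a 503 against Apache's access log.
KEEP = {"RxRequest", "RxURL", "Backend", "FetchError", "TxStatus"}


def parse_503s(lines):
    """Group client records per fd until ReqEnd; return the 503 transactions."""
    pending = defaultdict(dict)  # fd -> fields of the request in progress
    found = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # blank or unrelated line
        fd, tag, kind, data = m.groups()
        if kind != "c":
            continue  # only client-side records are of interest here
        tx = pending[fd]
        if tag in KEEP:
            tx[tag] = data.strip()
        elif tag == "ReqEnd":
            # ReqEnd closes the request: the first field is the XID,
            # the second the request start time (Unix epoch seconds).
            fields = data.split()
            tx["xid"] = fields[0]
            tx["start"] = datetime.fromtimestamp(float(fields[1]), tz=timezone.utc)
            if tx.get("TxStatus") == "503":
                found.append(tx)
            pending[fd] = {}  # the next request on this fd starts fresh
    return found
```

For example, `for tx in parse_503s(open("/home/rj/varnishlog503.log")): print(tx["start"], tx.get("RxURL"), tx.get("FetchError", "-"))`. The start time is printed in UTC, while Apache normally writes its access log in the server's local time zone, so convert before matching the two.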
Today a health check from the firewall failed:
```
20 SessionOpen c 127.0.0.1 34319 :8081
20 ReqStart c 127.0.0.1 34319 607335635
20 RxRequest c HEAD
20 RxURL c /health-check
20 RxProtocol c HTTP/1.0
20 RxHeader c X-Real-IP: 192.168.3.254
20 RxHeader c Host: 192.168.3.189
20 RxHeader c X-Forwarded-For: 192.168.3.254
20 RxHeader c Connection: close
20 RxHeader c User-Agent: Astaro Service Monitor 0.9
20 RxHeader c Accept: */*
20 VCL_call c recv lookup
20 VCL_call c hash
20 Hash c /health-check
20 VCL_return c hash
20 VCL_call c miss fetch
20 Backend c 33 aurum aurum
20 FetchError c http first read error: -1 11 (No error recorded)
20 VCL_call c error deliver
20 VCL_call c deliver deliver
20 TxProtocol c HTTP/1.1
20 TxStatus c 503
20 TxResponse c Service Unavailable
20 TxHeader c Server: Varnish
20 TxHeader c Content-Type: text/html; charset=utf-8
20 TxHeader c Retry-After: 5
20 TxHeader c Content-Length: 879
20 TxHeader c Accept-Ranges: bytes
20 TxHeader c Date: Wed, 06 Jun 2012 12:35:12 GMT
20 TxHeader c X-Varnish: 607335635
20 TxHeader c Age: 60
20 TxHeader c Via: 1.1 varnish
20 TxHeader c Connection: close
20 Length c 879
20 ReqEnd c 607335635 1338986052.649786949 1338986112.648169994 0.000160217 59.997980356 0.000402689
```
The backend server (Apache) has no 503 in its access log at that time, so I'm confused. Is Varnish throwing a 503 because it thinks Apache is too slow? There is a lot of traffic coming in at that moment, so I know the server is up.
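One way to answer the "is Apache too slow?" question is to split the ReqEnd line of that request into its timing fields. The sketch below assumes the Varnish 3 field order: XID, request start, request end, then three durations (session open to request start, request processing including the backend fetch, and response delivery). `ReqEnd` and `parse_reqend` are hypothetical helpers, not part of Varnish.

```python
from dataclasses import dataclass


@dataclass
class ReqEnd:
    xid: int
    t_start: float     # epoch time the request started (assumed field order)
    t_end: float       # epoch time the response was finished
    accept: float      # seconds from session open to request start
    processing: float  # seconds spent processing, including the backend fetch
    delivery: float    # seconds spent delivering the response


def parse_reqend(data: str) -> ReqEnd:
    """Parse the data part of a Varnish 3 ReqEnd record."""
    xid, *times = data.split()
    return ReqEnd(int(xid), *(float(t) for t in times))


rec = parse_reqend("607335635 1338986052.649786949 1338986112.648169994 "
                   "0.000160217 59.997980356 0.000402689")
print(f"total: {rec.t_end - rec.t_start:.3f}s, processing: {rec.processing:.3f}s")
# prints: total: 59.998s, processing: 59.998s
```

For this request nearly all of the roughly 60 seconds is spent in processing. Assuming the default timeouts are in effect, that would be consistent with Varnish 3's default `first_byte_timeout` of 60 seconds expiring while waiting for Apache's first byte, although the log alone does not prove it.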
I also get 503s on other requests, POSTs included, and there is really no pattern. They seem to happen at random times on random requests, even in the morning when the server doesn't seem to be doing anything.
I also see another pattern in the log:
```
4 VCL_call c recv pass
4 VCL_call c hash
4 Hash c /?id=412
4 VCL_return c hash
4 VCL_call c pass pass
4 FetchError c no backend connection
4 VCL_call c error deliver
4 VCL_call c deliver deliver
```
Here the FetchError says "no backend connection".
A summary of the FetchErrors in today's log:
```
16 FetchError c http first read error: -1 11 (No error recorded)
5 FetchError c http first read error: -1 11 (No error recorded)
4 FetchError c http first read error: -1 11 (No error recorded)
19 FetchError c http first read error: -1 11 (No error recorded)
5 FetchError c http first read error: -1 11 (No error recorded)
23 FetchError c http first read error: -1 11 (No error recorded)
24 FetchError c http first read error: -1 11 (No error recorded)
16 FetchError c http first read error: -1 11 (No error recorded)
6 FetchError c http first read error: -1 11 (No error recorded)
4 FetchError c http first read error: -1 11 (No error recorded)
5 FetchError c http first read error: -1 11 (No error recorded)
4 FetchError c http first read error: -1 11 (No error recorded)
4 FetchError c http first read error: -1 11 (No error recorded)
22 FetchError c http first read error: -1 11 (No error recorded)
6 FetchError c http first read error: -1 11 (No error recorded)
21 FetchError c http first read error: -1 11 (No error recorded)
26 FetchError c no backend connection
4 FetchError c no backend connection
20 FetchError c http first read error: -1 11 (No error recorded)
39 FetchError c http first read error: -1 11 (No error recorded)
```
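A FetchError summary like this can be tallied automatically, ignoring the per-request fd in the first column. The sketch below assumes the Varnish 3 record layout shown here and treats the second number in "http first read error: -1 11" as an errno value; `tally_fetch_errors` and `describe` are hypothetical helpers. On Linux, errno 11 is EAGAIN ("Resource temporarily unavailable"), which is also what a socket read returns when its receive timeout expires, so one plausible reading of these entries is a read timeout on the backend connection.

```python
import errno
import os
import re
from collections import Counter

# "<fd> FetchError c <message>" as printed by Varnish 3 varnishlog (assumed layout).
FETCH_RE = re.compile(r"^\s*\d+\s+FetchError\s+[cb-]\s+(.*?)\s*$")
# "http first read error: -1 11 (...)": the second number looks like an errno value.
READ_ERR_RE = re.compile(r"http first read error: (-?\d+) (\d+)")


def tally_fetch_errors(lines):
    """Count FetchError messages, ignoring the per-request fd column."""
    counts = Counter()
    for line in lines:
        m = FETCH_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def describe(message):
    """Append the errno name from *this* machine's errno table, if applicable."""
    m = READ_ERR_RE.search(message)
    if not m:
        return message
    code = int(m.group(2))
    name = errno.errorcode.get(code, "?")
    return f"{message} [errno {code} = {name}: {os.strerror(code)}]"
```

For example, `for msg, n in tally_fetch_errors(open("/home/rj/varnishlog503.log")).most_common(): print(n, describe(msg))`. Run it on the Varnish host itself, since errno numbers differ between operating systems.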
I have not changed any of Varnish's default timeout values.
This is my configuration for one of the backend servers:
```
backend xenon {
    .host = "192.168.3.187";
    .port = "80";
    .probe = {
        .url = "/health-check/";
        .interval = 3s;
        .window = 5;
        .threshold = 2;
    }
}
```
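With `.window = 5` and `.threshold = 2`, a backend counts as healthy while at least 2 of its last 5 probes succeeded. The toy model below is only a sketch of that rule: it ignores `.initial`, the probe timeout and the probe request itself, and `ProbeWindow` is a hypothetical name rather than anything in Varnish.

```python
from collections import deque


class ProbeWindow:
    """Toy model of a Varnish backend probe: healthy while at least
    `threshold` of the last `window` polls succeeded."""

    def __init__(self, window: int = 5, threshold: int = 2):
        self.threshold = threshold
        self.polls = deque(maxlen=window)  # oldest results fall off automatically

    def record(self, ok: bool) -> bool:
        """Store one poll result and return the resulting health state."""
        self.polls.append(ok)
        return self.healthy

    @property
    def healthy(self) -> bool:
        return sum(self.polls) >= self.threshold


probe = ProbeWindow(window=5, threshold=2)
for _ in range(5):
    probe.record(True)                              # start fully healthy
states = [probe.record(False) for _ in range(4)]   # then four failed polls
print(states)  # prints: [True, True, True, False]
```

In this model it takes four consecutive failed probes, roughly 12 seconds at `.interval = 3s`, before a fully healthy backend is marked sick, so a short stall on Apache could produce 503s without ever showing up as an unhealthy backend.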
I'm running the prefork MPM in Apache2 with this configuration:
```
<IfModule mpm_prefork_module>
    StartServers 1
    MinSpareServers 2
    MaxSpareServers 5
    MaxClients 200
    MaxRequestsPerChild 75
</IfModule>
```
and only PHP files are sent to that server; all other static files are handled by Nginx.
Any ideas?
------- EDIT --------------
Some more debugging information.
I ran debug.health from varnishadm:
```
Backend radon is Healthy
Current states good: 5 threshold: 2 window: 5
Average responsetime of good probes: 0.002560
Oldest Newest
================================================================
4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy

Backend xenon is Healthy
Current states good: 5 threshold: 2 window: 5
Average responsetime of good probes: 0.002760
Oldest Newest
================================================================
4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy

Backend iridium is Healthy
Current states good: 5 threshold: 2 window: 5
Average responsetime of good probes: 0.000849
Oldest Newest
================================================================
4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy

Backend aurum is Healthy
Current states good: 5 threshold: 2 window: 5
Average responsetime of good probes: 0.002100
Oldest Newest
================================================================
4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy
```
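With several backends, debug.health output like this can be condensed into one line per backend, which makes it easier to poll periodically and compare with the times of the 503s. The sketch below assumes the Varnish 3 layout shown above and only counts the `H` characters on the "Happy" bitmap line; `summarize_health` is a hypothetical helper.

```python
import re

BACKEND_RE = re.compile(r"^Backend (\S+) is (\w+)")
AVG_RE = re.compile(r"^Average responsetime of good probes: ([\d.]+)")
HAPPY_RE = re.compile(r"^(\S+) Happy$")


def summarize_health(text):
    """Map backend name -> (state, avg probe seconds, happy polls, polls shown)."""
    summary = {}
    name = state = avg = None
    for raw in text.splitlines():
        line = raw.strip()
        m = BACKEND_RE.match(line)
        if m:
            name, state = m.groups()
            avg = None  # don't carry the previous backend's average over
            continue
        m = AVG_RE.match(line)
        if m and name:
            avg = float(m.group(1))
            continue
        m = HAPPY_RE.match(line)
        if m and name:
            bits = m.group(1)
            summary[name] = (state, avg, bits.count("H"), len(bits))
    return summary
```

Applied to the output above, every backend should come out as Healthy with every recorded poll happy, and average probe times between roughly 0.85 ms and 2.8 ms.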
I have also been monitoring varnishstat on both load balancers:
```
3224774 3.99 2.61 backend_conn - Backend conn. success
27 0.00 0.00 backend_unhealthy - Backend conn. not attempted
63 0.00 0.00 backend_fail - Backend conn. failures
358798 0.00 0.29 backend_reuse - Backend conn. reuses
21035 0.00 0.02 backend_toolate - Backend conn. was closed
379834 0.00 0.31 backend_recycle - Backend conn. recycles
26 0.00 0.00 backend_retry - Backend conn. retry

3217751 5.99 2.61 backend_conn - Backend conn. success
32 0.00 0.00 backend_fail - Backend conn. failures
364185 0.00 0.30 backend_reuse - Backend conn. reuses
27077 0.00 0.02 backend_toolate - Backend conn. was closed
391263 0.00 0.32 backend_recycle - Backend conn. recycles
36 0.00 0.00 backend_retry - Backend conn. retry
```
Note that neither of them shows any significant backend_fail activity.
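To put the backend_fail counters in proportion, the sketch below computes failed connects as a share of all connect attempts, treating backend_conn (successful connects) and backend_fail (failed connects) as disjoint. That interpretation is an assumption, and "lb1"/"lb2" are just hypothetical labels for the two load balancers. As far as I understand the Varnish 3 counters, a "first read error" happens after the connection has already been opened, so those fetch failures would not be counted in backend_fail at all.

```python
# Cumulative counters copied from the two varnishstat snapshots in the post.
# "lb1"/"lb2" are hypothetical labels for the two load balancers.
SNAPSHOTS = {
    "lb1": {"backend_conn": 3224774, "backend_fail": 63},
    "lb2": {"backend_conn": 3217751, "backend_fail": 32},
}


def connect_failure_ratio(stats):
    """Failed backend connects as a share of all connect attempts (success + failure)."""
    attempts = stats["backend_conn"] + stats["backend_fail"]
    return stats["backend_fail"] / attempts


for lb, stats in SNAPSHOTS.items():
    print(f"{lb}: {connect_failure_ratio(stats):.5%} of backend connects failed")
```

Both ratios come out at roughly one failure per 50,000 to 100,000 connects.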
/ Ronnie