Problema
Tempos limite regulares nos back-ends do memcached no Nginx (veja os erros de exemplo abaixo)
Coisas já testadas:
- ocorre nos servidores local e remoto do memcached, portanto, a rede não pode ser o problema
- ocorre com o memc e backends do memcached
- ocorre com e sem o módulo keepalive upstream
- ocorre em nginx 0.8, 1.0.xe 1.1.x
- os registros de buffer parecem reduzir um pouco a quantidade de erros, mas ainda estão lá e isso pode ser simplesmente coincidência
- logar por meio do udp (ou seja, sem bloqueio) para outro não faz diferença
Outras informações:
- o servidor do memcached processa ~ 500 solicitações por segundo deste e de outros clientes (remotos)
- não há erros / relatórios nos registros do memcached
- o memcached é iniciado com as seguintes configurações:
- o tcp e o udp estão ativados (somente o tcp é usado atualmente)
- 16 GB de memória
- 16 encadeamentos (em uma máquina de 32 núcleos)
- limitado a 8192 conexões
- começou no modo não daemônico por meio do supervisord
- Os servidores executam o FreeBSD 8.2
- Os tempos limite ocorrem durante cargas baixas e altas (agora apenas cerca de 50 solicitações por segundo)
A questão
Então, minha pergunta é ... como posso depurar esses problemas? Alguém tem uma ideia de onde a causa subjacente poderia ser? Os erros não ocorrem continuamente, mas com frequência suficiente para aparecer nos registros a cada dois minutos.
Exemplo de erros dos registros:
2011/08/30 17:23:34 [error] 13921#0: *38602 upstream timed out (60: Operation timed out) while reading response header from upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:23:34 [error] 13921#0: *38591 upstream timed out (60: Operation timed out) while reading response header from u'enter code here'pstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:23:34 [error] 13921#0: *38601 upstream timed out (60: Operation timed out) while reading response header from upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:23:34 [error] 13921#0: *38918 upstream timed out (60: Operation timed out) while reading response header from upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:23:35 [error] 13921#0: *38595 upstream timed out (60: Operation timed out) while connecting to upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:24:32 [error] 13921#0: *41768 upstream timed out (60: Operation timed out) while connecting to upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:24:36 [error] 13921#0: *38599 upstream timed out (60: Operation timed out) while connecting to upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:25:12 [error] 13921#0: *42489 upstream timed out (60: Operation timed out) while connecting to upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:25:30 [error] 13922#0: *39444 upstream timed out (60: Operation timed out) while reading response header from upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
2011/08/30 17:25:30 [error] 13922#0: *39452 upstream timed out (60: Operation timed out) while reading response header from upstream, client: x.x.x.x, server: some.host.com, request: "GET /some_random_file HTTP/1.1", subrequest: "/memc_get", upstream: "memcached://127.0.0.1:11211", host: "some.host.com"
Saída de netstat -m
# netstat -m
3404/25531/28935 mbufs in use (current/cache/total)
3318/25004/28322/229376 mbuf clusters in use (current/cache/total/max)
1161/20599 mbuf+clusters out of packet secondary zone in use (current/cache)
26/4420/4446/192000 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
7591K/74070K/81661K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
4925075 requests for I/O initiated by sendfile
0 calls to protocol drain routines