I have haproxy in front of a high-performance web application. At high volumes, ~11k req/sec, the server seems to hit a limit or bottleneck at the haproxy level, while the underlying server could handle more traffic with near-zero response latency.
To illustrate, I run curl against a simple hello test endpoint. Hitting the app directly on port 8080 responds in ~17ms according to time (and given how crude that benchmark is, I suspect the real response time is well under 1ms). Going through haproxy on port 80 takes over 5 seconds. I assume there is some queue/backlog limit I need to tune:
root@01:/usr/share# time curl http://localhost/hello
hi there!
real 0m5.097s
user 0m0.008s
sys 0m0.012s
root@01:/usr/share# time curl http://localhost:8080/hello
hi there!
real 0m0.017s
user 0m0.012s
sys 0m0.000s
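The single curl above is only a crude probe; the ~11k req/sec ceiling shows up under concurrent load. A minimal sketch of the comparison, assuming a load generator like wrk is available (the thread/connection counts are arbitrary):

# Drive concurrent load through haproxy and directly against the app,
# then compare requests/sec and latency between the two runs.
wrk -t4 -c1000 -d30s http://localhost/hello
wrk -t4 -c1000 -d30s http://localhost:8080/hello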
Running curl "http://localhost:9000/haproxy_stats;csv" > stats.csv, I get:
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
http-in,FRONTEND,,,30000,30000,30000,244107765,170146478342,37776067415,0,0,49255,,,,,OPEN,,,,,,,,,1,2,0,,,,0,10606,0,25873,,,,0,409162321,0,51319,245288,4921,,12052,26301,409479925,,,0,0,0,0,,,,,,,,
servers,server1,0,0,16076,29017,50000,410260473,170146023052,37766853705,,0,,244621,4917,829803,0,no check,1,1,0,,,,,,1,3,1,,409430670,,2,12052,,34968,,,,0,409162321,0,1794,667,0,0,,,,4520,4917,,,,,0,,,0,690,646,1992,
servers,BACKEND,0,0,16076,29964,3000,409430670,170146023052,37766853705,0,0,,244621,4917,829803,0,UP,1,1,0,,0,45940,0,,1,3,0,,409430670,,1,12052,,26301,,,,0,409162321,0,2064,245288,4921,,,,,4520,4917,0,0,0,0,0,,,0,690,646,1992,
stats,FRONTEND,,,2,2,2000,772,86236,1141909,0,0,1,,,,,OPEN,,,,,,,,,1,4,0,,,,0,1,0,1,,,,0,770,0,1,0,0,,1,2,772,,,0,0,0,0,,,,,,,,
stats,BACKEND,0,0,0,0,200,0,86236,1141909,0,0,,0,0,0,0,UP,0,0,0,,0,45940,0,,1,4,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,0,,,3663,0,2,317,
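To pull out just the session columns I care about, the CSV can be trimmed and aligned with standard tools (a quick sketch; the field numbers follow the header line above):

# Keep pxname, svname, scur, smax and slim (fields 1, 2, 5, 6, 7)
# and align them into readable columns.
cut -d, -f1,2,5,6,7 stats.csv | column -s, -t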
Reformatted into easier-to-read tabular text (UPDATE: fixed formatting):
# pxname svname qcur qmax scur smax slim stot bin bout dreq dresp ereq econ eresp wretr wredis status weight act bck chkfail chkdown lastchg downtime qlimit pid iid sid throttle lbtot tracked type rate rate_lim rate_max check_status check_code check_duration hrsp_1xx hrsp_2xx hrsp_3xx hrsp_4xx hrsp_5xx hrsp_other hanafail req_rate req_rate_max req_tot cli_abrt srv_abrt comp_in comp_out comp_byp comp_rsp lastsess last_chk last_agt qtime ctime rtime ttime
http-in FRONTEND 29930 30000 30000 260607375 176871559870 39278101192 0 0 50446 OPEN 1 2 0 0 11565 0 25873 0 425740885 0 52573 286642 5198 10112 26301 426097360 0 0 0 0
servers server1 0 0 12061 29020 50000 427063733 176871103092 39268664952 0 285733 5194 1017043 0 no check 1 1 0 1 3 1 426046914 2 10110 34968 0 425740885 0 1858 909 0 0 5158 5194 0 0 589 664 1931
servers BACKEND 0 0 12061 29964 3000 426046914 176871103092 39268664952 0 0 285733 5194 1017043 0 UP 1 1 0 0 47413 0 1 3 0 426046914 1 10110 26301 0 425740885 0 2128 286642 5198 5158 5194 0 0 0 0 0 0 589 664 1931
stats FRONTEND 1 2 2000 798 89114 1181075 0 0 1 OPEN 1 4 0 0 1 0 1 0 796 0 1 0 0 1 2 798 0 0 0 0
stats BACKEND 0 0 0 0 200 0 89114 1181075 0 0 0 0 0 0 UP 0 0 0 0 47413 0 1 4 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3481 0 2 334
The only value that looks suspicious to me is the slim of 3000 for servers/BACKEND. How can I adjust that in the config? servers/server1 shows a slim of 50000, which is comfortably above its scur and smax values.
My haproxy.cfg file:
global
    daemon
    maxconn 300000
    # See running maxconn with:
    # echo "show info" | socat /var/run/haproxy.sock stdio
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m

defaults
    # This doesn't help.
    # maxconn 25000
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    maxconn 30000
    default_backend servers

backend servers
    # This doesn't help.
    # maxconn 25000
    server server1 127.0.0.1:8080 maxconn 50000
    stats enable

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /haproxy_stats
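The only candidate knob I have found so far is fullconn: if I understand the stats fields right, slim on a BACKEND line is the backend's fullconn, and when it is not set I believe it defaults to 10% of the maxconn of the frontends feeding that backend, which matches the 3000 I see (10% of 30000). Assuming that really is the right setting, the change would be something like this untested sketch:

backend servers
    # fullconn is what the stats page reports as slim on the BACKEND line;
    # value chosen to match the frontend's maxconn (untested guess)
    fullconn 30000
    server server1 127.0.0.1:8080 maxconn 50000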