Diagnosing a bottleneck or limit in HAProxy

I have HAProxy in front of a high-performance web application. At high volume (~11k req/s), the server seems to hit a limit or bottleneck at the HAProxy level, while the underlying application server can handle more traffic with near-zero response latency.

To illustrate, I run curl against a simple hello test endpoint. Hitting the application directly on port 8080 responds in ~17 ms according to time. I suspect that, with such a crude benchmark setup, the real response time is under 1 ms. Going through HAProxy on port 80 takes over 5 seconds. I assume there is some queue/backlog limit that I need to tune:

root@01:/usr/share# time curl http://localhost/hello
hi there!
real    0m5.097s
user    0m0.008s
sys     0m0.012s
root@01:/usr/share# time curl http://localhost:8080/hello
hi there!
real    0m0.017s
user    0m0.012s
sys     0m0.000s
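
Since `time curl` also counts process startup, a rough in-process timer may give a cleaner comparison of the two URLs above. This is only a minimal sketch, assuming Python 3 on the same host; the helper names (time_request, summarize, compare) and the run count are made up for illustration. Each request opens a fresh connection, so TCP connect time is included.

```python
import time
import urllib.request


def time_request(url, timeout=10.0):
    """Return the wall-clock seconds taken to fetch url once (new connection each time)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # drain the body so the full response is included in the timing
    return time.perf_counter() - start


def summarize(samples):
    """Return (min, median, max) of a non-empty list of timings.

    For an even number of samples the upper of the two middle values is used.
    """
    ordered = sorted(samples)
    return ordered[0], ordered[len(ordered) // 2], ordered[-1]


def compare(urls, runs=20):
    """Time each URL `runs` times and print min/median/max in milliseconds."""
    for url in urls:
        samples = [time_request(url) for _ in range(runs)]
        fastest, median, slowest = summarize(samples)
        print(f"{url}: min={fastest * 1000:.1f} ms  "
              f"median={median * 1000:.1f} ms  max={slowest * 1000:.1f} ms")
```

Calling compare(["http://localhost:8080/hello", "http://localhost/hello"]) on the HAProxy host would print one line per URL, which should show whether the 5-second delay is consistent or only hits some requests.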

Running curl "http://localhost:9000/haproxy_stats;csv" > stats.csv, I get:

# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
http-in,FRONTEND,,,30000,30000,30000,244107765,170146478342,37776067415,0,0,49255,,,,,OPEN,,,,,,,,,1,2,0,,,,0,10606,0,25873,,,,0,409162321,0,51319,245288,4921,,12052,26301,409479925,,,0,0,0,0,,,,,,,,
servers,server1,0,0,16076,29017,50000,410260473,170146023052,37766853705,,0,,244621,4917,829803,0,no check,1,1,0,,,,,,1,3,1,,409430670,,2,12052,,34968,,,,0,409162321,0,1794,667,0,0,,,,4520,4917,,,,,0,,,0,690,646,1992,
servers,BACKEND,0,0,16076,29964,3000,409430670,170146023052,37766853705,0,0,,244621,4917,829803,0,UP,1,1,0,,0,45940,0,,1,3,0,,409430670,,1,12052,,26301,,,,0,409162321,0,2064,245288,4921,,,,,4520,4917,0,0,0,0,0,,,0,690,646,1992,
stats,FRONTEND,,,2,2,2000,772,86236,1141909,0,0,1,,,,,OPEN,,,,,,,,,1,4,0,,,,0,1,0,1,,,,0,770,0,1,0,0,,1,2,772,,,0,0,0,0,,,,,,,,
stats,BACKEND,0,0,0,0,200,0,86236,1141909,0,0,,0,0,0,0,UP,0,0,0,,0,45940,0,,1,4,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,0,,,3663,0,2,317,

Reformatted as easier-to-read tabular text (UPDATE: fixed formatting):

# pxname  svname    qcur  qmax  scur   smax   slim   stot       bin           bout         dreq  dresp  ereq   econ    eresp  wretr    wredis  status    weight  act  bck  chkfail  chkdown  lastchg  downtime  qlimit  pid  iid  sid  throttle  lbtot      tracked  type  rate   rate_lim  rate_max  check_status  check_code  check_duration  hrsp_1xx  hrsp_2xx   hrsp_3xx  hrsp_4xx  hrsp_5xx  hrsp_other  hanafail  req_rate  req_rate_max  req_tot    cli_abrt  srv_abrt  comp_in  comp_out  comp_byp  comp_rsp  lastsess  last_chk  last_agt  qtime  ctime  rtime  ttime  
http-in   FRONTEND              29930  30000  30000  260607375  176871559870  39278101192  0     0      50446                                  OPEN                                                                     1    2    0                                  0     11565  0         25873                                               0         425740885  0         52573     286642    5198                  10112     26301         426097360                      0        0         0         0                                                                   
servers   server1   0     0     12061  29020  50000  427063733  176871103092  39268664952        0             285733  5194   1017043  0       no check  1       1    0                                                 1    3    1              426046914           2     10110            34968                                               0         425740885  0         1858      909       0           0                                            5158      5194                                             0                             0      589    664    1931   
servers   BACKEND   0     0     12061  29964  3000   426046914  176871103092  39268664952  0     0             285733  5194   1017043  0       UP        1       1    0             0        47413    0                 1    3    0              426046914           1     10110            26301                                               0         425740885  0         2128      286642    5198                                                     5158      5194      0        0         0         0         0                             0      589    664    1931   
stats     FRONTEND              1      2      2000   798        89114         1181075      0     0      1                                      OPEN                                                                     1    4    0                                  0     1      0         1                                                   0         796        0         1         0         0                     1         2             798                            0        0         0         0                                                                   
stats     BACKEND   0     0     0      0      200    0          89114         1181075      0     0             0       0      0        0       UP        0       0    0             0        47413    0                 1    4    0              0                   1     0                0                                                   0         0          0         0         0         0                                                        0         0         0        0         0         0         0                             3481   0      2      334    
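
The scur/smax/slim comparison can also be done programmatically instead of by eye. The following is a minimal sketch, assuming the CSV export was saved as stats.csv as above; the function names and the 0.9 threshold are arbitrary choices for illustration. It only flags rows whose peak session count came close to (or exceeded) the reported limit; it does not show whether that limit is what actually causes the delay.

```python
import csv
import io


def load_stats(text):
    """Parse HAProxy's CSV stats export into a list of dicts keyed by column name."""
    # The header line starts with "# "; strip it so the first column is named "pxname".
    text = text.lstrip("# ")
    return list(csv.DictReader(io.StringIO(text)))


def session_usage(rows):
    """Yield (pxname, svname, scur, smax, slim) for rows that report a session limit."""
    for row in rows:
        if row["slim"]:
            yield (row["pxname"], row["svname"],
                   int(row["scur"]), int(row["smax"]), int(row["slim"]))


def near_limit(rows, threshold=0.9):
    """Return the usage tuples whose peak sessions (smax) reached threshold * slim."""
    return [u for u in session_usage(rows) if u[3] >= threshold * u[4]]


def report(path="stats.csv"):
    """Print every row of the given stats file that came close to its session limit."""
    with open(path, newline="") as fh:
        rows = load_stats(fh.read())
    for pxname, svname, scur, smax, slim in near_limit(rows):
        print(f"{pxname}/{svname}: scur={scur} smax={smax} slim={slim}")
```

Running report() in the directory that holds stats.csv prints one line for each proxy or server whose smax reached at least 90% of its slim.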

The only value that looks suspicious is the slim of 3000 for servers/BACKEND. How can I adjust that in the configuration? The main servers/server1 row shows a slim of 50000, which is comfortably above its scur and smax values.

My haproxy.cfg file:

global
    daemon
    maxconn 300000
    # See running maxconn with:
    # echo "show info" | socat /var/run/haproxy.sock stdio
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m

defaults
    # This doesn't help.
    # maxconn 25000
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    maxconn 30000
    default_backend servers

backend servers
    # This doesn't help.
    # maxconn 25000
    server server1 127.0.0.1:8080 maxconn 50000
    stats enable

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /haproxy_stats
    
by clay 31.05.2018 / 20:15

0 answers