O aplicativo da AWS, por vezes, responde a 502 sem chamar o backend

1

Temos a seguinte configuração: Um aplicativo da AWS O ELB tem um grupo-alvo de 4 instâncias do EC2. Em cada uma das instâncias do EC2, há um Apache2 que encaminha para um Tomcat.

Funciona brilhantemente. Exceto às vezes, quando o ELB aparentemente se recusa a falar com uma instância saudável e responde 502 em vez disso. Isso é o que parece no log de acesso do ELB:

https 2018-03-25T09:36:13.797244Z app/XXXXXX 178.162.220.236:50254 172.31.24.81:1443 0.001 0.031 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76d8d-2f54c360354a57532ccc55ef" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:14.169090Z app/XXXXXX 178.162.220.236:50270 172.31.22.59:1443 0.001 0.096 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76d8e-748fc1e34bee608baa1efa81" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:18.785982Z app/XXXXXX 178.162.220.236:50330 172.31.19.151:1443 0.001 0.034 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76d92-34ac7ac52b69c34fdfdbce33" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:28.457341Z app/XXXXXX 178.162.220.236:50808 172.31.28.152:1443 0.001 0.075 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76d9c-66b8f0a5c6be9ff6c098dfe4" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:30.615879Z app/XXXXXX 178.162.220.236:50900 172.31.24.81:1443 0.001 0.032 0.000 204 204 314 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76d9e-d1d310c5d55ad09295bb08b9" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:30.955285Z app/XXXXXX 178.162.220.236:50914 172.31.19.151:1443 0.001 0.091 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76d9e-079416df92b2cd27865c8b93" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:36.843389Z app/XXXXXX 178.162.220.236:51090 172.31.28.152:1443 0.001 0.081 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76da4-917b032b1fb8c1c0dc914354" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:40.432293Z app/XXXXXX 178.162.220.236:51184 172.31.24.81:1443 0.002 0.076 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76da8-1ee52e1fa688cc81071fc8d1" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:41.398389Z app/XXXXXX 178.162.220.236:51208 172.31.19.151:1443 0.001 0.033 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76da9-2ee889d47ead389675818ea6" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:41.775607Z app/XXXXXX 178.162.220.236:51216 172.31.28.152:1443 0.000 0.000 -1 502 - 314 293 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76da9-fad6ea5fa8cc956c24d65206" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:46.954552Z app/XXXXXX 178.162.220.236:50912 172.31.22.59:1443 0.001 16.118 0.000 204 204 293 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76d9e-9c0827e71b334dd908000419" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:50.750077Z app/XXXXXX 178.162.220.236:51188 172.31.22.59:1443 0.001 10.096 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76da8-f99e3281d1e0e2a73c7c4ba9" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:52.336104Z app/XXXXXX 178.162.220.236:51510 172.31.24.81:1443 0.001 0.115 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76db4-2640adae88a8639f196613e9" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:55.563397Z app/XXXXXX 178.162.220.236:51604 172.31.22.59:1443 0.000 0.104 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76db7-ecff4da4c9414383ce128eae" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:58.824749Z app/XXXXXX 178.162.220.236:51682 172.31.19.151:1443 0.001 0.034 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76dba-425a808b5efc4c3cb4cb7f53" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:37:09.004924Z app/XXXXXX 178.162.220.236:51994 172.31.28.152:1443 0.001 0.135 0.000 204 204 655 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76dc4-5d8dce64e6fcb2aa069e9018" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0
https 2018-03-25T09:36:11.146813Z app/XXXXXX 178.162.220.236:50212 172.31.24.81:1443 0.001 0.031 0.000 204 204 374 210 "POST https://XXXXXX.com:443/api/event?apikey=XXXXXX HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-central-1:709725828154:targetgroup/XXXXXX "Root=1-5ab76d8b-ce30a6a810a89110465a9944" "XXXXXX.com" "arn:aws:acm:eu-central-1:709725828154:certificate/91f68487-7f5d-4ed5-80b9-4a9839ebe0f1" 0

Observe o 502 como elb_status_code na 10ª linha. Observe também o -1 como target_status_code .

Não há absolutamente nenhum vestígio dessa solicitação nem nos logs do apache2 nem nos logs do tomcat. Todas as quatro instâncias permanecem saudáveis por toda parte. O mesmo IP (172.31.28.152 neste caso) processa corretamente as solicitações antes e depois.

O sistema de envio está sob nosso controle. Todas as solicitações são uniformes, por isso, algumas solicitações malformadas não são processadas corretamente. Todos parecem iguais.

Alguma idéia de por que isso pode acontecer? Eu suponho que a AWS não venda balanceadores de carga defeituosos ...

    
por Jan Doerrenhaus 24.04.2018 / 17:31

1 resposta

0

A solução estava nos comentários. O ELB tem uma conexão padrão KeepAlive de 60 segundos. Apache2 tem uma conexão padrão KeepAlive de 5 segundos. Se os 5 segundos acabarem, o Apache2 fecha sua conexão e redefine a conexão com o ELB. No entanto, se uma solicitação chegar no momento exato, o ELB aceitará, decidirá para qual host a encaminhar e, nesse momento, o Apache fecha a conexão. Isso resultará no dito código de erro 502.

A solução é: Quando você tem proxies / LBs em cascata, alinhe seus tempos limite do KeepAlive, ou - preferivelmente - até mesmo os deixe um pouco mais longos na linha que você tem.

Definimos o tempo limite do ELB para 60 segundos e o tempo limite do Apache2 para 120 segundos. Problema desaparecido.

    
por 04.05.2018 / 13:53