Some 502 errors in GCP HTTP Load Balancing
Question
Our load balancer is returning 502 errors for some requests. This affects only a small fraction of traffic: we handle around 36,000 requests per hour and see about 40 errors per hour, so roughly 0.1% of requests return an error.
The instances are healthy when the errors occur, and we have added this firewall rule for the load balancer: 130.211.0.0/22 tcp:1-5000, applied to all targets.
This is not a serious problem because the application tolerates such errors, but I would like to know why they occur.
Any help would be appreciated.
Recommended answer
It seems that there is no easy solution for this.
As Mike Fotinakis explains in this blog (thanks to JasonG for the pointer):
It turns out that there is a race condition between the Google Cloud HTTP(S) Load Balancer and NGINX’s default keep-alive timeout of 65 seconds. The NGINX timeout might be reached at the same time the load balancer tries to re-use the connection for another HTTP request, which breaks the connection and results in a 502 Bad Gateway response from the load balancer.
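The condition behind this race can be sketched in a few lines of Python. This is only an illustration, not code from the blog post: the function name `backend_may_close_first` is hypothetical, and the 600-second load balancer idle timeout is an assumption based on Google's documentation rather than on anything stated in this thread. The idea is that a 502 becomes possible whenever the backend is allowed to close an idle connection before the load balancer has stopped reusing it.

```python
# Assumption: the load balancer keeps idle backend connections for 600 s.
# This value is not given in the thread; adjust it if your setup differs.
ASSUMED_LB_IDLE_TIMEOUT_S = 600.0


def backend_may_close_first(backend_keepalive_s: float,
                            lb_idle_timeout_s: float = ASSUMED_LB_IDLE_TIMEOUT_S) -> bool:
    """Return True if the backend can drop an idle connection while the
    load balancer still considers it reusable (the window for a 502)."""
    return backend_keepalive_s <= lb_idle_timeout_s


# Compare the 65 s timeout from the quote with the two larger values
# mentioned in this answer.
for label, timeout_s in [("65 s", 65), ("620 s", 620), ("650 s", 650)]:
    print(label, "-> race possible:", backend_may_close_first(timeout_s))
```

Under that assumption, any backend keep-alive timeout above the load balancer's own timeout closes the window, which is why the proposed fixes all raise the timeout rather than lower it.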
In my case I'm using Apache with the mpm_prefork module. The proposed solution is to increase the keep-alive timeout to 650 seconds, but this is not practical here because with prefork each open connection occupies its own process, so long-lived idle connections would waste a lot of resources.
UPDATE:
There is now some documentation about this problem on the official load balancer documentation page (search for "Timeouts and retries"): https://cloud.google.com/compute/docs/load-balancing/http/
For both Apache and nginx, they recommend setting the keep-alive timeout to 620 seconds (the KeepAliveTimeout directive in Apache, keepalive_timeout in nginx).
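As a rough sketch of what that recommendation looks like in practice, the fragments below set the value in each server. The file locations in the comments are assumptions that vary by distribution, and the `KeepAlive On` line is shown only for completeness since keep-alive is normally enabled by default.

```
# Apache (for example in apache2.conf or httpd.conf; location is an assumption)
KeepAlive On
KeepAliveTimeout 620
```

```
# nginx (inside the http block of nginx.conf; location is an assumption)
keepalive_timeout 620s;
```

With mpm_prefork, keep in mind the concern raised above: a longer timeout means idle connections hold their processes for longer, so it may be worth watching process counts after the change.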