Google Compute Engine health checks failing

Question

I have a node.js app on two VM instances that I'm trying to load balance with network load balancing. To verify that my servers are up and serving, the app serves the health check file '/health.txt' on its internal listening port. The two instances are configured identically, with the same tags, firewall rules, etc., but the health check continuously fails for one of them. If I fetch the file with curl, from my internal network or from outside, the test works fine on both instances, yet the network load balancer always reports one instance as down.
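
For illustration, the manual curl check would look something like this (my.pub.ip.addr and port 3000 are placeholders taken from the capture below):

# request the health check file directly; both instances answer 200 OK
curl -i http://my.pub.ip.addr:3000/health.txt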

I used ngrep; running it on the healthy instance, I see:

T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [S]
#
T my.pub.ip.addr:3000 -> 169.254.169.254:65374 [AS]
#
T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [A]
#
T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [AP]
GET /health.txt HTTP/1.1.
Host: my.pub.ip.addr:3000.
.

#
T my.pub.ip.addr:3000 -> 169.254.169.254:65374 [A]
#
T my.pub.ip.addr:3000 -> 169.254.169.254:65374 [AP]
HTTP/1.1 200 OK.
X-Powered-By: NitroPCR.
Accept-Ranges: bytes.
Date: Fri, 14 Nov 2014 20:00:40 GMT.
Cache-Control: public, max-age=86400.
Last-Modified: Thu, 24 Jul 2014 17:58:46 GMT.
ETag: W/"2198506076".
Content-Type: text/plain; charset=UTF-8.
Content-Length: 13.
Connection: keep-alive.
.

#
T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [AR]
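
For reference, a capture like the ones shown here can be produced with an ngrep invocation along these lines (the interface name eth0 and the exact arguments are my assumptions, not from the post):

# print all traffic ('' matches any payload) on the health check port
sudo ngrep -d eth0 '' port 3000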

But on the instance GCE claims is unhealthy, I see this:

T 169.254.169.254:61179 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61179 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61180 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61180 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61180 -> my.pub.ip.addr:3000 [S]

But if I curl the same file from my healthy instance to my 'unhealthy' instance, the 'unhealthy' instance responds fine.

Answer

I got this working again after making contact with the Google Compute Engine team. There is a service process on a GCE VM that needs to run at boot and continue running while the VM is alive. The process is named google-address-manager. It should run at runlevels 0-6. For some reason this service had stopped and would not start when one of my VMs boots/reboots; starting it manually worked. Here are the steps we went through to determine what was wrong (this is a Debian VM):

sudo ip route list table all

This will display your route tables. In the output, there should be a route to your load balancer's public IP:

local lb.pub.ip.addr dev eth0  table local  proto 66  scope host

If there is not, check that google-address-manager is running:

sudo service google-address-manager status

If it is not running, start it:

sudo service google-address-manager start

If it starts OK, check your route table again; you should now have a route to your load balancer IP. You can also add this route manually:

sudo /sbin/ip route add to local lb.pub.ip.addr/32 dev eth0 proto 66
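
Either way, you can confirm the route took effect by listing the local table again and filtering for the proto 66 entry:

sudo ip route list table local | grep 'proto 66'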

We still have not resolved why the address manager stopped and does not start on boot, but at least the LB pool is healthy.
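
As a stopgap until the root cause is found (a hedged sketch, assuming the standard sysvinit layout on this Debian image), the init script could be re-registered so the service starts on boot:

# (re)install start/stop links for the default runlevels
sudo update-rc.d google-address-manager defaults
# verify a start link now exists for runlevel 2
ls /etc/rc2.d/ | grep google-address-manager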
