NLB Target Group health checks are out of control

Problem description

I have a Network Load Balancer and an associated Target Group that is configured to do health checks on the EC2 instances. The problem is that I am seeing a very high number of health check requests; multiple every second.

The default interval between checks is supposed to be 30 seconds, but they are coming about 100x more frequently than they should.

My stack is built in CloudFormation, and I've tried overriding HealthCheckIntervalSeconds, which has no effect. Interestingly, when I tried to change the interval manually in the console, I found those fields greyed out.

Here is the relevant part of the template, with my attempt at changing the interval commented out:

NLB:
  Type: "AWS::ElasticLoadBalancingV2::LoadBalancer"
  Properties:
    Type: network
    Name: api-load-balancer
    Scheme: internal
    Subnets: 
      - Fn::ImportValue: PrivateSubnetA
      - Fn::ImportValue: PrivateSubnetB
      - Fn::ImportValue: PrivateSubnetC

NLBListener:
  Type: AWS::ElasticLoadBalancingV2::Listener
  Properties:
    DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref NLBTargetGroup
    LoadBalancerArn: !Ref NLB
    Port: 80
    Protocol: TCP

NLBTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    # HealthCheckIntervalSeconds: 30
    HealthCheckPath: /healthcheck
    HealthCheckProtocol: HTTP
    # HealthyThresholdCount: 2
    # UnhealthyThresholdCount: 5
    # Matcher:
    #   HttpCode: 200-399
    Name: api-nlb-http-target-group
    Port: 80
    Protocol: TCP 
    VpcId: !ImportValue PublicVPC

My EC2 instances are in private subnets with no access from the outside world. The NLB is internal, so there's no way of accessing them without going through API Gateway. API Gateway has no /healthcheck endpoint configured, so that rules out any activity coming from outside of the AWS network, like people manually pinging the endpoint.

Here is a sample of my app's log from CloudWatch, taken while the app should be idle:

07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:34 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:34 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:34 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:35 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:35 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:35 {"label":"Received request URL","value":"/healthcheck","type":"trace"}

I usually get 3 to 6 requests every second, so I'm wondering whether this is just how Network Load Balancers work and AWS still hasn't documented it (or I haven't found it), or, if not, how I might fix this issue.
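To put a number on that rate, here is a minimal Python sketch that buckets the CloudWatch sample above by its one-second timestamps. It assumes each line is an `HH:MM:SS` timestamp followed by one JSON object, exactly as in the sample; the function name is hypothetical.

```python
import json
from collections import Counter

# The CloudWatch sample from the question: "HH:MM:SS {json}" per line.
SAMPLE_LOG = """\
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:34 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:34 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:34 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:35 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:35 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:35 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
"""


def healthcheck_hits_per_second(log_text: str, path: str = "/healthcheck") -> Counter:
    """Count requests for `path` in each one-second timestamp bucket."""
    hits = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split "07:45:33 {...}" into the timestamp and the JSON payload.
        timestamp, _, payload = line.partition(" ")
        entry = json.loads(payload)
        if entry.get("value") == path:
            hits[timestamp] += 1
    return hits


per_second = healthcheck_hits_per_second(SAMPLE_LOG)
for ts, count in sorted(per_second.items()):
    print(ts, count)
# Mean over the seconds that appear in the log (seconds with zero hits are not counted).
print(f"mean per second: {sum(per_second.values()) / len(per_second):.2f}")
```

On the ten-line sample this reports 4, 3 and 3 hits for the three seconds, in line with the 3 to 6 per second described.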

Solution

Update: this has been answered on the related AWS forum post, which confirms that this is normal behaviour for Network Load Balancers and cites their distributed nature as the reason. There is no way to configure a custom interval. At the time of writing, the docs are still out of date and say otherwise.
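The "distributed nature" explanation can be made concrete with a little arithmetic. If every independent prober hits a target once per interval, then the observed rate per target multiplied by the interval gives the number of probers involved. This is a back-of-the-envelope Python sketch under that assumption, not a statement of how many health checkers AWS actually runs; the function name is hypothetical.

```python
def implied_probers(requests_per_second: float, interval_seconds: float, targets: int = 1) -> float:
    """Estimate how many independent health-check sources produce the observed rate.

    Assumes each source probes every target exactly once per interval, so
    per-target rate = probers / interval, i.e. probers = per-target rate * interval.
    """
    if interval_seconds <= 0 or targets <= 0:
        raise ValueError("interval_seconds and targets must be positive")
    per_target_rate = requests_per_second / targets
    return per_target_rate * interval_seconds


# Observed range from the question, with the 30-second default interval.
for rate in (3, 6):
    print(f"{rate} req/s at a 30 s interval -> {implied_probers(rate, 30):.0f} probers")
```

At 3 and 6 requests per second this gives 90 and 180 probers, which matches the "about 100x" discrepancy in the question. If the CloudWatch log aggregates several instances, pass their count as `targets`.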


This is either a bug in NLB Target Groups, or normal behaviour with incorrect documentation. I've come to this conclusion because:

  • I verified that the health checks are coming from the NLB
  • The configuration options are greyed out on the console
    • implying that AWS knows about, or imposed, this limitation
  • The same results are being observed by others
  • The documentation I consulted is specifically for Network Load Balancers
  • AWS docs commonly lead you on a wild goose chase
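One way to perform the check in the first bullet is to compare the source addresses of the /healthcheck requests with the NLB's subnets. Below is a minimal Python sketch using the standard `ipaddress` module. The CIDR blocks and source IPs are hypothetical placeholders for the real PrivateSubnetA/B/C ranges and for whatever addresses your app or VPC Flow Logs record. The sketch assumes the probes originate from the load balancer's private addresses in those subnets. Because the instances sit in the same private subnets, a match only narrows the source down; comparing against the NLB's own network interface addresses would be stricter.

```python
import ipaddress

# Hypothetical CIDRs standing in for PrivateSubnetA/B/C; replace with the real ranges.
NLB_SUBNETS = [ipaddress.ip_network(cidr) for cidr in ("10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24")]


def split_by_subnet(source_ips, subnets=NLB_SUBNETS):
    """Return (inside, outside): source IPs that do / do not fall in any of `subnets`."""
    inside, outside = [], []
    for ip in source_ips:
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in subnets):
            inside.append(ip)
        else:
            outside.append(ip)
    return inside, outside


# Hypothetical source addresses collected from /healthcheck requests.
inside, outside = split_by_subnet(["10.0.1.17", "10.0.2.201", "10.0.3.9", "192.168.5.4"])
print("from NLB subnets:", inside)
print("unexpected sources:", outside)
```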

In this case I think it might be normal behaviour that's been documented incorrectly, but there's no way to verify that without confirmation from someone at AWS, and it's almost impossible to get an answer to an issue like this on the AWS forum.

It would be useful to be able to configure the setting, or at least have the docs updated.
