AWS Autoscaling Group EC2 instances go down during cron jobs

Problem description

I tried autoscaling groups and, alternatively, just a bunch of EC2 instances behind a load balancer. Both configurations work fine at first glance.

But when an EC2 instance is part of an autoscaling group, it sometimes goes down. It actually happens very often, almost once a day, and the instances go down in a "hard reset" way. The EC2 monitoring graphs show CPU usage climbing to 100%, then the instance becomes unresponsive, and then it is terminated by the autoscaling group.

And it has nothing to do with my own processes on these instances.

When an instance is not part of an autoscaling group, it can run for years without these CPU usage spikes.

The "hard resets" on autoscaling group instances are breaking my cron jobs. As much as I like autoscaling groups, I cannot use them.

Is there a standard way to deal with the "hard resets"?

PS.

In my case, the cron jobs run PHP scripts on Ubuntu. I have managed to make only one instance run the job.

Recommended answer

It sounds like you have a health check that fails while your cron job is running, and as a result the instance is being taken out of service.

If you look at the ASG, there should be a reason listed for why the instance was taken out of service. This will usually be a health check failure, but there could be other reasons as well.
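
The listed reason can also be read programmatically from the group's scaling activity history. The sketch below assumes a boto3 Auto Scaling client and its `describe_scaling_activities` call, which returns activity records carrying `Description` and `Cause` fields. The helper name `termination_causes` is hypothetical, and the check for "terminating" in the description is an assumption about how termination activities are worded.

```python
from typing import Any, List


def termination_causes(client: Any, group_name: str, limit: int = 20) -> List[str]:
    """Return the recorded cause of each recent termination in an ASG.

    `client` is assumed to be a boto3 Auto Scaling client, i.e. the object
    returned by boto3.client("autoscaling").
    """
    response = client.describe_scaling_activities(
        AutoScalingGroupName=group_name,
        MaxRecords=limit,
    )
    causes = []
    for activity in response.get("Activities", []):
        # Assumption: termination activities mention "Terminating" in
        # their description, e.g. "Terminating EC2 instance: i-...".
        if "terminating" in activity.get("Description", "").lower():
            causes.append(activity.get("Cause", ""))
    return causes
```

With boto3 installed and credentials configured, something like `termination_causes(boto3.client("autoscaling"), "my-asg")` would list the causes, where `my-asg` stands in for the real group name. A cause that mentions a failed health check would support the diagnosis above.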

There are a couple of things you can do to fix this.

First, determine why your cron job is taking 100% of the CPU, and how long it generally runs.

Second, review your health check settings. Are you using HTTP or TCP? What is the interval, and how many checks have to fail before the instance is taken out of service?
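
To see how those two settings interact with the job's run time, a rough estimate helps. The sketch below assumes an instance is marked unhealthy after roughly interval times threshold seconds of consecutive failures, and it ignores the check timeout and any grace period. The function names and the numbers in the example are hypothetical.

```python
def seconds_until_unhealthy(interval_s: float, unhealthy_threshold: int) -> float:
    """Approximate time an unresponsive instance survives before being
    marked unhealthy: one failed check per interval, threshold in a row."""
    return interval_s * unhealthy_threshold


def job_outlasts_health_check(
    job_duration_s: float, interval_s: float, unhealthy_threshold: int
) -> bool:
    """True if a job that makes the instance unresponsive for its whole
    run would last long enough to trip the health check."""
    return job_duration_s >= seconds_until_unhealthy(interval_s, unhealthy_threshold)


# Hypothetical settings: checks every 30 s, 2 consecutive failures allowed.
print(seconds_until_unhealthy(30, 2))          # 60
print(job_outlasts_health_check(300, 30, 2))   # True: a 5-minute job trips it
print(job_outlasts_health_check(300, 60, 10))  # False: 600 s of tolerance
```

If the job's typical duration exceeds the tolerance window, either the window has to grow or the job has to leave enough CPU free for the instance to keep answering checks.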

Between those two items, you should be able to adjust the health checks so that the instance is not taken out of service while the cron job is running. It is also possible that the instance really is failing; typically this would be because it runs out of memory. If that is the case, you may want to consider moving to a larger instance type and/or enabling swap.
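
To check the memory hypothesis on the Ubuntu instance, one option is to log memory figures from inside the cron job. The sketch below assumes the standard Linux `/proc/meminfo` format with `MemTotal`, `MemAvailable`, and `SwapTotal` fields. The function names are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: number} mapping.

    Most fields are reported in kB; the unit suffix is dropped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_report(text: str) -> Dict[str, object]:
    """Summarize how much memory is still available and whether swap exists."""
    info = parse_meminfo(text)
    total = info.get("MemTotal", 0)
    available = info.get("MemAvailable", 0)
    return {
        "available_fraction": available / total if total else 0.0,
        "swap_enabled": info.get("SwapTotal", 0) > 0,
    }
```

On the instance itself this could be called as `memory_report(open("/proc/meminfo").read())`. An available fraction that drops toward zero during the job, with swap disabled, would point to memory exhaustion instead of a health check that is merely too strict.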
