ECS service did not stabilize


Problem description




Answer did not have pointers for this problem, because rollback deletes the stack.


Below is the CloudFormation template, written to launch a Jenkins Docker container on an ECS container instance (DesiredCount: 1) in the default public subnet.

The Jenkins Docker image that is launched on the ECS container instance is publicly available on Docker Hub.

We used the ECS-optimized AMI (ami-05958d7635caa4d04) in the ca-central-1 region, which runs Docker version 18.06.1.

{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Jenkins Stack",
    "Parameters":{
        "VpcId": {
            "Type": "AWS::EC2::VPC::Id",
            "Description": "The target VPC Id"
        },
        "SubnetId": {
            "Type": "AWS::EC2::Subnet::Id",
            "Description": "The target subnet Id"
        },
        "KeyName": {
            "Type": "String",
            "Description": "The key pair that is allowed SSH access"
        }
    },
    "Resources":{
        "EC2Instance":{
            "Type": "AWS::EC2::Instance",
            "Properties":{
                "ImageId": "ami-05958d7635caa4d04",
                "InstanceType": "t2.micro",
                "SubnetId": { "Ref": "SubnetId"},
                "KeyName": { "Ref": "KeyName"},
                "SecurityGroupIds": [ { "Ref": "EC2InstanceSecurityGroup"} ],
                "IamInstanceProfile": { "Ref" : "EC2InstanceProfile"},
                "UserData":{
                    "Fn::Base64": { "Fn::Join": ["", [
                        "#!/bin/bash\n",
                        "echo ECS_CLUSTER=", { "Ref": "EcsCluster" }, " >> /etc/ecs/ecs.config\n",
                        "groupadd -g 1000 jenkins\n",
                        "useradd -u 1000 -g jenkins jenkins\n",
                        "mkdir -p /ecs/jenkins_home\n",
                        "chown -R jenkins:jenkins /ecs/jenkins_home\n"
                    ] ] }
                },
                "Tags": [ { "Key": "Name", "Value": { "Fn::Join": ["", [ { "Ref": "AWS::StackName"}, "-instance" ] ]} }]
            }
        },
        "EC2InstanceSecurityGroup":{
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": { "Fn::Join": ["", [ { "Ref": "AWS::StackName" }, " ingress security group" ] ] },
                "VpcId": { "Ref": "VpcId" },
                "SecurityGroupIngress": [
                    {
                        "IpProtocol": "tcp",
                        "FromPort": "8080",
                        "ToPort": "8080",
                        "SourceSecurityGroupId": { "Ref": "ElbSecurityGroup"}
                    },
                    {
                        "IpProtocol": "tcp",
                        "FromPort": "22",
                        "ToPort": "22",
                        "CidrIp": "0.0.0.0/0"
                    }
                ]
            }
        },
        "EC2InstanceProfile": {
            "Type": "AWS::IAM::InstanceProfile",
            "Properties": {
                "Path": "/",
                "Roles": [ { "Ref": "EC2InstanceRole" } ]
            }
        },
        "EC2InstanceRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "AssumeRolePolicyDocument":{
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            "Effect": "Allow",
                            "Principal": { "Service": [ "ec2.amazonaws.com" ] },
                            "Action": [ "sts:AssumeRole" ]
                        }
                    ]
                },
                "Path": "/",
                "ManagedPolicyArns": [ "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role" ]
            }
        },
        "ElbSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": { "Fn::Join": ["", [ { "Ref": "AWS::StackName" }, " ELB ingress security group" ] ] },
                "VpcId": { "Ref": "VpcId"},
                "SecurityGroupIngress": [
                    {
                        "IpProtocol": "tcp",
                        "FromPort": "80",
                        "ToPort": "80",
                        "CidrIp": "0.0.0.0/0"
                    }
                ],
                "Tags": [ { "Key": "Name", "Value": { "Fn::Join": ["", [ { "Ref": "AWS::StackName" }, "-elb-sg" ] ] } } ]
            }
        },
        "ElasticLoadBalancer": {
            "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
            "Properties": {
                "CrossZone": "false",
                "SecurityGroups": [ { "Ref": "ElbSecurityGroup" } ],
                "Listeners": [
                    {
                        "LoadBalancerPort": "80",
                        "InstancePort": "8080",
                        "Protocol": "http"
                    }

                ],
                "Instances": [ { "Ref": "EC2Instance"} ],
                "Subnets": [ { "Ref": "SubnetId"} ]
            }
        },
        "EcsCluster": {
            "Type": "AWS::ECS::Cluster"
        },
        "EcsTaskDefinition": {
            "Type": "AWS::ECS::TaskDefinition",
            "Properties": {
                "ContainerDefinitions": [
                    {
                        "Name": "jenkins",
                        "Image": "someaccount/jenkins:ecs",
                        "Memory": 995,
                        "PortMappings": [ { "ContainerPort": 8080, "HostPort": 8080 } ],
                        "MountPoints": [
                            {
                                "SourceVolume": "docker",
                                "ContainerPath": "/var/run/docker.sock"
                            },
                            {
                                "SourceVolume": "jenkins_home",
                                "ContainerPath": "/var/jenkins_home"
                            }
                        ]
                    }


                ],
                "Volumes": [
                    {
                        "Name": "jenkins_home",
                        "Host": { "SourcePath": "/ecs/jenkins_home" }
                    },
                    {
                        "Name": "docker",
                        "Host": { "SourcePath": "/var/run/docker.sock" }
                    }
                ]
            }
        },
        "EcsService": {
            "Type": "AWS::ECS::Service",
            "Properties": {
                "Cluster": { "Ref": "EcsCluster" },
                "TaskDefinition": { "Ref": "EcsTaskDefinition" },
                "DesiredCount": 1
            }
        }
    },
    "Outputs":{
        "ElbDomainName": {
            "Description": "Public DNS name of Elastic Load Balancer",
            "Value": {
                "Fn::GetAtt": [
                    "ElasticLoadBalancer",
                    "DNSName"
                ]
            }
        },
        "EC2InstanceDomainName": {
            "Description": "Public DNS name of EC2 instance",
            "Value": {
                "Fn::GetAtt": [
                    "EC2Instance",
                    "PublicDnsName"
                ]
            }
        }
    }
}


Below are the events of failure:

This error occurs after hours.

The ECS instance goes active, so the AMI (ami-05958d7635caa4d04) looks fine, but there is no task in the Running or Stopped state.

To troubleshoot ECS task, I tried docker-compose approach:

version: '2'

volumes:
  jenkins_home:
    external: true

services:
  jenkins:
    image: someaccount/jenkins:ecs
    volumes:
      - jenkins_home:/var/jenkins_home
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - "8080:8080"

and I see that docker-compose up -d launches the container on my laptop.


After the ECS instance (t2.micro) went active, I tried manually running a new task and got the error below:

In the CloudFormation template, EcsTaskDefinition has "Memory": 995


When the computing platform is a Linux server (VM), we analyse logs and troubleshoot.

How do we troubleshoot this problem in the ECS environment of a public cloud?

Solution

The t2.micro instance that you are trying to launch has a total memory capacity of 1 GiB.

ECS will try to launch the task into your cluster, but it will find that it cannot meet the requirements: your task requires "Memory": 995, and the container instance does not have that much memory available, because OS processes consume more than the roughly 29 MiB (1024 - 995) that would be left over. That is exactly the issue.
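The arithmetic behind that statement can be sketched in a few lines of Python. The 1 GiB and 995 figures come from the question. The placement rule used here (a task fits only if its memory requirement is no larger than what the instance has available) is a simplified model of the scheduler, not its actual implementation. The value of registered_mib is a hypothetical stand-in for the memory the instance really registers with ECS, which is less than the full 1024 MiB.

```python
# Simplified model of the memory check made before a task is placed.
# All names here are illustrative; they are not part of any AWS API.

T2_MICRO_TOTAL_MIB = 1024   # 1 GiB, from the answer above
TASK_MEMORY_MIB = 995       # "Memory": 995 in EcsTaskDefinition


def headroom(total_mib: int, task_mib: int) -> int:
    """MiB left for everything else on the host if the task were placed."""
    return total_mib - task_mib


def task_fits(available_mib: int, task_mib: int) -> bool:
    """A task can be placed only if at least task_mib is available."""
    return task_mib <= available_mib


if __name__ == "__main__":
    print(headroom(T2_MICRO_TOTAL_MIB, TASK_MEMORY_MIB))  # 29
    # Hypothetical registered memory: an assumed value, check your own instance.
    registered_mib = 980
    print(task_fits(registered_mib, TASK_MEMORY_MIB))     # False
    print(task_fits(registered_mib, 512))                 # True
```

Under this model, either a smaller "Memory" value or a larger instance type would let the task be placed.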

Debugging the container itself will not help in this case, as there is (probably) nothing wrong with it. The ECS scheduler simply cannot meet the specified requirements to launch the task, so it can do nothing other than fail and give a "helpful" message.

Various things can cause the scheduler to fail to launch a task, and it should always give a hint about the problem. In your case, it "clearly" states that you don't have enough memory. Another thing to look for is daemon tasks, which can be launched only once per container instance (this accidentally happened to me).
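One way to read those hints without the console is to ask ECS for the service events and for the memory each container instance still has free. The sketch below assumes boto3 is installed and AWS credentials are configured. It uses the ECS client calls describe_services, list_container_instances and describe_container_instances. The cluster and service names passed to report are placeholders, not values from the question.

```python
# Sketch: collect the scheduler's hints for a service that will not stabilize.
# Cluster and service names are placeholders supplied by the caller.

def service_event_messages(describe_services_response: dict) -> list:
    """Collect event messages from an ecs.describe_services() response."""
    messages = []
    for service in describe_services_response.get("services", []):
        for event in service.get("events", []):
            messages.append(event["message"])
    return messages


def remaining_memory_mib(container_instance: dict):
    """Return the MEMORY entry of remainingResources, or None if absent."""
    for resource in container_instance.get("remainingResources", []):
        if resource.get("name") == "MEMORY":
            return resource.get("integerValue")
    return None


def report(cluster: str, service: str) -> None:
    """Print service events and the free memory of each container instance."""
    import boto3  # imported here so the helpers above work without AWS access

    ecs = boto3.client("ecs")
    response = ecs.describe_services(cluster=cluster, services=[service])
    for message in service_event_messages(response):
        print(message)

    arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
    if arns:  # describe_container_instances needs at least one ARN
        described = ecs.describe_container_instances(
            cluster=cluster, containerInstances=arns
        )
        for instance in described["containerInstances"]:
            print(instance.get("ec2InstanceId"), remaining_memory_mib(instance))
```

A call such as report("my-cluster", "my-service") (hypothetical names) would print the events and the free memory per instance. Because the question notes that rollback deletes the stack, this would have to be run while the cluster and service still exist.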
