How to make a GCE instance stop when its deployed container finishes?


Question

I have a Docker container that performs a single large computation. This computation requires lots of memory and takes about 12 hours to run.

I can create a Google Compute Engine VM of the appropriate size and use the "Deploy a container image to this VM instance" option to run this job perfectly. However, once the job is finished, the container exits but the VM keeps running (and keeps charging).

How can I make the VM exit/stop/delete when the container exits?

When the VM is in its zombie mode, only the stackdriver containers are left running:

$ docker ps
CONTAINER ID        IMAGE                                                                COMMAND                  CREATED             STATUS              PORTS               NAMES
bfa2feb03180        gcr.io/stackdriver-agents/stackdriver-logging-agent:0.2-1.5.33-1-1   "/entrypoint.sh /u..."   17 hours ago        Up 17 hours                             stackdriver-logging-agent
161439a487c2        gcr.io/stackdriver-agents/stackdriver-metadata-agent:0.2-0.0.17-2    "/bin/sh -c /opt/s..."   17 hours ago        Up 17 hours         8000/tcp            stackdriver-metadata-agent

I create the VM like this:

gcloud beta compute --project=abc instances create-with-container vm-name \
                    --zone=us-central1-c --machine-type=custom-1-65536-ext \
                    --network=default --network-tier=PREMIUM --metadata=google-logging-enabled=true \
                    --maintenance-policy=MIGRATE \
                    --service-account=xyz \
                    --scopes=https://www.googleapis.com/auth/cloud-platform \
                    --image=cos-stable-69-10895-71-0 --image-project=cos-cloud --boot-disk-size=10GB \
                    --boot-disk-type=pd-standard --boot-disk-device-name=vm-name \
                    --container-image=gcr.io/abc/my-image --container-restart-policy=on-failure \
                    --container-command=python3 \
                    --container-arg="a" --container-arg="b" --container-arg="c" \
                    --labels=container-vm=cos-stable-69-10895-71-0

Recommended answer

When you create the VM, you'll need to give it write access to compute so you can delete the instance from within. You should also set container environment variables like gce_zone and gce_project_id at this time. You'll need them to delete the instance.

gcloud beta compute instances create-with-container {NAME} \
    --container-env=gce_zone={ZONE},gce_project_id={PROJECT_ID} \
    --service-account={SERVICE_ACCOUNT} \
    --scopes=https://www.googleapis.com/auth/compute,... \
    ...
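
As an alternative to passing gce_zone and gce_project_id in by hand, the same values can be read from the metadata server at run time. This is not part of the answer itself, just a minimal sketch: the helper names (metadata_get, zone_from_path, load_gce_env) are made up for illustration, and it assumes curl is available in the image.

```shell
#!/bin/sh
# Sketch: look up project and zone from the metadata server instead of
# passing them in with --container-env. Helper names are hypothetical.

METADATA_URL="http://metadata.google.internal/computeMetadata/v1"

# GET a metadata path; the Metadata-Flavor header is required.
metadata_get() {
  curl -s -H "Metadata-Flavor: Google" "$METADATA_URL/$1"
}

# The zone endpoint returns "projects/<number>/zones/<zone>";
# keep only the last path segment.
zone_from_path() {
  printf '%s\n' "${1##*/}"
}

# Populate the same variable names the answer uses.
load_gce_env() {
  gce_project_id=$(metadata_get project/project-id)
  gce_zone=$(zone_from_path "$(metadata_get instance/zone)")
  export gce_project_id gce_zone
}
```

Calling load_gce_env once at the start of the container's entrypoint would leave $gce_project_id and $gce_zone set for the delete request.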

Then within the container, whenever YOU determine your task is finished:

  1. Ask for an API token (for simplicity's sake, I use curl and the DEFAULT GCE service account)

curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"

This will respond with JSON that looks like:

{
  "access_token": "foobarbaz...",
  "expires_in": 1234,
  "token_type": "Bearer"
}

  2. Take that access token and hit the instances.delete API endpoint (notice the environment variables)

curl -XDELETE -H 'Authorization: Bearer {TOKEN}' https://www.googleapis.com/compute/v1/projects/$gce_project_id/zones/$gce_zone/instances/$HOSTNAME
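
Putting the two steps together, a minimal sketch of a script that could run inside the container might look like this. The function names (extract_token, delete_url, delete_self) are hypothetical, and it assumes curl and sed exist in the image, that gce_project_id and gce_zone were set with --container-env, and that $HOSTNAME matches the instance name.

```shell
#!/bin/sh
# Sketch: delete this VM from inside the container once the job is done.
# Assumes gce_project_id and gce_zone are set in the environment and that
# $HOSTNAME is the instance name.

METADATA_URL="http://metadata.google.internal/computeMetadata/v1"

# Pull the access_token value out of the metadata server's JSON reply.
# sed is used so the image does not need a JSON parser.
extract_token() {
  printf '%s' "$1" | tr -d '\n' |
    sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Build the instances.delete URL from project, zone, and instance name.
delete_url() {
  printf 'https://www.googleapis.com/compute/v1/projects/%s/zones/%s/instances/%s\n' \
    "$1" "$2" "$3"
}

# Step 1: fetch a token. Step 2: send the DELETE request with it.
delete_self() {
  response=$(curl -s -H "Metadata-Flavor: Google" \
    "$METADATA_URL/instance/service-accounts/default/token") || return 1
  token=$(extract_token "$response")
  if [ -z "$token" ]; then
    echo "no access token in metadata response" >&2
    return 1
  fi
  curl -s -X DELETE -H "Authorization: Bearer $token" \
    "$(delete_url "$gce_project_id" "$gce_zone" "$HOSTNAME")"
}
```

One way to use it is from a wrapper entrypoint that runs the job and then calls delete_self only on success, for example `python3 a b c && delete_self`, so that a failed run can still be restarted by the on-failure restart policy.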
