How to make a GCE instance stop when its deployed container finishes?
Question
I have a Docker container that performs a single large computation. This computation requires lots of memory and takes about 12 hours to run.
I can create a Google Compute Engine VM of the appropriate size and use the "Deploy a container image to this VM instance" option to run this job perfectly. However, once the job is finished, the container exits but the VM keeps running (and keeps being billed).
How can I make the VM exit/stop/delete when the container exits?
When the VM is in this zombie state, only the Stackdriver containers are left running:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bfa2feb03180 gcr.io/stackdriver-agents/stackdriver-logging-agent:0.2-1.5.33-1-1 "/entrypoint.sh /u..." 17 hours ago Up 17 hours stackdriver-logging-agent
161439a487c2 gcr.io/stackdriver-agents/stackdriver-metadata-agent:0.2-0.0.17-2 "/bin/sh -c /opt/s..." 17 hours ago Up 17 hours 8000/tcp stackdriver-metadata-agent
I create the VM like this:
gcloud beta compute --project=abc instances create-with-container vm-name \
    --zone=us-central1-c --machine-type=custom-1-65536-ext \
    --network=default --network-tier=PREMIUM --metadata=google-logging-enabled=true \
    --maintenance-policy=MIGRATE \
    --service-account=xyz \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --image=cos-stable-69-10895-71-0 --image-project=cos-cloud --boot-disk-size=10GB \
    --boot-disk-type=pd-standard --boot-disk-device-name=vm-name \
    --container-image=gcr.io/abc/my-image --container-restart-policy=on-failure \
    --container-command=python3 \
    --container-arg="a" --container-arg="b" --container-arg="c" \
    --labels=container-vm=cos-stable-69-10895-71-0
Recommended answer
When you create the VM, you'll need to give it write access to Compute Engine so that you can delete the instance from within. You should also set container environment variables such as gce_zone and gce_project_id at this time; you'll need them to delete the instance.
gcloud beta compute instances create-with-container {NAME} \
    --container-env=gce_zone={ZONE},gce_project_id={PROJECT_ID} \
    --service-account={SERVICE_ACCOUNT} \
    --scopes=https://www.googleapis.com/auth/compute,... \
    ...
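Inside the container, the values passed with --container-env show up as ordinary environment variables. As a minimal sketch (the helper name read_instance_config is hypothetical, and treating HOSTNAME as the VM instance name is an assumption that follows the delete call used later in this answer), a Python job could read and validate them like this:

```python
import os


def read_instance_config(env=None):
    """Return (project_id, zone, instance_name) from the container environment.

    gce_project_id and gce_zone are the variables set via --container-env;
    HOSTNAME is assumed (not guaranteed) to match the VM instance name.
    """
    env = os.environ if env is None else env
    required = ("gce_project_id", "gce_zone", "HOSTNAME")
    missing = [key for key in required if not env.get(key)]
    if missing:
        # Fail early instead of building a malformed API URL later.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["gce_project_id"], env["gce_zone"], env["HOSTNAME"]
```

Failing early here means a misconfigured VM is noticed at the start of the 12-hour job rather than at the end, when the delete call would otherwise fail and leave the instance running.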
Then, within the container, whenever YOU determine your task is finished:
- Request an API token (I'm using curl and the DEFAULT GCE service account for simplicity's sake)
curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"
This will respond with JSON that looks like:
{
"access_token": "foobarbaz...",
"expires_in": 1234,
"token_type": "Bearer"
}
- Take that access token and call the instances.delete API endpoint (notice the environment variables)
curl -XDELETE -H 'Authorization: Bearer {TOKEN}' https://www.googleapis.com/compute/v1/projects/$gce_project_id/zones/$gce_zone/instances/$HOSTNAME
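Since the container in the question runs python3, the two curl steps above could also be done from the job itself. The following is only a sketch using the standard library: the function names are hypothetical, the URLs and headers are the ones from the curl commands, and it assumes the VM's service account has the compute scope described earlier.

```python
import json
import urllib.request

# Same metadata endpoint as the first curl command.
TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def build_token_request():
    """Build the token request; the Metadata-Flavor header is required."""
    return urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})


def parse_access_token(body):
    """Extract access_token from the metadata server's JSON response."""
    return json.loads(body)["access_token"]


def build_delete_request(token, project_id, zone, instance):
    """Build the DELETE request for the instances.delete endpoint."""
    url = (
        "https://www.googleapis.com/compute/v1/"
        f"projects/{project_id}/zones/{zone}/instances/{instance}"
    )
    return urllib.request.Request(
        url, method="DELETE", headers={"Authorization": f"Bearer {token}"}
    )


def delete_self(project_id, zone, instance, timeout=30):
    """Fetch a token, then ask Compute Engine to delete this instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = parse_access_token(resp.read())
    request = build_delete_request(token, project_id, zone, instance)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # The API answers with an operation resource describing the deletion.
        return json.loads(resp.read())
```

A job would call delete_self(os.environ["gce_project_id"], os.environ["gce_zone"], os.environ["HOSTNAME"]) as its final statement, after results have been written somewhere durable, because the boot disk may be removed together with the instance.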