How to make a GCE instance stop when its deployed container finishes?
Question
I have a Docker container that performs a single large computation. This computation requires lots of memory and takes about 12 hours to run.
I can create a Google Compute Engine VM of the appropriate size and use the "Deploy a container image to this VM instance" option to run this job perfectly. However, once the job is finished, the container quits but the VM keeps running (and I keep being charged for it).
How can I make the VM exit/stop/delete when the container exits?
When the VM is in its zombie mode only the stackdriver containers are left running:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bfa2feb03180 gcr.io/stackdriver-agents/stackdriver-logging-agent:0.2-1.5.33-1-1 "/entrypoint.sh /u..." 17 hours ago Up 17 hours stackdriver-logging-agent
161439a487c2 gcr.io/stackdriver-agents/stackdriver-metadata-agent:0.2-0.0.17-2 "/bin/sh -c /opt/s..." 17 hours ago Up 17 hours 8000/tcp stackdriver-metadata-agent
I create the VM like this:
gcloud beta compute --project=abc instances create-with-container vm-name \
--zone=us-central1-c --machine-type=custom-1-65536-ext \
--network=default --network-tier=PREMIUM --metadata=google-logging-enabled=true \
--maintenance-policy=MIGRATE \
--service-account=xyz \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--image=cos-stable-69-10895-71-0 --image-project=cos-cloud --boot-disk-size=10GB \
--boot-disk-type=pd-standard --boot-disk-device-name=vm-name \
--container-image=gcr.io/abc/my-image --container-restart-policy=on-failure \
--container-command=python3 \
--container-arg="a" --container-arg="b" --container-arg="c" \
--labels=container-vm=cos-stable-69-10895-71-0
Recommended answer
When you create the VM, you'll need to give it write access to compute so you can delete the instance from within. You should also set container environment variables like gce_zone and gce_project_id at this time. You'll need them to delete the instance.
gcloud beta compute instances create-with-container {NAME} \
--container-env=gce_zone={ZONE},gce_project_id={PROJECT_ID} \
--service-account={SERVICE_ACCOUNT} \
--scopes=https://www.googleapis.com/auth/compute,...
...
Then within the container, whenever YOU determine your task is finished:
- Request an API token (this example uses curl for simplicity and the default GCE service account)
curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"
This will respond with JSON that looks like
{
"access_token": "foobarbaz...",
"expires_in": 1234,
"token_type": "Bearer"
}
- Take that access token and hit the instances.delete API endpoint (notice the environment variables)
curl -XDELETE -H 'Authorization: Bearer {TOKEN}' https://www.googleapis.com/compute/v1/projects/$gce_project_id/zones/$gce_zone/instances/$HOSTNAME
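The two steps above can be combined into one small script that runs at the end of the job. This is only a sketch under a few assumptions: curl and sed are available in the container image, gce_project_id and gce_zone were set with --container-env as shown earlier, and $HOSTNAME inside the container matches the instance name (as the answer's curl command assumes). The function names and the SELF_DELETE switch are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch: delete this VM from inside the container once the job is done.
# Assumes gce_project_id and gce_zone come from --container-env, and that
# $HOSTNAME equals the instance name (both taken from the answer above).

METADATA_URL="http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"

# Read the token JSON on stdin and print only the access_token value.
extract_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Build the instances.delete URL from project, zone and instance name.
delete_url() {
  printf 'https://www.googleapis.com/compute/v1/projects/%s/zones/%s/instances/%s\n' "$1" "$2" "$3"
}

# Fetch a token for the default service account, then call instances.delete.
self_delete() {
  token=$(curl -s "$METADATA_URL" -H "Metadata-Flavor: Google" | extract_token)
  curl -s -X DELETE -H "Authorization: Bearer $token" \
    "$(delete_url "$gce_project_id" "$gce_zone" "$HOSTNAME")"
}

# Hypothetical opt-in switch: nothing is deleted unless SELF_DELETE=1 is set.
if [ "${SELF_DELETE:-0}" = "1" ]; then
  self_delete
fi
```

One way to use it would be to wrap the container command so that the script runs with SELF_DELETE=1 after the computation exits. Parsing the token with sed avoids needing jq in the image; if jq is available, it would be a more robust choice.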