Launching a simple Python script on an AWS Ray cluster with Docker

Question

I am finding it incredibly difficult to follow Ray's guidelines for running a Docker image on a Ray cluster in order to execute a Python script, and I cannot find simple working examples.

So I have the simplest Dockerfile:

FROM rayproject/ray
WORKDIR /usr/src/app
COPY . .
CMD ["step_1.py"]
ENTRYPOINT ["python3"]

I use this to build an image and push it to Docker Hub ("myimage" is just an example):

docker build -t myimage .   
docker push myimage

"step_1.py" just prints hello every second for 200 seconds:

import time
for i in range(200):
    time.sleep(1)
    print("hello")

This is my config.yaml, again very simple:

cluster_name: simple-1

min_workers: 0
max_workers: 2

docker:
    image: "myimage"    
    container_name: "my_simple_docker_container"
    pull_before_run: True

idle_timeout_minutes: 5

provider:
    type: aws
    region: eu-west-2
    availability_zone: eu-west-2a

file_mounts_sync_continuously: False



auth:
    ssh_user: ubuntu
    ssh_private_key: /home/user/.ssh/aws_ubuntu_test.pem
head_node:
    InstanceType: c5.2xlarge
    ImageId: ami-xxxxx826a6b31fd2c
    KeyName: aws_ubuntu_test

    BlockDeviceMappings:
      - DeviceName: /dev/sda1
        Ebs:
          VolumeSize: 200

worker_nodes:
   InstanceType: c5.2xlarge
   ImageId: ami-xxxxx826a6b31fd2c
   KeyName: aws_ubuntu_test
   InstanceMarketOptions:
        MarketType: spot

head_setup_commands:
    - pip install boto3==1.4.8

worker_setup_commands:  []

head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076

I run this in the terminal:

ray up simple1.yaml

and I get this error every time:

shared connection to x.x.xx.119 closed.
"docker cp" requires exactly 2 arguments.
See 'docker cp --help'.

Usage:  docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
        docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH

Copy files/folders between a container and the local filesystem
Shared connection to x.x.xx.119 closed.

Just to add: the Docker image runs fine on any other remote machine, just not on the Ray cluster.

If someone could please help me, I would be eternally grateful, and I even promise to write a tutorial on Medium after my struggles.

Recommended answer

I think the issue might be the use of ENTRYPOINT. The Ray cluster launcher starts Docker using a command roughly like:

docker run --rm --name <NAME> -d -it --net=host <image_name> bash

When I ran docker build -t myimage . and then docker run --rm -it myimage bash, Docker errored with:

python3: can't open file 'bash': [Errno 2] No such file or directory
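
The error follows from how Docker combines an exec-form ENTRYPOINT with the arguments given after the image name: those arguments replace CMD and are appended to ENTRYPOINT. The sketch below is a small Python model of that rule, not Docker's or Ray's actual code, and the function name effective_argv is made up for illustration. It shows why the image above ends up running python3 bash.

```python
from typing import List, Optional


def effective_argv(
    entrypoint: Optional[List[str]],
    cmd: Optional[List[str]],
    run_args: List[str],
) -> List[str]:
    """Model the process Docker starts for exec-form ENTRYPOINT and CMD.

    Hypothetical helper for illustration only; it is not a Docker API.
    """
    # Arguments placed after the image name in `docker run` replace CMD.
    tail = run_args if run_args else (cmd or [])
    # Whatever remains is appended to ENTRYPOINT, if one is set.
    return list(entrypoint or []) + list(tail)


# Dockerfile from the question: ENTRYPOINT ["python3"], CMD ["step_1.py"]
with_entrypoint = effective_argv(["python3"], ["step_1.py"], ["bash"])
print(with_entrypoint)  # ['python3', 'bash']

# Assumed alternative: no ENTRYPOINT, CMD ["python3", "step_1.py"]
without_entrypoint = effective_argv(None, ["python3", "step_1.py"], ["bash"])
print(without_entrypoint)  # ['bash']

# With no extra arguments, both variants still run the script by default.
print(effective_argv(["python3"], ["step_1.py"], []))  # ['python3', 'step_1.py']
```

Under this reading, dropping the ENTRYPOINT line and using a full CMD such as CMD ["python3", "step_1.py"] would let the trailing bash start a shell, which is what the cluster launcher appears to expect. This assumes the base image does not set its own ENTRYPOINT, and it is a suggestion rather than something the answer above states.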
