Setting up AWS EC2 instance with Tensorflow 2.0 -- AMI versus building it yourself?


Question

I need to set up an AWS EC2 GPU instance with Tensorflow 2.0. All of the docs that I have seen indicate that the current AWS AMI images only support Tensorflow 1.14 or 1.15, but not Tensorflow 2.0. Hence I was wondering what is the best way to get Tensorflow-gpu 2.0 on an AWS instance.

I could create an EC2 GPU instance, install the Nvidia drivers, and then run a Tensorflow 2.0 docker container using nvidia-docker. Or is it easier to just install an AWS AMI image with Tensorflow 1.14 and then upgrade to Tensorflow 2.0? It is not clear which approach makes more sense.

Any suggestions are welcome.

Answer

So I went through both routes. Right now I would say that setting up a docker container with Tensorflow 2.0 is easier than building from the AMI image.

For the docker route, you can spin up an Ubuntu 18.04 instance with GPUs and then follow the steps below. I only lay out the basic steps without going into great detail, but hopefully this is enough guidance to help someone get started.

  1. Start up the instance and install the docker-ce software. Make sure that network port 8888 is accessible for incoming connections.
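The docker-ce installation can be sketched roughly as follows. These commands are adapted from Docker's official install instructions for Ubuntu 18.04; check the current Docker documentation before running them, since repository URLs and package names can change:

```shell
# Install prerequisites for using Docker's apt repository
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common

# Add Docker's GPG key and apt repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

# Install docker-ce
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

# Optional: let the default user run docker without sudo (log out and back in afterwards)
sudo usermod -aG docker ubuntu
```

Port 8888 itself is opened in the instance's security group (AWS console or CLI), not on the instance.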

  2. Install the Nvidia drivers for the particular GPU instance: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html
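In outline, the driver install from the linked AWS guide looks something like the following. The driver download step depends on your instance's GPU model and is deliberately left generic here; follow the guide for the exact package:

```shell
# Build prerequisites for the NVIDIA kernel module
sudo apt-get update
sudo apt-get install -y gcc make linux-headers-$(uname -r)

# Download the driver installer matching your GPU from NVIDIA (see the AWS guide),
# then run it and verify the driver is loaded:
sudo sh ./NVIDIA-Linux-x86_64-*.run
nvidia-smi
```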

  3. Install the nvidia-docker software from the Nvidia github repository. This will enable the docker image to access the GPU drivers on the EC2 instance.
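A sketch of that installation, adapted from the nvidia-docker repository's README at the time (the project has since been folded into the NVIDIA Container Toolkit, so check the repository for current instructions):

```shell
# Add the nvidia-docker apt repository for this Ubuntu release
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the toolkit and restart docker so it picks up the GPU runtime
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```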

  4. Download and run the tensorflow 2.0 container with the command:

    docker run -it --gpus all --rm -v $(realpath ~/Downloads):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:2.0.0-gpu-py3-jupyter

This should start a Jupyter notebook server that the user can access from their computer.
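As a quick sanity check that the drivers and nvidia-docker are wired up correctly, you can ask TensorFlow inside the container whether it sees a GPU (it should print True on a working setup; `tf.test.is_gpu_available` is deprecated in later versions but works in 2.0):

```shell
docker run --gpus all --rm tensorflow/tensorflow:2.0.0-gpu-py3 \
    python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
```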

If you want to do this through an AMI image, you basically have to install the Tensorflow 1.14 image and then upgrade it. This is actually harder than it looks. Again, this is a high-level outline of the steps, but I tried to include links or code as best I could.

  1. Set up the Ubuntu 18.04 Deep Learning AMI (version 25.2) on the server.

  2. Update and upgrade Ubuntu:

    sudo apt-get update
    sudo apt-get upgrade

  3. Update the Anaconda distribution, since the AMI ships an outdated version of the package manager:

    conda update conda
    conda update --all

  4. Create a tensorflow 2.0 conda environment:

    conda create -n tf2 python=3.7 tensorflow-gpu==2.0 cudatoolkit cudnn jupyter

  5. Initialize conda in the shell. You have to do this to use conda commands from the shell. You might need to exit out of the instance and then ssh back into it.

    conda init bash
    bash
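Once conda is initialized, a quick check confirms the environment actually got the GPU build (the version string should start with 2.0; `list_physical_devices` lives under `tf.config.experimental` in 2.0):

```shell
conda activate tf2
python -c "import tensorflow as tf; print(tf.__version__); print(tf.config.experimental.list_physical_devices('GPU'))"
```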

  6. Install environment_kernels:

    pip install environment_kernels

  7. Install the Jupyter notebook extensions:

    conda install -c conda-forge jupyter_contrib_nbextensions

  8. Set up the Jupyter server on the instance. Follow the instructions at: https://docs.aws.amazon.com/dlami/latest/devguide/setup-jupyter-config.html

  9. ssh into the instance and start the Jupyter server, then open an ssh tunnel to it from your local machine:

    ssh -N -f -L 8888:localhost:8888 ubuntu@aws-public-url

  10. Open a browser on your computer and browse to the server's public URL on port 8888.
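If the browser prompts for a token or password and you did not set one up through the AWS Jupyter configuration guide, you can list the running server's access token on the instance (`jupyter notebook list` is a standard Jupyter command):

```shell
jupyter notebook list   # prints running servers along with their access tokens
```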

Hence I would say use the first approach rather than the second approach, until Amazon releases a Tensorflow 2.0 AMI.
