Numpy and Scipy with Amazon Elastic MapReduce

Question

Using the mrjob to run python code on Amazon's Elastic MapReduce I have successfully found a way to upgrade the EMR image's numpy and scipy.

Running from console the following commands work:

    tar -cvf py_bundle.tar mymain.py Utils.py numpy-1.6.1.tar.gz scipy-0.9.0.tar.gz

    gzip py_bundle.tar 

    python my_mapper.py -r emr --python-archive py_bundle.tar.gz --bootstrap-python-package numpy-1.6.1.tar.gz --bootstrap-python-package scipy-0.9.0.tar.gz > output.txt 
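
For reference, the same bundle can be built from Python with the standard-library `tarfile` module, which writes the gzip-compressed archive in one pass instead of running `tar` and then `gzip`. This is a minimal sketch; `make_bundle` is a hypothetical helper name, and it does nothing to shorten the 21-minute install itself.

```python
import tarfile
from pathlib import Path


def make_bundle(output_path, members):
    """Write `members` into a gzip-compressed tar (tar -cvf + gzip in one step)."""
    with tarfile.open(str(output_path), "w:gz") as tar:
        for member in members:
            # Keep only the file name so the archive has a flat layout,
            # like running tar from the directory that holds the files.
            tar.add(str(member), arcname=Path(member).name)
    return output_path
```

Calling `make_bundle("py_bundle.tar.gz", ["mymain.py", "Utils.py", "numpy-1.6.1.tar.gz", "scipy-0.9.0.tar.gz"])` from the directory holding those files produces an archive with the same contents as the `py_bundle.tar.gz` passed to `--python-archive`.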

This successfully bootstraps the latest numpy and scipy into the image and works perfectly. My question is a matter of speed. This takes 21 minutes to install itself on a small instance.

Does anyone have any idea how to speed up the process of upgrading numpy and scipy?

Recommended answer

The only way to do anything to an EMR image is by using bootstrap actions. Doing this from the console means you'll only change the master node and not the task nodes which do the processing. Bootstrap actions run once at startup on all nodes and can be a simple script that gets shell exec'd.

    elastic-mapreduce --create --bootstrap-action "s3://bucket/path/to/script" ...

To speed up changes to the EMR image, tar up the post-installed files (numpy and scipy as they are after installation, already built) and upload the archive to S3. Then use a bootstrap action to download and unpack it on each node. You will have to keep separate archives for 32-bit (micro, small, medium) and 64-bit machines.
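
A sketch of the packing side in Python, assuming numpy and scipy have already been built and installed once into a separate directory (the hypothetical `install_root`) on a machine with the same architecture as the target nodes. It names the archive after the interpreter's pointer size (32 or 64 bit) so the two per-architecture archives stay apart. The `numpy-scipy-<bits>bit.tar.gz` naming scheme and all helper names are assumptions, and the upload to S3 is not shown.

```python
import struct
import tarfile
from pathlib import Path


def word_size_bits():
    """Pointer size of the running Python interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def archive_name(bits=None, prefix="numpy-scipy"):
    """Hypothetical naming scheme: one archive per architecture."""
    if bits is None:
        bits = word_size_bits()
    return f"{prefix}-{bits}bit.tar.gz"


def pack_installed_tree(install_root, out_dir="."):
    """Tar up an already-built install tree so nodes can unpack it instead of compiling.

    out_dir should be outside install_root, or the archive would include itself.
    """
    root = Path(install_root)
    out_path = Path(out_dir) / archive_name()
    with tarfile.open(str(out_path), "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            # Store paths relative to install_root; recursive=False because
            # rglob already yields every directory and file individually.
            tar.add(str(path), arcname=path.relative_to(root).as_posix(),
                    recursive=False)
    return out_path
```

Building each archive on a node of the matching type keeps the compiled extension modules consistent with the machines that will unpack them.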

The command to download from S3 in the script is:

    hadoop fs -get s3://bucket/path/to/archive /tmp/archive
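
A bootstrap-side sketch, written in Python rather than as a shell script and assuming Python 3 is available on the nodes. It wraps the `hadoop fs -get` download, picks the archive by pointer size using a hypothetical `numpy-scipy-<bits>bit.tar.gz` naming scheme, and unpacks it into `target_dir`, which should be the root the archive's paths were stored relative to. All helper names are hypothetical; `run` is a parameter only so the download can be stubbed out when testing.

```python
#!/usr/bin/env python3
import struct
import subprocess
import tarfile


def archive_uri(s3_prefix, bits=None):
    """Pick the per-architecture archive (hypothetical naming scheme)."""
    if bits is None:
        bits = struct.calcsize("P") * 8
    return f"{s3_prefix.rstrip('/')}/numpy-scipy-{bits}bit.tar.gz"


def download_command(src, dest):
    """argv for the download step: hadoop fs -get <src> <dest>."""
    return ["hadoop", "fs", "-get", src, dest]


def deploy(archive_path, target_dir):
    """Unpack the prebuilt files instead of compiling numpy/scipy on the node."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)


def fetch_and_deploy(s3_prefix, target_dir, local_archive="/tmp/archive",
                     run=subprocess.check_call):
    # check_call raises CalledProcessError if the download fails,
    # which makes the bootstrap action fail instead of continuing silently.
    run(download_command(archive_uri(s3_prefix), local_archive))
    deploy(local_archive, target_dir)
```

Used as a bootstrap action, the script would end with a call such as `fetch_and_deploy("s3://bucket/path/to", "/usr/local")` (the target directory is only an example), be uploaded to S3, and be passed to `--bootstrap-action`. Only extract archives you built yourself, since `extractall` trusts the paths stored in the archive.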
