How to install custom packages in an Amazon EMR bootstrap action from code?


Question

I need to install some packages and binaries in an Amazon EMR bootstrap action, but I can't find any example that does this.

Basically, I want to install a Python package and have each Hadoop node use it to process the items in an S3 bucket. Here's a sample from boto:

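from boto.emr.step import StreamingStep

# Streaming step whose mapper script depends on the SimpleCV package
step = StreamingStep(name='Image to grayscale using SimpleCV python package',
                     mapper='s3n://elasticmapreduce/samples/imageGrayScale.py',
                     reducer='aggregate',
                     input='s3n://elasticmapreduce/samples/input',
                     output='s3n://<my output bucket>/output')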

The mapper needs the SimpleCV Python package, but I'm not sure where to specify this. If the package is not installed on the nodes, how do I get it installed? And is there a way to avoid waiting for the installation to complete, for example by installing it somewhere once and just referencing the package?

Recommended answer

boto provides a class for bootstrap actions: boto.emr.bootstrap_action.BootstrapAction.

Define it as shown below. Most of the code is from the boto example page.

import boto.emr
from boto.emr.bootstrap_action import BootstrapAction

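# bootstrap_action_args holds the arguments passed to the script (none here)
action = BootstrapAction(name="Bootstrap to add SimpleCV",
                         path="s3n://<my bucket uri>/bootstrap-simplecv.sh",
                         bootstrap_action_args=None)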

conn = boto.emr.connect_to_region('us-west-2')
jobid = conn.run_jobflow(name='My jobflow',
                         log_uri='s3://<my log uri>/jobflow_logs',
                         steps=[step],  # step defined elsewhere
                         bootstrap_actions=[action])

You also need to write the bootstrap script itself. If you need another version of Python, then yes, it would save time to precompile it on a machine identical to the cluster nodes, tar it, put it in an S3 bucket, and untar it during the bootstrap.

#!/bin/sh
# filename: bootstrap-simplecv.sh  (save it in an S3 bucket)
set -e -x

# -y answers the confirmation prompt; a bootstrap action runs unattended
sudo apt-get install -y python-setuptools
sudo easy_install pip
sudo pip install -U SimpleCV
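
For the precompiled route mentioned above, the tar and untar steps might look roughly like the following. This is only a sketch using Python's standard tarfile module. The function names pack_environment and unpack_environment are hypothetical, and copying the archive to and from S3 is left out.

```python
import os
import tarfile


def pack_environment(src_dir, archive_path):
    """Create a gzip-compressed tarball of src_dir (e.g. a prebuilt Python tree)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to the directory name so the archive
        # unpacks the same way regardless of where it was built.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return archive_path


def unpack_environment(archive_path, dest_dir):
    """Extract the tarball into dest_dir, as a bootstrap step would on each node."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Only extract archives you built yourself; extractall trusts the contents.
        tar.extractall(dest_dir)
    return dest_dir
```

You would run pack_environment once on the build machine, upload the resulting file to your bucket, and have the bootstrap script download and unpack it instead of compiling on every node.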

I think you can leave the EMR instances running from within boto, so that the bootstrap only happens the first time in your session. Just be careful to shut them down before you log out, so you don't get a surprise on your bill.
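
One way to make the shutdown hard to forget is to wrap the job flow in a context manager. The sketch below assumes conn behaves like boto's EmrConnection, that is, it offers run_jobflow (which accepts keep_alive) and terminate_jobflow. The helper name kept_alive_jobflow is hypothetical and not part of boto.

```python
from contextlib import contextmanager


@contextmanager
def kept_alive_jobflow(conn, **jobflow_kwargs):
    """Start a job flow with keep_alive=True and terminate it on exit.

    `conn` is assumed to provide run_jobflow(...) and
    terminate_jobflow(jobflow_id), as boto's EmrConnection does.
    """
    # keep_alive=True keeps the cluster up after its steps finish,
    # so the bootstrap action runs only once per session.
    jobid = conn.run_jobflow(keep_alive=True, **jobflow_kwargs)
    try:
        yield jobid
    finally:
        # Runs even if the body raises, so the cluster is not left billing.
        conn.terminate_jobflow(jobid)
```

Inside the with block you could submit further steps with conn.add_jobflow_steps(jobid, [step]). Since that call returns immediately, wait for the steps to finish before leaving the block, because leaving it terminates the cluster.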
