How can multiple files be specified with "-files" in the AWS CLI for EMR?
Question
I am trying to start an Amazon EMR cluster via the AWS CLI, but I am unsure how to specify multiple files. My current call is as follows:
aws emr create-cluster \
  --steps Type=STREAMING,Name='Intra country development',ActionOnFailure=CONTINUE,Args=[-files,s3://betaestimationtest/mapper.py,-files,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra] \
  --ami-version 3.1.0 \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
  --auto-terminate \
  --log-uri s3://betaestimationtest/logs
However, Hadoop now complains that it cannot find the reducer file:
Caused by: java.io.IOException: Cannot run program "reducer.py": error=2, No such file or directory
What am I doing wrong? The file does exist in the folder I specify.
Recommended answer
To pass multiple files in a streaming step, you need to supply the steps as a JSON file, referenced with the file:// prefix.
The AWS CLI shorthand syntax uses the comma as the delimiter between list items. So when we try to pass parameters such as "-files","s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py", the shorthand parser splits the value at the comma and treats the mapper.py and reducer.py paths as two separate arguments.
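To see why the comma matters, here is a minimal Python sketch. The function split_shorthand_list is a hypothetical stand-in that only splits on commas; it is not the real AWS CLI parser, which also handles quoting and other cases. It just shows that a comma-joined -files value cannot survive as a single list item.

```python
# Simplified illustration only: NOT the actual AWS CLI shorthand parser.
# split_shorthand_list is a hypothetical helper that mimics comma splitting.
def split_shorthand_list(text):
    """Strip the surrounding brackets and split the list on commas."""
    return text.strip("[]").split(",")


shorthand = (
    "[-files,"
    "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py,"
    "-mapper,mapper.py]"
)

args = split_shorthand_list(shorthand)

# The intended single value for -files ends up as two separate items.
for position, value in enumerate(args):
    print(position, value)
```

In this sketch the two S3 paths land at positions 1 and 2, so the second path is no longer part of the -files value.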
The workaround is to use the JSON format. Please see the example below.
aws emr create-cluster --steps file://./mysteps.json --ami-version 3.1.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate --log-uri s3://betaestimationtest/logs
mysteps.json looks like:
[
  {
    "Name": "Intra country development",
    "Type": "STREAMING",
    "ActionOnFailure": "CONTINUE",
    "Args": [
      "-files",
      "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py",
      "-mapper",
      "mapper.py",
      "-reducer",
      "reducer.py",
      "-input",
      "s3://betaestimationtest/output_0_inter",
      "-output",
      "s3://betaestimationtest/output_1_intra"
    ]
  }
]
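If you would rather generate the steps file than write it by hand, a short Python script can do it. This is only a sketch: build_steps is a hypothetical helper, and the bucket name and the input and output paths are taken from the question. Joining the file list with a comma inside one JSON string keeps the -files value as a single argument.

```python
import json

# Bucket taken from the question; adjust for your own setup.
BUCKET = "s3://betaestimationtest"


def build_steps(files, mapper, reducer, input_path, output_path):
    """Return a list with one streaming step (hypothetical helper)."""
    return [
        {
            "Name": "Intra country development",
            "Type": "STREAMING",
            "ActionOnFailure": "CONTINUE",
            "Args": [
                # All files go into ONE comma-separated string.
                "-files", ",".join(files),
                "-mapper", mapper,
                "-reducer", reducer,
                "-input", input_path,
                "-output", output_path,
            ],
        }
    ]


steps = build_steps(
    files=[f"{BUCKET}/mapper.py", f"{BUCKET}/reducer.py"],
    mapper="mapper.py",
    reducer="reducer.py",
    input_path=f"{BUCKET}/output_0_inter",
    output_path=f"{BUCKET}/output_1_intra",
)

# Write the file that the create-cluster call reads via file://./mysteps.json
with open("mysteps.json", "w") as handle:
    json.dump(steps, handle, indent=2)
```

Run the script in the directory from which you call aws emr create-cluster, so that file://./mysteps.json resolves to the generated file.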
You can also find examples here: https://github.com/aws/aws-cli/blob/develop/awscli/examples/emr/create-cluster-examples.rst. See example 13.
Hope this helps!