Facing error while trying to create transient cluster on AWS EMR to run Python script


Problem description

I am new to AWS and am trying to create a transient cluster on AWS EMR to run a Python script. I just want to run the Python script that will process the file and auto-terminate the cluster once it completes. I have also created a key pair and specified it.

The command:

aws emr create-cluster --name "test1-cluster" --release-label emr-5.5.0 --name pyspark_analysis --ec2-attributes KeyName=k-key-pair --applications Name=Hadoop Name=Hive Name=Spark --instance-groups --use-default-roles --instance-type m5-xlarge --instance-count 2 --region us-east-1 --log-uri s3://k-test-bucket-input/logs/ --steps Type=SPARK, Name="pyspark_analysis", ActionOnFailure=CONTINUE, Args=[-deploy-mode,cluster, -master,yarn, -conf,spark.yarn.submit.waitAppCompletion=true, -executor-memory,1g, s3://k-test-bucket-input/word_count.py, s3://k-test-bucket-input/input/a.csv, s3://k-test-bucket-input/output/ ] --auto-terminate

Error message

zsh: bad pattern: Args=[

What I have tried:

I looked at the args and the spaces and checked whether any accidental characters had been introduced, but that does not seem to be the case. Surely my syntax is wrong, but I am not sure what I am missing.

Expected execution:

It is expected to execute word_count.py by reading the input file a.csv and generating the output in b.csv.

Recommended answer

I think the issue is with the use of spaces in --steps. I formatted the command so it is a bit easier to read where the spaces are (or the lack of them):

aws emr create-cluster \
    --name "test1-cluster" \
    --release-label emr-5.5.0 \
    --name pyspark_analysis \
    --ec2-attributes KeyName=k-key-pair \
    --applications Name=Hadoop Name=Hive Name=Spark \
    --instance-groups --use-default-roles \
    --instance-type m5-xlarge --instance-count 2 \
    --region us-east-1 --log-uri s3://k-test-bucket-input/logs/ \
    --steps Type=SPARK,Name="pyspark_analysis",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,--executor-memory,1g,s3://k-test-bucket-input/word_count.py,s3://k-test-bucket-input/input/a.csv,s3://k-test-bucket-input/output/] \
    --auto-terminate
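
As a side note (not part of the original answer): the zsh: bad pattern: Args=[ message itself comes from zsh trying to glob-expand the unquoted [ in Args=[...], so the shell can still interfere even with the spacing fixed. One way to keep the shell out of it entirely, assuming the same bucket paths and step settings as above (and the usual double-dash spark-submit options), is to put the step definition in a JSON file and reference it with the AWS CLI's file:// syntax. A minimal sketch, with steps.json as a hypothetical file name:

[
  {
    "Type": "Spark",
    "Name": "pyspark_analysis",
    "ActionOnFailure": "CONTINUE",
    "Args": [
      "--deploy-mode", "cluster",
      "--master", "yarn",
      "--conf", "spark.yarn.submit.waitAppCompletion=true",
      "--executor-memory", "1g",
      "s3://k-test-bucket-input/word_count.py",
      "s3://k-test-bucket-input/input/a.csv",
      "s3://k-test-bucket-input/output/"
    ]
  }
]

The create-cluster call would then use --steps file://steps.json in place of the inline value, so the brackets never reach the shell. (Alternatively, wrapping the whole --steps value in single quotes has the same effect. Note also that the instance type would likely need to be m5.xlarge rather than m5-xlarge for the command to be accepted.)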

