How to connect to Spark EMR from the locally running Spark Shell
Question
I have created a Spark EMR cluster. I would like to execute jobs either on my localhost or on the EMR cluster.
Assuming I run spark-shell on my local computer, how can I tell it to connect to the Spark EMR cluster? What would be the exact configuration options and/or commands to run?
Answer
One way of doing this is to add your Spark job as an EMR step to your EMR cluster. For this, you need the AWS CLI installed on your local computer (see here for the installation guide), and your jar file uploaded to S3.
Once you have the AWS CLI, assuming the Spark class to run is com.company.my.MySparkJob and your jar file is located on S3 at s3://hadi/my-project-0.1.jar, you can run the following command from your terminal:
aws emr add-steps --cluster-id j-************* --steps Type=Spark,Name=My_Spark_Job,Args=[--class,com.company.my.MySparkJob,s3://hadi/my-project-0.1.jar],ActionOnFailure=CONTINUE
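If you prefer to script the submission instead of calling the CLI directly, the same step can be expressed with boto3. A minimal sketch, assuming the class name and jar path from the command above (the cluster id is a placeholder); a Spark-type step is equivalent to invoking spark-submit through command-runner.jar on the cluster:

```python
def build_spark_step(main_class, jar_s3_path, name="My_Spark_Job"):
    """Build an EMR step definition equivalent to the add-steps command above."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets the step run spark-submit on the cluster,
            # which is what Type=Spark does under the hood.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--class", main_class, jar_s3_path],
        },
    }

step = build_spark_step("com.company.my.MySparkJob", "s3://hadi/my-project-0.1.jar")

# With boto3 installed and AWS credentials configured, the step would be
# submitted like this (cluster id is a placeholder):
# import boto3
# emr = boto3.client("emr")
# emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

Submitting this way makes it easy to generate the step arguments programmatically, e.g. when the jar path or class name varies per run.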