How to connect to Spark EMR from the locally running Spark Shell

Problem description

I have created a Spark EMR cluster. I would like to execute jobs either on my localhost or on the EMR cluster.

Assuming I run spark-shell on my local computer, how can I tell it to connect to the Spark EMR cluster? What would be the exact configuration options and/or commands to run?

Recommended answer

One way of doing this is to add your Spark job as an EMR step to your EMR cluster. For this, you need the AWS CLI installed on your local computer (see here for an installation guide) and your jar file on S3.
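
If the jar file is not on S3 yet, the AWS CLI can upload it. This is a minimal sketch reusing the bucket and file name from this answer; the local path target/my-project-0.1.jar is just a hypothetical build output location:

aws s3 cp target/my-project-0.1.jar s3://hadi/my-project-0.1.jar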

Once you have the AWS CLI, and assuming the Spark class to run is com.company.my.MySparkJob and your jar file is located on S3 at s3://hadi/my-project-0.1.jar, you can run the following command from your terminal:

aws emr add-steps --cluster-id j-************* --steps Type=spark,Name=My_Spark_Job,Args=[--class,com.company.my.MySparkJob,s3://hadi/my-project-0.1.jar],ActionOnFailure=CONTINUE
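
Once the step is submitted, you can follow its progress from the same terminal with the standard list-steps subcommand; the cluster id below is the same placeholder as above:

aws emr list-steps --cluster-id j-*************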
