Jupyter + EMR + Spark - Connect to EMR cluster from Jupyter notebook on local machine


Question

I am new to PySpark and EMR.
I am trying to access Spark running on an EMR cluster through a Jupyter notebook on my local machine, but I am running into errors.

I am creating a SparkSession with the following code:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[*]") \
    .appName("Carbon - SingleWell parallelization on Spark") \
    .getOrCreate()

I tried the following to access the remote cluster, but it failed with an error:

spark = SparkSession.builder \
    .master("spark://<remote-emr-ec2-hostname>:7077")\
    .appName("Carbon - SingleWell parallelization on Spark")\
    .getOrCreate()

Error:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NullPointerException
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:567)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

Any help resolving this would be much appreciated.

Recommended answer

EMR clusters have had Jupyter and JupyterHub provisioned for you since EMR version 5.14.0.

Most likely, it is easier to tune those provisioned services with some extra bootstrap actions than to wire up your local process to talk to the EMR master node.
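As a rough illustration of that suggestion, the sketch below builds the arguments for boto3's `run_job_flow` call so that a cluster starts with Spark and JupyterHub installed, plus an optional bootstrap action. It is only one possible setup, not something taken from the answer: the instance types, instance count, default role names, the helper names `build_cluster_request` and `launch_cluster`, and the S3 script path are all assumptions chosen for the example.

```python
from typing import Any, Dict, Optional


def build_cluster_request(
    name: str,
    bootstrap_script_s3_path: Optional[str] = None,
    release_label: str = "emr-5.14.0",
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Hypothetical helper: instance types, count, and roles are placeholder
    assumptions and should be replaced with values valid for your account.
    """
    request: Dict[str, Any] = {
        "Name": name,
        "ReleaseLabel": release_label,
        # Ask EMR to install Spark and JupyterHub on the cluster.
        "Applications": [{"Name": "Spark"}, {"Name": "JupyterHub"}],
        "Instances": {
            "MasterInstanceType": "m4.large",  # assumption
            "SlaveInstanceType": "m4.large",   # assumption
            "InstanceCount": 3,                # assumption
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumes the default EMR roles exist
        "ServiceRole": "EMR_DefaultRole",
    }
    if bootstrap_script_s3_path:
        # Optional bootstrap action, e.g. a script that installs extra Python packages.
        request["BootstrapActions"] = [
            {
                "Name": "Customize notebook environment",
                "ScriptBootstrapAction": {
                    "Path": bootstrap_script_s3_path,
                    "Args": [],
                },
            }
        ]
    return request


def launch_cluster(request: Dict[str, Any]) -> str:
    """Submit the request to EMR. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    emr = boto3.client("emr")
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]
```

Calling `launch_cluster(build_cluster_request("notebook-cluster", "s3://my-bucket/bootstrap.sh"))` would submit the request, where the bucket and script name are placeholders. Keeping the request builder separate from the AWS call makes the configuration easy to inspect before anything is launched.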
