Initialize PySpark to predefine the SparkContext variable 'sc'

Question

When using PySpark I'd like a SparkContext to be initialised (in yarn client mode) upon creation of a new notebook.

The following tutorials describe how to do this in past versions of ipython/jupyter < 4:

https://www.dataquest.io/blog/pyspark-installation-guide/

https://npatta01.github.io/2015/07/22/setting_up_pyspark/

I'm not sure how to do this with version 4, given that Jupyter no longer has profiles: http://jupyter.readthedocs.io/en/latest/migrating.html#since-jupyter-does-not-have-profiles-how-do-i-customize-it

I can manually create and configure a SparkContext, but I don't want our analysts to have to worry about this.

Does anyone have any ideas?

Recommended answer

Well, the missing profiles functionality in Jupyter also puzzled me in the past, albeit for a different reason - I wanted to be able to switch between different deep learning frameworks (Theano & TensorFlow) on demand; eventually I found the solution (described in a blog post of mine here).

The fact is that, although there are no profiles in Jupyter, the startup files functionality for the IPython kernel is still there, and, since PySpark employs this particular kernel, it can be used in your case.

So, provided that you already have a working Pyspark kernel for Jupyter, all you have to do is write a short initialization script init_spark.py as follows:

from pyspark import SparkConf, SparkContext

# Run Spark on YARN in client mode
conf = SparkConf().setMaster("yarn-client")
sc = SparkContext(conf=conf)

and place it in the ~/.ipython/profile_default/startup/ directory of your users.
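
As a side note (my addition, not part of the original answer): if there is any chance the startup script runs while a context already exists, SparkContext.getOrCreate can be used instead of the constructor. A minimal sketch, assuming Spark 2.x where getOrCreate is available on SparkContext:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("yarn-client")
# Returns the already-registered SparkContext if one exists,
# so the startup file is safe to execute more than once.
sc = SparkContext.getOrCreate(conf=conf)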

You can confirm that now sc is already set after starting a Jupyter notebook:

In [1]: sc
Out[1]: <pyspark.context.SparkContext at 0x7fcceb7c5fd0>

In [2]: sc.version
Out[2]: u'2.0.0'
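
One practical consequence worth noting (again my addition): an analyst who does need non-default settings can still stop the predefined context and create a new one inside the notebook. A sketch; the executor memory setting below is just a placeholder value:

from pyspark import SparkConf, SparkContext

sc.stop()  # release the auto-created context first
conf = (SparkConf()
        .setMaster("yarn-client")
        .set("spark.executor.memory", "4g"))  # placeholder setting
sc = SparkContext(conf=conf)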

A more disciplined way for integrating PySpark & Jupyter notebooks is described in my answers here and here.

A third way is to try Apache Toree (formerly Spark Kernel), as described here (haven't tested it though).
