PYSPARK_PYTHON setting in Jupyter notebooks is ignored

Question

I've been trying to set PYSPARK_PYTHON from a Jupyter notebook (using JupyterLab) so that it uses a specific conda env, but I cannot find a way to make it work. I have found some examples using:

import os

# Only takes effect if set before the SparkContext/SparkSession is created
os.environ['PYSPARK_PYTHON'] = "<the path>"

But it did not work, so I also tried:

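from pyspark.sql import SparkSession, SQLContext

spark = (
    SparkSession.builder
    .master("yarn")
    .config("spark.submit.deployMode", "client")  # replaces the deprecated "yarn-client" master
    .appName(session_name)  # session_name is defined elsewhere in the notebook
    # In client mode this only sets the environment of the YARN executor launcher (AM)
    .config("spark.yarn.appMasterEnv.PYSPARK_PYTHON", "<the path>")
    .enableHiveSupport()
    .getOrCreate()  # Builder.getOrCreate() takes no arguments
)

sc = spark.sparkContext
sqlContext = SQLContext(sc)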

But it never uses the Python version at the specified path. Is it possible that the config is being ignored? Does something else need to be done in the notebook?

I'm using yarn-client mode, and I'm using an enterprise/corporate instance of JupyterLab, so I cannot set the variables with export on the CLI: the server is managed company-wide by another team. I need a way to set them other than running export at the CLI when Jupyter starts.

Answer

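To get it working, you should also set these environment variables on the command line before launching pyspark (the pyspark launcher then starts Jupyter as the driver):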

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
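The exports above only help when you control how Jupyter is launched. On a managed JupyterLab where you cannot, one possible workaround (an assumption about your setup, not part of the original answer) is a per-user kernelspec: a Jupyter kernel.json accepts an optional "env" dictionary that is applied to the kernel process, so PYSPARK_PYTHON is already set when the kernel starts. The helper names, the example paths, and the requirement that ipykernel be installed in the conda env are all assumptions in this sketch.

```python
import json
import os


def kernelspec_with_pyspark_python(python_path: str, display_name: str) -> dict:
    """Build a kernel.json dict that runs the kernel with python_path and exports PYSPARK_PYTHON.

    Hypothetical helper; assumes ipykernel is installed in the env that owns python_path.
    """
    return {
        # Start the IPython kernel with the interpreter from the chosen conda env.
        "argv": [python_path, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
        # Environment variables applied to the kernel process only.
        "env": {"PYSPARK_PYTHON": python_path},
    }


def write_kernelspec(spec: dict, kernel_dir: str) -> str:
    """Write spec to <kernel_dir>/kernel.json and return the file path (hypothetical helper)."""
    os.makedirs(kernel_dir, exist_ok=True)
    path = os.path.join(kernel_dir, "kernel.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(spec, fh, indent=2)
    return path
```

For example, write_kernelspec(kernelspec_with_pyspark_python("/opt/conda/envs/myenv/bin/python", "PySpark (myenv)"), os.path.expanduser("~/.local/share/jupyter/kernels/pyspark-myenv")) would place the file in the usual per-user kernel directory on Linux. The env path and name are hypothetical, and whether a centrally managed server lists per-user kernels depends on how it is configured.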

Another method is to install the findspark package (pip install findspark) and initialize it before importing pyspark:

import findspark
findspark.init()  # locates SPARK_HOME and adds PySpark to sys.path

import pyspark
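Whichever method is used, PySpark reads PYSPARK_PYTHON from the driver's environment when the SparkContext is created, so changing os.environ in a notebook has no effect once a context already exists in the kernel (SparkSession.builder.getOrCreate() then just returns the existing one). In yarn-client mode the path must also exist on every YARN node, because the executors start that interpreter there. A minimal guard, assuming PySpark 2.x or 3.x; set_worker_python is a hypothetical helper, and it relies on the private attribute SparkContext._active_spark_context, which may change between versions:

```python
import os
import sys


def set_worker_python(python_path: str) -> None:
    """Point PySpark's Python workers at python_path (hypothetical helper).

    PySpark reads PYSPARK_PYTHON when a SparkContext is created, so this
    only has an effect if no SparkContext is running in this kernel yet.
    """
    pyspark = sys.modules.get("pyspark")  # None if pyspark was never imported
    # _active_spark_context is private PySpark state (assumption: present in 2.x/3.x).
    if pyspark is not None and pyspark.SparkContext._active_spark_context is not None:
        raise RuntimeError(
            "A SparkContext already exists; call sc.stop() or restart the kernel first."
        )
    os.environ["PYSPARK_PYTHON"] = python_path
```

Call it in the first cell after a kernel restart, for example set_worker_python("/opt/conda/envs/myenv/bin/python") with a hypothetical path, then findspark.init() if you use it, and only then build the SparkSession.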

Hope this helps: https://www.sicara.ai/blog/2017-05-02-get-started-pyspark-jupyter-notebook-3-minutes
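To check whether the setting was actually picked up, one option is to compare the interpreter of the notebook (the driver) with the one reported by the executors. A sketch assuming an existing SparkSession named spark; report_python_paths is a hypothetical helper name:

```python
import sys


def report_python_paths(spark, partitions: int = 2) -> dict:
    """Return the driver's interpreter and the distinct interpreters used by executors."""
    rdd = spark.sparkContext.parallelize(range(partitions), partitions)
    # The lambda is shipped to the executors, so sys.executable is evaluated
    # inside each Python worker process rather than in the notebook.
    worker_paths = rdd.map(lambda _: sys.executable).collect()
    return {"driver": sys.executable, "executors": sorted(set(worker_paths))}
```

If PYSPARK_PYTHON took effect, report_python_paths(spark) should list the conda env's interpreter under "executors"; more than one entry would mean different nodes resolved different interpreters.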
