Why is a call to SparkSession.builder..getOrCreate() in python console being treated like command line spark-submit?


Problem Description

Inside a python console I am trying to create a Spark Session (I am not using the pyspark shell, in order to isolate dependencies). Why are spark-submit command-line prompts and errors being generated?
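
For reference, a minimal sketch of the kind of call being made (the getSpark wrapper name comes from the traceback below; the import and builder options are assumptions):

from pyspark.sql import SparkSession

def getSpark():
    # Build a session from a plain python console, not via pyspark/spark-submit
    return SparkSession.builder.master("local[*]").appName("console-test").getOrCreate()

spark = getSpark()

This immediately produces the following spark-submit usage output and error: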

NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Missing application resource.

NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
..

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
                              k8s://https://host:port, or local (Default: local[*]).
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of jars to include on the driver
   ..
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in getSpark
  File "/shared/spark/python/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/shared/spark/python/pyspark/context.py", line 367, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/shared/spark/python/pyspark/context.py", line 133, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/shared/spark/python/pyspark/context.py", line 316, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/shared/spark/python/pyspark/java_gateway.py", line 46, in launch_gateway
    return _launch_gateway(conf)
  File "/shared/spark/python/pyspark/java_gateway.py", line 108, in _launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

Answer

After trying over fifteen resources - and perusing about twice that many - the only one that works is this previously non-upvoted answer https://stackoverflow.com/a/55326797/1056563:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
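
One assumed workflow: run that export in the same shell session that will launch the plain python console, before starting python, so that pyspark's launch_gateway picks the variable up when getOrCreate() is first called.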

It's not important whether you use local[2], local, or local[*]: what is required is the format, including the critical pyspark-shell piece.

Another way to handle this - and one more resistant to environmental vagaries - is to have the following line handy in your python code:

import os
os.environ["PYSPARK_SUBMIT_ARGS"] = "pyspark-shell"
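
Putting the pieces together, a minimal sketch, assuming a local master and an illustrative app name, that sets the variable before the first session is created:

import os

# PYSPARK_SUBMIT_ARGS must be set before the first SparkContext/SparkSession is
# created, because that is when pyspark launches the Java gateway.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("console-test").getOrCreate()
print(spark.version)
spark.stop()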
