How to start sparksession in pyspark
Question
I want to change the default memory, executor and core settings of a spark session. The first code in my pyspark notebook on HDInsight cluster in Jupyter looks like this:
from pyspark.sql import SparkSession
spark = SparkSession\
.builder\
.appName("Juanita_Smith")\
.config("spark.executor.instances", "2")\
.config("spark.executor.cores", "2")\
.config("spark.executor.memory", "2g")\
.config("spark.driver.memory", "2g")\
.getOrCreate()
On completion, I read the parameters back, and it looks like the statement worked.
However, if I look in YARN, the settings have indeed not taken effect.
Which settings or commands do I need to issue to make the session configuration take effect?
Thank you in advance for your help.
By the time your notebook kernel has started, the SparkSession is already created with parameters defined in a kernel configuration file. To change this, you will need to update or replace the kernel configuration file, which I believe is usually somewhere like <jupyter home>/kernels/<kernel name>/kernel.json.
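For orientation, a kernel.json for a pyspark kernel typically looks something like the sketch below. The exact paths and environment values here are illustrative assumptions, not taken from a real HDInsight install; the point is that Spark options live in the PYSPARK_SUBMIT_ARGS entry under "env":

```json
{
  "display_name": "PySpark",
  "language": "python",
  "argv": ["python", "-m", "ipykernel", "-f", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/usr/hdp/current/spark2-client",
    "PYSPARK_SUBMIT_ARGS": "--master yarn pyspark-shell"
  }
}
```

Any --conf flags you add to that PYSPARK_SUBMIT_ARGS string are applied when the kernel launches Spark, before your notebook code runs.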
Update
If you have access to the machine hosting your Jupyter server, you can find the location of the current kernel configurations using jupyter kernelspec list. You can then either edit one of the pyspark kernel configurations, or copy it to a new file and edit that. For your purposes, you will need to add the following arguments to the PYSPARK_SUBMIT_ARGS:
"PYSPARK_SUBMIT_ARGS": "--conf spark.executor.instances=2 --conf spark.executor.cores=2 --conf spark.executor.memory=2g --conf spark.driver.memory=2g"
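As a small sketch, that flag string can be assembled from a plain dict of Spark settings rather than typed by hand; the helper name build_submit_args is hypothetical, and the trailing "pyspark-shell" token is what PYSPARK_SUBMIT_ARGS conventionally ends with:

```python
# Hypothetical helper: turn a dict of Spark settings into the
# "--conf key=value ..." string used in PYSPARK_SUBMIT_ARGS.
def build_submit_args(conf, trailer="pyspark-shell"):
    flags = " ".join(f"--conf {key}={value}" for key, value in conf.items())
    return f"{flags} {trailer}"

settings = {
    "spark.executor.instances": "2",
    "spark.executor.cores": "2",
    "spark.executor.memory": "2g",
    "spark.driver.memory": "2g",
}
print(build_submit_args(settings))
# --conf spark.executor.instances=2 --conf spark.executor.cores=2 --conf spark.executor.memory=2g --conf spark.driver.memory=2g pyspark-shell
```

After editing the kernel file, restart the kernel (or the Jupyter server) so the new PYSPARK_SUBMIT_ARGS is picked up when the SparkSession is created.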