How to start SparkSession in PySpark


Question

I want to change the default memory, executor, and core settings of a Spark session. The first code cell in my PySpark notebook on an HDInsight cluster in Jupyter looks like this:

from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("Juanita_Smith")\
    .config("spark.executor.instances", "2")\
    .config("spark.executor.cores", "2")\
    .config("spark.executor.memory", "2g")\
    .config("spark.driver.memory", "2g")\
    .getOrCreate()

On completion, I read the parameters back, which makes it look like the statement worked:

However, if I look in YARN, the settings have in fact not taken effect.

Which settings or commands do I need to use to make the session configuration take effect?

Thank you for your help.

Answer

By the time your notebook kernel has started, the SparkSession has already been created with the parameters defined in a kernel configuration file. To change this, you will need to update or replace the kernel configuration file, which I believe is usually somewhere like <jupyter home>/kernels/<kernel name>/kernel.json.

If you have access to the machine hosting your Jupyter server, you can find the location of the current kernel configurations using jupyter kernelspec list. You can then either edit one of the PySpark kernel configurations, or copy it to a new file and edit that. For your purposes, you will need to add the following arguments to the PYSPARK_SUBMIT_ARGS:

"PYSPARK_SUBMIT_ARGS": "--conf spark.executor.instances=2 --conf spark.executor.cores=2 --conf spark.executor.memory=2g --conf spark.driver.memory=2g"
