spark 2.1.0 session config settings (pyspark)
Problem description
I am trying to overwrite the Spark session/Spark context default configs, but it is picking up the entire node/cluster resources.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("ip") \
    .enableHiveSupport() \
    .getOrCreate()

spark.conf.set("spark.executor.memory", '8g')
spark.conf.set('spark.executor.cores', '3')
spark.conf.set('spark.cores.max', '3')
spark.conf.set("spark.driver.memory", '8g')
sc = spark.sparkContext
It works fine when I put the configuration in spark-submit:
spark-submit --master ip --executor-cores 3 --driver-memory 10G code.py
Answer
You aren't actually overwriting anything with this code. Just so you can see for yourself, try the following.
As soon as you start the pyspark shell, type:
sc.getConf().getAll()
This will show you all of the current config settings. Then try your code and do it again: nothing changes. That is because resource settings such as executor memory and cores are fixed when the context's JVMs are launched; calling spark.conf.set on an already-running session cannot change them.
What you should do instead is create a new configuration and use that to create a SparkContext. Do it like this:
import pyspark

conf = pyspark.SparkConf().setAll([('spark.executor.memory', '8g'), ('spark.executor.cores', '3'), ('spark.cores.max', '3'), ('spark.driver.memory', '8g')])
sc.stop()
sc = pyspark.SparkContext(conf=conf)
Then you can check for yourself, just as above, with:
sc.getConf().getAll()
This should reflect the configuration you wanted.