Spark 1.4: increase maxResultSize memory
Problem description
I am using Spark 1.4 for my research and struggling with the memory settings. My machine has 16GB of memory, so that should not be a problem since my file is only 300MB. However, when I try to convert a Spark RDD to a pandas DataFrame using the toPandas() function, I receive the following error:
serialized results of 9 tasks (1096.9 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
I tried to fix this by changing the spark-config file, but I still get the same error. I've heard that this is a problem with Spark 1.4 and wonder whether you know how to solve it. Any help is much appreciated.
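For reference, the same property can be set outside the application code. A sketch of the equivalent entry in conf/spark-defaults.conf (setting the value to 0 removes the limit entirely, at the risk of out-of-memory errors on the driver):

```
spark.driver.maxResultSize    2g
```

The same property can also be passed at submit time with `spark-submit --conf spark.driver.maxResultSize=2g`. Note that settings in spark-defaults.conf only take effect for contexts created after the change; an already-running context keeps its old configuration.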
Recommended answer
You can set the spark.driver.maxResultSize parameter on the SparkConf object:
from pyspark import SparkConf, SparkContext
# In Jupyter you have to stop the current context first
sc.stop()
# Create new config
conf = (SparkConf()
.set("spark.driver.maxResultSize", "2g"))
# Create new context
sc = SparkContext(conf=conf)
You should probably create a new SQLContext as well:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
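Putting the pieces together, the conversion from the question should then succeed. A minimal sketch, assuming a local Spark 1.4 installation; the DataFrame here is a hypothetical stand-in for the asker's 300MB file:

```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Build a context whose driver accepts up to 2 GB of collected results
conf = SparkConf().set("spark.driver.maxResultSize", "2g")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Hypothetical small DataFrame to demonstrate the call
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

# toPandas() collects every row to the driver, which is exactly the
# transfer that spark.driver.maxResultSize limits
pdf = df.toPandas()
print(pdf.shape)  # prints the (rows, columns) of the pandas DataFrame
```

Keep in mind that toPandas() materializes the entire dataset in driver memory, so raising maxResultSize only helps while the collected result still fits in the driver's heap.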