Spark 1.4 increase maxResultSize memory


Problem description

I am using Spark 1.4 for my research and am struggling with the memory settings. My machine has 16 GB of memory, so there should be no problem there, since my file is only 300 MB. However, when I try to convert the Spark RDD to a pandas DataFrame using the toPandas() function, I receive the following error:

serialized results of 9 tasks (1096.9 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

I tried to fix this by changing the spark-config file but still get the same error. I've heard that this is a problem with Spark 1.4 and am wondering if you know how to solve it. Any help is much appreciated.

Recommended answer

You can set the spark.driver.maxResultSize parameter in the SparkConf object:

from pyspark import SparkConf, SparkContext

# In Jupyter you have to stop the current context first
sc.stop()

# Create a new config; the default spark.driver.maxResultSize is 1g,
# which is the limit the 1096.9 MB of serialized results exceeded
conf = (SparkConf()
    .set("spark.driver.maxResultSize", "2g"))

# Create new context
sc = SparkContext(conf=conf)
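
If you would rather keep the setting in the spark-config file the question mentions, the same property can go into conf/spark-defaults.conf instead of being set programmatically. A minimal sketch of that file (the 2g value is just an example; pick whatever your driver can hold):

# conf/spark-defaults.conf — plain key/value properties read at startup
spark.driver.maxResultSize  2g

Setting the value to 0 removes the limit entirely, but a very large collect can then exhaust the driver's memory, so an explicit bound is usually safer.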

You should probably create a new SQLContext as well:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
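
With the new context in place, a minimal end-to-end sketch looks like this (the data here is made up purely to exercise the conversion; it assumes the sc and sqlContext created above):

from pyspark.sql import Row

# Hypothetical data just to demonstrate the DataFrame -> pandas conversion
rdd = sc.parallelize([Row(id=i, value=i * 2) for i in range(1000)])
df = sqlContext.createDataFrame(rdd)

# With spark.driver.maxResultSize raised to 2g, collecting the results
# to the driver as a pandas DataFrame should now succeed
pdf = df.toPandas()
print(pdf.head())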
