Spark 1.4 increase maxResultSize memory


Problem description

I am using Spark 1.4 for my research and struggling with the memory settings. My machine has 16GB of memory, so there should be no problem there since my file is only 300MB. However, when I try to convert a Spark RDD to a pandas DataFrame using the toPandas function, I receive the following error:

serialized results of 9 tasks (1096.9 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

I tried to fix this by changing the spark-config file and still get the same error. I've heard that this is a problem with Spark 1.4 and was wondering if you know how to solve it. Any help is much appreciated.
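For context, here is a minimal sketch of the kind of conversion that can trigger this error; the file path and column names are hypothetical, and the point is only that toPandas() collects every partition back to the driver:

from pyspark.sql import SQLContext

# Hypothetical example: load a ~300MB file and turn it into a DataFrame
sqlContext = SQLContext(sc)
rdd = sc.textFile("data/measurements.csv").map(lambda line: line.split(","))
df = sqlContext.createDataFrame(rdd, ["sensor", "timestamp", "value"])

# toPandas() pulls all partitions to the driver; if the serialized results
# exceed spark.driver.maxResultSize (1024 MB by default), the job fails
# with the error shown above
pdf = df.toPandas()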

Recommended answer

You can set the spark.driver.maxResultSize parameter in the SparkConf object:

from pyspark import SparkConf, SparkContext

# In Jupyter you have to stop the current context first
sc.stop()

# Create new config
conf = (SparkConf()
    .set("spark.driver.maxResultSize", "2g"))

# Create new context
sc = SparkContext(conf=conf)

You should probably create a new SQLContext as well:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
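With the new context in place you can rebuild your DataFrame and retry the conversion; df below is just a placeholder for whatever DataFrame you were converting:

# Retry the conversion; up to 2g of serialized results is now accepted
pdf = df.toPandas()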

