Spark: java.io.IOException: No space left on device


Problem description

I am currently learning how to use Spark. I have a piece of code that inverts a matrix, and it works when the order of the matrix is small, like 100. But when the order of the matrix is large, like 2000, I get an exception like this:

15/05/10 20:31:00 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/spark-local-20150510200122-effa/28/temp_shuffle_6ba230c3-afed-489b-87aa-91c046cadb22

java.io.IOException: No space left on device

In my program I have many lines like this:

val result1=matrix.map(...).reduce(...)
val result2=result1.map(...).reduce(...)
val result3=matrix.map(...)

(Sorry, there is too much code to write out here.)

So I think that when I do this, Spark creates some new RDDs, and in my program Spark creates so many RDDs that I get this exception. I am not sure whether what I think is correct.

How can I delete the RDDs that I no longer need, such as result1 and result2?

I have tried rdd.unpersist(), but it does not work.
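
A simplified sketch of what I tried (the names here are placeholders, not my real code):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("matrix-sketch").setMaster("local[*]"))
val matrix  = sc.parallelize(0 until 2000).map(i => (i % 10, i.toDouble))
val result1 = matrix.reduceByKey(_ + _).cache() // wide transformation: Spark writes shuffle files under spark.local.dir
result1.count()                                 // action that materialises result1 and its cache
result1.unpersist()                             // removes the cached blocks, but /tmp still fills up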

Recommended answer

This is because Spark creates temporary shuffle files under the /tmp directory of your local system. You can avoid this issue by setting the following properties in your Spark configuration files.

Set this property in spark-env.sh:

SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"

export SPARK_JAVA_OPTS
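
If you are not using spark-env.sh (or are on a newer Spark version, where SPARK_JAVA_OPTS is deprecated), the same setting can go into spark-defaults.conf as spark.local.dir, or be set programmatically. A sketch, assuming the mount points /mnt/spark and /mnt2/spark exist and are writable on every node:

import org.apache.spark.{SparkConf, SparkContext}

// Point spark.local.dir at disks with enough free space; the paths are examples,
// use whatever volumes actually exist on your machines. In cluster deployments the
// cluster manager's own local-dir setting (e.g. SPARK_LOCAL_DIRS) may take precedence.
val conf = new SparkConf()
  .setAppName("matrix-inverse")
  .set("spark.local.dir", "/mnt/spark,/mnt2/spark")
val sc = new SparkContext(conf)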
