Spark: java.io.IOException: No space left on device
Question
I am learning how to use Spark. I have a piece of code that inverts a matrix. It works when the order of the matrix is small (such as 100), but when the order is large (such as 2000) I get an exception like this:
15/05/10 20:31:00 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/spark-local-20150510200122-effa/28/temp_shuffle_6ba230c3-afed-489b-87aa-91c046cadb22
java.io.IOException: No space left on device
My program has many lines like this:
val result1=matrix.map(...).reduce(...)
val result2=result1.map(...).reduce(...)
val result3=matrix.map(...)
(Sorry, the full code is too long to post here.)
So I think that each of these lines makes Spark create new RDDs, and that my program creates so many RDDs that I get the exception. I am not sure whether this reasoning is correct.
How can I delete the RDDs that I no longer need, such as result1 and result2?
I have tried rdd.unpersist(), but it does not help.
Recommended answer
This happens because Spark creates temporary shuffle files under the /tmp directory of your local system. You can avoid the issue by setting the properties below in your Spark conf files.
Set this property in spark-env.sh:
SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"
export SPARK_JAVA_OPTS
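SPARK_JAVA_OPTS has been deprecated since Spark 1.0, so on newer releases the same redirection is usually expressed through the SPARK_LOCAL_DIRS environment variable or the spark.local.dir property. The following is a minimal sketch, not a definitive setup: it reuses the directories from the answer above and assumes they exist on disks with enough free space, and my-app.jar is a placeholder name.

```shell
#!/bin/sh
# Sketch only: these directories come from the answer above and are
# assumed to exist on disks with enough free space.
SCRATCH_DIRS="/mnt/spark,/mnt2/spark"

# Check how full the default scratch location from the error message is.
df -h /tmp

# In spark-env.sh: SPARK_LOCAL_DIRS sets the scratch directories for
# standalone workers and takes precedence over spark.local.dir.
export SPARK_LOCAL_DIRS="$SCRATCH_DIRS"

# Alternatively, per application (my-app.jar is a placeholder):
#   spark-submit --conf spark.local.dir="$SCRATCH_DIRS" my-app.jar
echo "Spark scratch dirs: $SPARK_LOCAL_DIRS"
```

On YARN the cluster manager's own local directory setting overrides both options, so the scratch location has to be changed in the YARN configuration instead.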