Why does a job fail with "No space left on device", but df says otherwise?
Problem Description
When performing a shuffle, my Spark job fails with "no space left on device", but when I run df -h it says I have free space left! Why does this happen, and how can I fix it?
Recommended Answer
You also need to monitor df -i, which shows how many inodes are in use.
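As a quick check (a minimal sketch; /tmp stands in for whatever directory spark.local.dir points at on your workers):

```shell
# Inspect inode usage for the filesystem backing Spark's scratch space
# (spark.local.dir defaults to /tmp). An IUse% near 100% means inode
# exhaustion, even while `df -h` still reports free blocks.
df -i /tmp
```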
On each machine, we create M * R temporary files for shuffle, where M = number of map tasks and R = number of reduce tasks.
https://spark-project.atlassian.net/browse/SPARK-751
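To see why this bites, multiply the two task counts; the figures below are hypothetical:

```shell
# Hypothetical job: 1000 map tasks and 1000 reduce tasks.
M=1000
R=1000
# Each machine can end up hosting on the order of M * R shuffle files,
# and every file consumes an inode regardless of its size.
echo $(( M * R ))   # prints 1000000
```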
If you do indeed see that disks are running out of inodes, you can fix the problem in a few ways:
- Decrease partitions (see coalesce with shuffle = false).
- Drop the number of files to O(R) by "consolidating files". As different file systems behave differently, it is recommended that you read up on spark.shuffle.consolidateFiles and see https://spark-project.atlassian.net/secure/attachment/10600/Consolidating%20Shuffle%20Files%20in%20Spark.pdf.
- Sometimes you may simply find that you need your DevOps to increase the number of inodes the FS supports.
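For the consolidation route, the setting can be passed at submit time; this is a sketch, and the application jar name here is a placeholder:

```shell
# Enable shuffle-file consolidation to reduce the per-machine shuffle
# file count toward O(R). Note: this option only exists on Spark < 1.6
# (see the edit below).
spark-submit \
  --conf spark.shuffle.consolidateFiles=true \
  my-app.jar
```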
Edit

Consolidating files has been removed from Spark since version 1.6: https://issues.apache.org/jira/browse/SPARK-9808