Why does a job fail with "No space left on device", but df says otherwise?


Question

When performing a shuffle, my Spark job fails with "No space left on device", but when I run df -h it reports that I still have free space. Why does this happen, and how can I fix it?

Recommended answer

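You also need to monitor df -i, which shows how many inodes are in use. A file system can run out of inodes while df -h still shows free blocks, and creating a new file then fails with the same "No space left on device" error.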
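As a rough illustration of the df -i check, the sketch below reads the same inode counters from Python through os.statvfs. The function name inode_usage and the path "/" are assumptions made for this example; in practice you would pass the directory where Spark writes its shuffle files. It works on Unix-like systems only.

```python
import os


def inode_usage(path="/"):
    """Return inode counters for the file system that contains `path`."""
    st = os.statvfs(path)
    total = st.f_files   # total number of inodes on the file system
    free = st.f_ffree    # number of free inodes
    used = total - free
    # Some file systems report 0 total inodes; avoid dividing by zero.
    percent_used = 100.0 * used / total if total else None
    return {
        "total": total,
        "used": used,
        "free": free,
        "percent_used": percent_used,
    }


if __name__ == "__main__":
    # "/" is only a placeholder path for this sketch.
    print(inode_usage("/"))
```

If percent_used is close to 100 while df -h still shows free space, inode exhaustion is the likely cause of the error.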

On each machine, we create M * R temporary files for the shuffle, where M is the number of map tasks and R is the number of reduce tasks.

https://spark-project.atlassian.net/browse/SPARK-751
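To see how quickly M * R grows, here is a small arithmetic sketch. The function name shuffle_files_per_machine and the task counts are made up for illustration; only the M * R formula comes from the answer above.

```python
def shuffle_files_per_machine(map_tasks, reduce_tasks):
    """Number of temporary shuffle files per machine, using M * R."""
    if map_tasks < 0 or reduce_tasks < 0:
        raise ValueError("task counts must be non-negative")
    return map_tasks * reduce_tasks


if __name__ == "__main__":
    # Hypothetical task counts, chosen only to show the growth.
    for m, r in [(100, 100), (1000, 1000), (5000, 2000)]:
        files = shuffle_files_per_machine(m, r)
        print(f"M={m}, R={r}: {files} files")
```

With 1000 map tasks and 1000 reduce tasks this already gives one million files, each of which consumes an inode regardless of how small the file is.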

If you do see that the disks are running out of inodes, you can fix the problem in one of the following ways:


  • Decrease the number of partitions (see coalesce with shuffle = false).
  • Reduce the number of files to O(R) by "consolidating files". Because different file systems behave differently, it is recommended that you read up on spark.shuffle.consolidateFiles and see https://spark-project.atlassian.net/secure/attachment/10600/Consolidating%20Shuffle%20Files%20in%20Spark.pdf
  • Sometimes you may simply find that you need your DevOps team to increase the number of inodes the file system supports.
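For the first option, a back-of-the-envelope estimate can suggest how far the partition count needs to drop. The sketch below assumes, purely for simplicity, that the map and reduce sides use the same partition count P, so the M * R estimate becomes P * P. The function name max_partitions_within_budget and the safety_fraction parameter are hypothetical, not part of Spark.

```python
import math


def max_partitions_within_budget(free_inodes, safety_fraction=0.5):
    """Largest P with P * P <= safety_fraction * free_inodes.

    Assumes M == R == P, which is a simplification made for this sketch.
    """
    if free_inodes < 0:
        raise ValueError("free_inodes must be non-negative")
    if not 0 < safety_fraction <= 1:
        raise ValueError("safety_fraction must be in (0, 1]")
    # Only spend a fraction of the free inodes on shuffle files.
    budget = int(free_inodes * safety_fraction)
    return math.isqrt(budget)


if __name__ == "__main__":
    # Hypothetical number of free inodes, e.g. taken from df -i.
    print(max_partitions_within_budget(2_000_000))
```

The result is only a starting point: other processes on the machine also consume inodes, and real jobs rarely have equal map and reduce counts.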

Edit

File consolidation was removed from Spark in version 1.6: https://issues.apache.org/jira/browse/SPARK-9808
