Why does a job fail with "No space left on device", but df says otherwise?

Question

When performing a shuffle my Spark job fails and says "no space left on device", but when I run df -h it says I have free space left! Why does this happen, and how can I fix it?

Answer

You also need to monitor df -i, which shows how many inodes are in use.
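A quick way to see the difference is to run both forms of df against the volume that holds Spark's shuffle scratch space. A minimal sketch, assuming a Linux df; the /tmp default is just a placeholder for whatever spark.local.dir points at on your cluster:

```shell
#!/bin/sh
# SCRATCH should be the mount holding spark.local.dir; /tmp is an assumption.
SCRATCH="${SPARK_LOCAL_DIR:-/tmp}"

df -h "$SCRATCH"   # byte usage -- can look perfectly healthy
df -i "$SCRATCH"   # inode usage -- can be at 100% with plenty of free bytes

# Pull the IUse% column from the second line of df -i output and warn
# when it crosses a threshold (column layout assumes GNU df).
IUSE=$(df -i "$SCRATCH" | awk 'NR==2 {gsub(/%/, "", $5); print $5}')
if [ "${IUSE:-0}" -ge 90 ] 2>/dev/null; then
    echo "WARNING: inode usage at ${IUSE}% on $SCRATCH"
fi
```

When the job fails with "No space left on device" while df -h shows free space, the second command is usually the one that reveals the problem.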

On each machine, Spark creates M * R temporary files for the shuffle, where M = number of map tasks and R = number of reduce tasks.

https://spark-project.atlassian.net/browse/SPARK-751
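To get a feel for the scale, the arithmetic below uses hypothetical task counts (M, R, and the core count are made up for illustration):

```shell
#!/bin/sh
# Hypothetical task counts, chosen only to show how fast M * R grows.
M=2000   # map tasks
R=1000   # reduce tasks

echo "Unconsolidated shuffle files (M * R): $((M * R))"

# With file consolidation the count drops to roughly cores * R instead.
CORES=16
echo "Consolidated shuffle files (~cores * R): $((CORES * R))"

# For scale: ext4 formatted with defaults allocates about one inode per
# 16 KB, so a 32 GB volume has roughly 2 million inodes -- a single job
# like the one above can consume them all.
```

Two million tiny files from one job is well within reach for a moderately sized cluster, which is why the inode table empties long before the disk does.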

If you do indeed see that the disks are running out of inodes, you can fix the problem in one of the following ways:

  • Decrease partitions (see coalesce with shuffle = false).
  • One can drop the number to O(R) by "consolidating files". As different file-systems behave differently it’s recommended that you read up on spark.shuffle.consolidateFiles and see https://spark-project.atlassian.net/secure/attachment/10600/Consolidating%20Shuffle%20Files%20in%20Spark.pdf.
  • Sometimes you may simply find that you need your DevOps to increase the number of inodes the FS supports.
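As a sketch of how the first two options look in practice, assuming Spark 1.x (where spark.shuffle.consolidateFiles still exists); the job file and scratch path are placeholders:

```shell
# In the job itself, shrink the partition count without triggering
# another shuffle:
#   rdd.coalesce(numPartitions, shuffle = false)
#
# At submit time, enable shuffle file consolidation (Spark < 1.6 only)
# and point the scratch space at a volume with enough inodes:
spark-submit \
  --conf spark.shuffle.consolidateFiles=true \
  --conf spark.local.dir=/mnt/bigdisk/spark-scratch \
  your_job.py
```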

Edit

Shuffle file consolidation was removed from Spark as of version 1.6. https://issues.apache.org/jira/browse/SPARK-9808
