How to fix corrupt HDFS files

This article explains how to fix corrupt HDFS files. The approach described here should be a useful reference for anyone who runs into the same problem.

Problem description


How does someone fix an HDFS that's corrupt? I looked on the Apache/Hadoop website, and it mentions the fsck command, which doesn't fix it. Hopefully someone who has run into this problem before can tell me how to fix this.

Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally NameNode automatically corrects most of the recoverable failures.

When I ran bin/hadoop fsck / -delete, it listed the files that were corrupt or had missing blocks. How do I make it not corrupt? This is on a practice machine, so I COULD blow everything away, but when we go live I won't be able to "fix" it by blowing everything away, so I'm trying to figure it out now.

Solution

You can use

  hdfs fsck /

to determine which files are having problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbose, especially on a large HDFS filesystem, so I normally get down to the meaningful output with

  hdfs fsck / | egrep -v '^\.+$' | grep -v eplica

which ignores lines with nothing but dots and lines talking about replication.
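
On recent Hadoop releases, fsck can also print just the corrupt or missing blocks together with the files they belong to, which is a quicker way to build a list of affected paths. A minimal sketch, assuming this option is available in your version; the exact output format varies between releases, and the local file name corrupt_files.txt is just a placeholder reused in the next step:

  # Print only corrupt/missing blocks and the files that contain them
  hdfs fsck / -list-corruptfileblocks

  # Optionally keep just the affected HDFS paths (one per line) for follow-up;
  # the awk filter assumes each data line ends with an absolute HDFS path
  hdfs fsck / -list-corruptfileblocks 2>/dev/null \
    | awk '$NF ~ /^\// {print $NF}' | sort -u > corrupt_files.txt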

Once you find a file that is corrupt

  hdfs fsck /path/to/corrupt/file -locations -blocks -files

Use that output to determine where blocks might live. If the file is larger than your block size it might have multiple blocks.
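
When more than a handful of files are affected, it can help to capture that detailed report for each of them in one pass. A small sketch, assuming the affected HDFS paths were saved one per line into the local file corrupt_files.txt as above:

  # Record block IDs and reported locations for every corrupt file
  while read -r f; do
    echo "=== $f ==="
    hdfs fsck "$f" -locations -blocks -files
  done < corrupt_files.txt > corrupt_block_report.txt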

You can use the reported block numbers to go around to the datanodes and the namenode logs, searching for the machine or machines on which the blocks lived. Try looking for filesystem errors on those machines: missing mount points, a datanode that isn't running, a filesystem that was reformatted or reprovisioned. If you can find a problem that way and bring the block back online, that file will be healthy again.
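
For example, to chase a single block across the cluster, grep the NameNode and DataNode logs for its ID and check whether the block file still exists on the DataNode disks. The block ID, log paths, and data directory below are placeholders; the real locations depend on your installation and on what dfs.datanode.data.dir points at:

  # Placeholder block ID copied from the fsck output
  BLOCK=blk_1073741825

  # On the NameNode: which DataNodes ever held or reported this block?
  grep "$BLOCK" /var/log/hadoop/hdfs/*namenode*.log

  # On each suspect DataNode: was the block written, deleted, or flagged bad?
  grep "$BLOCK" /var/log/hadoop/hdfs/*datanode*.log

  # Still on the DataNode: does the block file survive under the data directories?
  find /hadoop/hdfs/data -name "${BLOCK}*" 2>/dev/null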

Lather, rinse, and repeat until all files are healthy or you exhaust all alternatives looking for the blocks.

Once you determine what happened and you cannot recover any more blocks, just use the

  hdfs dfs -rm /path/to/file/with/permanently/missing/blocks

command to get your HDFS filesystem back to a healthy state, so you can start tracking new errors as they occur.
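
As a concrete sketch of that last cleanup step (the fsck -delete variant from the question is the bulk version of the same thing and is destructive, so double-check the list of corrupt files first):

  # Remove a single file whose blocks are permanently gone
  hdfs dfs -rm /path/to/file/with/permanently/missing/blocks

  # Or drop every corrupt file in one sweep
  # hdfs fsck / -delete

  # Confirm the namespace reports HEALTHY again
  hdfs fsck /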

That concludes this article on how to fix corrupt HDFS files. We hope the answer above is helpful, and thank you for your continued support of IT屋!
