Checksum Exception when reading from or copying to hdfs in apache hadoop

Problem Description

I am trying to implement a parallelized algorithm using Apache hadoop, but I am facing some issues when trying to transfer a file from the local file system to hdfs. A checksum exception is being thrown when trying to read from or transfer a file.

The strange thing is that some files are copied successfully while others are not (I tried with 2 files, one slightly bigger than the other, though both are small in size). Another observation I have made is that the Java FileSystem.getFileChecksum method returns null in all cases.

A slight background on what I am trying to achieve: I am trying to write a file to hdfs, to be able to use it as a distributed cache for the mapreduce job that I have written.

I have also tried the hadoop fs -copyFromLocal command from the terminal, and the result is exactly the same behaviour as when it is done through the Java code.
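For reference, here is a minimal sketch of the kind of Java code that hits the same error for me; it is not my exact code (the class name is a placeholder, it assumes a default Configuration, and the paths are the same ones as in the copyFromLocal command further down), but it shows both the copy and the getFileChecksum call mentioned above:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileChecksum;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path src = new Path("/home/name/Desktop/dtlScaleData/attr.txt");
            Path dst = new Path("/tmp/hadoop-name/dfs/data/attr2.txt");

            // The ChecksumException from the stack trace below is raised here
            // for some of the files.
            fs.copyFromLocalFile(src, dst);

            // As noted above, getFileChecksum returns null in all cases.
            FileChecksum checksum = fs.getFileChecksum(dst);
            System.out.println("checksum: " + checksum);
        }
    }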

I have looked all over the web, including other questions here on Stack Overflow, but I haven't managed to solve the issue. Please be aware that I am still quite new to hadoop, so any help is greatly appreciated.

I am attaching the stack trace below which shows the exceptions being thrown. (In this case I have posted the stack trace resulting from the hadoop fs -copyFromLocal command from terminal)

name@ubuntu:~/Desktop/hadoop2$ bin/hadoop fs -copyFromLocal ~/Desktop/dtlScaleData/attr.txt /tmp/hadoop-name/dfs/data/attr2.txt

    13/03/15 15:02:51 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/03/15 15:02:51 INFO fs.FSInputChecker: Found checksum error: b[0, 0]=
    org.apache.hadoop.fs.ChecksumException: Checksum error: /home/name/Desktop/dtlScaleData/attr.txt at 0
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:219)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:176)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
        at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
    copyFromLocal: Checksum error: /home/name/Desktop/dtlScaleData/attr.txt at 0

Solution

You are probably hitting the bug described in HADOOP-7199. What happens is that when you download a file with copyToLocal, it also copies a crc file into the same directory, so if you modify your file and then try to do copyFromLocal, it will compute a checksum of your new file, compare it to your local crc file, and fail with a non-descriptive error message.

To fix it, check whether you have this crc file; if you do, just remove it and try again.
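In case it is useful, here is a rough sketch of doing that cleanup with the Hadoop Java API (the class name is made up and the path is the one from your stack trace; the hidden ".<filename>.crc" sidecar name is the convention Hadoop's local checksum file system uses). From a terminal the equivalent is simply running ls -a in the source directory and deleting the hidden crc file sitting next to your file.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoveStaleCrc {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            LocalFileSystem localFs = FileSystem.getLocal(conf);

            Path src = new Path("/home/name/Desktop/dtlScaleData/attr.txt");

            // The local file system keeps the checksum in a hidden sidecar named
            // ".<filename>.crc" in the same directory, e.g. .attr.txt.crc.
            Path crc = new Path(src.getParent(), "." + src.getName() + ".crc");

            if (localFs.exists(crc)) {
                // Remove the stale sidecar; the next copyFromLocal will recompute
                // the checksum from the current contents of the file.
                localFs.delete(crc, false);
                System.out.println("Removed stale checksum file: " + crc);
            } else {
                System.out.println("No checksum sidecar found at " + crc);
            }
        }
    }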
