Checksum Exception when reading from or copying to hdfs in apache hadoop


Question

I am trying to implement a parallelized algorithm using Apache Hadoop, but I am facing issues when transferring a file from the local file system to HDFS: a checksum exception is thrown when reading from or transferring the file.

The strange thing is that some files are copied successfully while others are not (I tried with 2 files, one slightly bigger than the other, though both are small). Another observation I have made is that the Java FileSystem.getFileChecksum method returns null in all cases.

A slight background on what I am trying to achieve: I am trying to write a file to HDFS so that I can use it as a distributed cache for the MapReduce job I have written.
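
For illustration, here is a minimal sketch of this copy through the Java API; the class name and structure are illustrative rather than my exact code, and the paths are the ones from the command further below. The deprecated DistributedCache class matches the Hadoop 1.x release implied by the stack trace.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path src = new Path("/home/name/Desktop/dtlScaleData/attr.txt");
            Path dst = new Path("/tmp/hadoop-name/dfs/data/attr2.txt");

            // The copy reads the local file through ChecksumFileSystem,
            // which is where the ChecksumException below is thrown.
            fs.copyFromLocalFile(src, dst);

            // As noted above, this returns null in all cases for me.
            System.out.println(fs.getFileChecksum(dst));

            // Register the uploaded file for the MapReduce job's cache.
            DistributedCache.addCacheFile(dst.toUri(), conf);
        }
    }

Note that copyFromLocalFile goes through the same FileUtil.copy path that the shell command uses, which is presumably why both fail identically.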

I have also tried the hadoop fs -copyFromLocal command from the terminal, and the result is exactly the same behaviour as when the copy is done through the Java code.

I have looked all over the web, including other questions here on Stack Overflow, but I haven't managed to solve the issue. Please be aware that I am still quite new to Hadoop, so any help is greatly appreciated.

I am attaching the stack trace below, which shows the exceptions being thrown. (In this case I have posted the stack trace resulting from the hadoop fs -copyFromLocal command run from the terminal.)

name@ubuntu:~/Desktop/hadoop2$ bin/hadoop fs -copyFromLocal ~/Desktop/dtlScaleData/attr.txt /tmp/hadoop-name/dfs/data/attr2.txt

    13/03/15 15:02:51 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/03/15 15:02:51 INFO fs.FSInputChecker: Found checksum error: b[0, 0]=
    org.apache.hadoop.fs.ChecksumException: Checksum error: /home/name/Desktop/dtlScaleData/attr.txt at 0
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:219)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:176)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
        at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
    copyFromLocal: Checksum error: /home/name/Desktop/dtlScaleData/attr.txt at 0

Answer

You are probably hitting the bug described in HADOOP-7199. What happens is that when you download a file with copyToLocal, it also copies a .crc file into the same directory. If you then modify your file and try to copyFromLocal, Hadoop computes the checksum of your new file, compares it against the stale local .crc file, and fails with a non-descriptive error message.

To fix it, check whether you have this .crc file; if you do, just remove it and try again.
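
For concreteness, here is a minimal sketch of finding and deleting that file through Hadoop's local FileSystem API. The source path is assumed from your stack trace; ChecksumFileSystem names the hidden sibling file .<name>.crc, so in your case it should be .attr.txt.crc.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoveStaleCrc {
        public static void main(String[] args) throws Exception {
            LocalFileSystem localFs = FileSystem.getLocal(new Configuration());

            // Checksum sibling of attr.txt; resolves to the hidden file
            // /home/name/Desktop/dtlScaleData/.attr.txt.crc
            Path src = new Path("/home/name/Desktop/dtlScaleData/attr.txt");
            Path crc = localFs.getChecksumFile(src);

            if (localFs.exists(crc)) {
                // Remove the stale checksum so the next copy recomputes it.
                localFs.delete(crc, false);
            }
        }
    }

Listing the directory with hidden files shown (ls -a) and removing the .crc file by hand achieves the same thing.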
