Checksum Exception when reading from or copying to hdfs in apache hadoop

Problem Description

I am trying to implement a parallelized algorithm using Apache hadoop, but I am facing some issues when trying to transfer a file from the local file system to hdfs. A checksum exception is being thrown when trying to read from or transfer a file.

The strange thing is that some files are copied successfully while others are not (I tried with 2 files, one slightly bigger than the other, though both are small in size). Another observation I have made is that the Java FileSystem.getFileChecksum method returns null in all cases.

A slight background on what I am trying to achieve: I am trying to write a file to hdfs, to be able to use it as a distributed cache for the mapreduce job that I have written.

I have also tried the hadoop fs -copyFromLocal command from the terminal, and the result is exactly the same behaviour as when it is done through the Java code.
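For reference, here is a minimal sketch of the kind of Java code that hits the same error for me; it is not my exact code (the class name is a placeholder, it assumes a default Configuration, and the paths are the same ones as in the copyFromLocal command further down), but it shows both the copy and the getFileChecksum call mentioned above:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileChecksum;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path src = new Path("/home/name/Desktop/dtlScaleData/attr.txt");
            Path dst = new Path("/tmp/hadoop-name/dfs/data/attr2.txt");

            // The ChecksumException from the stack trace below is raised here
            // for some of the files.
            fs.copyFromLocalFile(src, dst);

            // As noted above, getFileChecksum returns null in all cases.
            FileChecksum checksum = fs.getFileChecksum(dst);
            System.out.println("checksum: " + checksum);
        }
    }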

I have looked all over the web, including other questions here on Stack Overflow, but I haven't managed to solve the issue. Please be aware that I am still quite new to hadoop, so any help is greatly appreciated.

I am attaching the stack trace below which shows the exceptions being thrown. (In this case I have posted the stack trace resulting from the hadoop fs -copyFromLocal command from terminal)

name@ubuntu:~/Desktop/hadoop2$ bin/hadoop fs -copyFromLocal ~/Desktop/dtlScaleData/attr.txt /tmp/hadoop-name/dfs/data/attr2.txt

    13/03/15 15:02:51 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/03/15 15:02:51 INFO fs.FSInputChecker: Found checksum error: b[0, 0]=
    org.apache.hadoop.fs.ChecksumException: Checksum error: /home/name/Desktop/dtlScaleData/attr.txt at 0
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:219)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:176)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
        at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
    copyFromLocal: Checksum error: /home/name/Desktop/dtlScaleData/attr.txt at 0

Solution

You are probably hitting the bug described in HADOOP-7199. What happens is that when you download a file with copyToLocal, it also copies a crc file into the same directory, so if you modify your file and then try to do copyFromLocal, it will compute a checksum of your new file, compare it to your local crc file, and fail with a non-descriptive error message.

To fix it, check whether you have this crc file; if you do, just remove it and try again.
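In case it is useful, here is a rough sketch of doing that cleanup with the Hadoop Java API (the class name is made up and the path is the one from your stack trace; the hidden ".<filename>.crc" sidecar name is the convention Hadoop's local checksum file system uses). From a terminal the equivalent is simply running ls -a in the source directory and deleting the hidden crc file sitting next to your file.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoveStaleCrc {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            LocalFileSystem localFs = FileSystem.getLocal(conf);

            Path src = new Path("/home/name/Desktop/dtlScaleData/attr.txt");

            // The local file system keeps the checksum in a hidden sidecar named
            // ".<filename>.crc" in the same directory, e.g. .attr.txt.crc.
            Path crc = new Path(src.getParent(), "." + src.getName() + ".crc");

            if (localFs.exists(crc)) {
                // Remove the stale sidecar; the next copyFromLocal will recompute
                // the checksum from the current contents of the file.
                localFs.delete(crc, false);
                System.out.println("Removed stale checksum file: " + crc);
            } else {
                System.out.println("No checksum sidecar found at " + crc);
            }
        }
    }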
