HDFS File Checksum

Question

I am trying to check the consistency of a file after copying it to HDFS, using the Hadoop API DFSClient.getFileChecksum().

I am getting the following output for the code below:

Null
HDFS : null
Local : null

Can anyone point out the error or mistake? Here is the code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;


public class fileCheckSum {

    /**
     * @param args
     * @throws IOException 
     */
    public static void main(String[] args) throws IOException {
        // TODO Auto-generated method stub

        Configuration conf = new Configuration();

        FileSystem hadoopFS = FileSystem.get(conf);
        // Path hdfsPath = new Path("/derby.log");

        LocalFileSystem localFS = LocalFileSystem.getLocal(conf);
        // Path localPath = new Path("file:///home/ubuntu/derby.log");

        // System.out.println("HDFS PATH : " + hdfsPath.getName());
        // System.out.println("Local PATH : " + localPath.getName());

        FileChecksum hdfsChecksum = hadoopFS.getFileChecksum(new Path("/derby.log"));
        FileChecksum localChecksum = localFS.getFileChecksum(new Path("file:///home/ubuntu/derby.log"));


        if(null!=hdfsChecksum || null!=localChecksum){
            System.out.println("HDFS Checksum : "+hdfsChecksum.toString()+"\t"+hdfsChecksum.getLength());
            System.out.println("Local Checksum : "+localChecksum.toString()+"\t"+localChecksum.getLength());

            if(hdfsChecksum.toString().equals(localChecksum.toString())){
                System.out.println("Equal");
            }else{
                System.out.println("UnEqual");

            }
        }else{
            System.out.println("Null");
            System.out.println("HDFS : "+hdfsChecksum);
            System.out.println("Local : "+localChecksum);

        }

    }

}

Answer

Since you aren't setting a remote address on the conf, both file systems are created from essentially the same configuration, so hadoopFS and localFS each point to an instance of LocalFileSystem.
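
A minimal sketch of the fix this implies, assuming a hypothetical NameNode address of hdfs://namenode:8020 (in practice this usually comes from core-site.xml on the classpath): once fs.defaultFS points at the cluster, FileSystem.get(conf) returns a DistributedFileSystem, while FileSystem.getLocal(conf) still returns the LocalFileSystem.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;

public class FileSystemCheck {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; without this (or a core-site.xml on the
        // classpath) the default is file:///, i.e. the local file system.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem hadoopFS = FileSystem.get(conf);          // DistributedFileSystem
        LocalFileSystem localFS = FileSystem.getLocal(conf); // LocalFileSystem

        // Print the concrete implementation classes so the difference is visible.
        System.out.println("hadoopFS : " + hadoopFS.getClass().getName());
        System.out.println("localFS  : " + localFS.getClass().getName());
    }
}

On older Hadoop 1.x clusters the equivalent (now deprecated) key is fs.default.name.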

getFileChecksum isn't implemented for LocalFileSystem and returns null. It should work for DistributedFileSystem, though: if your conf points to a distributed cluster, FileSystem.get(conf) returns an instance of DistributedFileSystem, whose checksum is an MD5 of per-block MD5s of CRC32 checksums over chunks of size bytes.per.checksum. The value therefore depends on the block size and on the cluster-wide bytes.per.checksum setting, which is why both parameters are also encoded in the algorithm name of the returned checksum: MD5-of-xxxMD5-of-yyyCRC32, where xxx is the number of CRC checksums per block and yyy is the bytes.per.checksum parameter.
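
As a small illustration of that naming scheme, the sketch below (again assuming a conf that points at a real cluster) prints the algorithm name and length reported by the returned FileChecksum; the class name, address, and path are only placeholders.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowHdfsChecksum {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical cluster address

        FileSystem fs = FileSystem.get(conf);
        FileChecksum checksum = fs.getFileChecksum(new Path("/derby.log"));

        if (checksum == null) {
            // LocalFileSystem (and some other implementations) return null here.
            System.out.println("Checksum not supported by " + fs.getClass().getSimpleName());
        } else {
            // The algorithm name encodes the two parameters discussed above,
            // e.g. something like MD5-of-0MD5-of-512CRC32 for a 512-byte chunk size.
            System.out.println("Algorithm : " + checksum.getAlgorithmName());
            System.out.println("Length    : " + checksum.getLength() + " bytes");
            System.out.println("Value     : " + checksum);
        }
    }
}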

getFileChecksum isn't designed to be comparable across file systems. Although it is possible to simulate the distributed checksum locally, or to hand-craft MapReduce jobs that compute the equivalent of a local hash, I suggest relying on Hadoop's own integrity checks, which happen whenever a file is written to or read from Hadoop.
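
If a cross-filesystem comparison is nevertheless required, one simple (if slower) alternative is to stream both copies through the same plain digest and compare the results. This is only a sketch under the same assumed cluster address and the paths from the question, not something the answer above prescribes.

import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ContentDigestCompare {

    // Streams the whole file through MD5 and returns the hex digest.
    static String md5Of(FileSystem fs, Path path) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = fs.open(path)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical cluster address

        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);

        // Illustrative paths taken from the question.
        String hdfsDigest = md5Of(hdfs, new Path("/derby.log"));
        String localDigest = md5Of(local, new Path("file:///home/ubuntu/derby.log"));

        System.out.println("HDFS  MD5 : " + hdfsDigest);
        System.out.println("Local MD5 : " + localDigest);
        System.out.println(hdfsDigest.equals(localDigest) ? "Equal" : "UnEqual");
    }
}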
