Checksum verification in Hadoop
Problem description
Do we need to verify the checksum after we move files from a Linux server to Hadoop (HDFS) through WebHDFS?
I would like to make sure the files on HDFS are not corrupted after they are copied. But is checking the checksum necessary?
I read that the client computes a checksum before data is written to HDFS.
Can somebody help me understand how I can make sure that the source file on the Linux system is the same as the ingested file on HDFS when using WebHDFS?
Answer
If your goal is to compare two files residing on HDFS, I would not use "hdfs dfs -checksum URI", as in my case it generates different checksums for files with identical content.
In the example below, I compare two files with the same content that reside in different locations:
The old-school md5sum method returns the same checksum:
$ hdfs dfs -cat /project1/file.txt | md5sum
b9fdea463b1ce46fabc2958fc5f7644a -
$ hdfs dfs -cat /project2/file.txt | md5sum
b9fdea463b1ce46fabc2958fc5f7644a -
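The same cat-and-md5sum approach also answers the original question of verifying a local source file against its ingested HDFS copy. A minimal sketch, using hypothetical paths /data/file.txt (local) and /project1/file.txt (HDFS):

$ md5sum /data/file.txt
$ hdfs dfs -cat /project1/file.txt | md5sum

If the two digests match, the file stored in HDFS is byte-for-byte identical to the local source.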
However, the checksum generated by HDFS differs for files with the same content:
$ hdfs dfs -checksum /project1/file.txt
0000020000000000000000003e50be59553b2ddaf401c575f8df6914
$ hdfs dfs -checksum /project2/file.txt
0000020000000000000000001952d653ccba138f0c4cd4209fbf8e2e
A bit puzzling, as I would expect identical checksums to be generated for identical content.
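The discrepancy is most likely because the default HDFS file checksum is an MD5 of per-block MD5s of CRC32C chunk checksums, so it depends on the block size and bytes-per-checksum settings in effect when each file was written, not only on the file content. On Hadoop 3.1.1 and later, a layout-independent checksum can reportedly be requested via the dfs.checksum.combine.mode property (COMPOSITE_CRC, introduced by HDFS-13056); a sketch, assuming such a cluster:

$ hdfs dfs -Ddfs.checksum.combine.mode=COMPOSITE_CRC -checksum /project1/file.txt
$ hdfs dfs -Ddfs.checksum.combine.mode=COMPOSITE_CRC -checksum /project2/file.txt

With COMPOSITE_CRC, two files with identical content should produce the same checksum regardless of their block layout.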