JS - How to compute MD5 on binary data

Question

Edit: changed the title from "JS File API - write and read UTF-8 data is inconsistent" to reflect the actual question.

I have some binary content that I need to calculate the MD5 of. The content is a WARC file, which means it holds text as well as encoded images. To avoid errors when saving the file, I convert and store all the data in ArrayBuffers. All the data is put in Uint8Arrays to convert it to UTF-8.

My first attempt, for testing, was to use the saveAs library to save files from a Chrome extension. This meant passing a Blob object to the method to create the file.

var b = new Blob(arrayBuffers, {type: "text/plain;charset=utf-8"});
saveAs(b,'name.warc');

I haven't found a tool that computes the MD5 directly from a Blob object, so I used a FileReader to read the blob as binary data and then an MD5 tool (I tried CryptoJS as well as a tool from faultylabs) to compute the result.

var f = new FileReader();
// Attach the handler before starting the read.
f.onloadend = function () {
    console.log('Original file checksum: ', faultylabs.MD5(this.result));
};
f.readAsBinaryString(b);

The resources (images) are downloaded directly in arraybuffer format, so I have no need to convert them.

The result was wrong: the MD5 computed in the code and the MD5 of the file saved on my local machine were different. Reading the blob as text, obviously, produces an error.

The workaround I found consists of writing the blob object to disk using the filesystem API, reading it back as binary data, computing the MD5, and then saving that retrieved file as the WARC file (not the blob object directly, but this "refreshed" version of the file). In this case the computed MD5 is fine (I calculate it on the "refreshed" version of the WARC file), but when I launch the WARC replay instance with the "refreshed" WARC archive, it throws errors. With the original file I don't have any problem (but the MD5 is not correct).

var fd = new FormData();

// To compute the md5 hash and to have it correct on the server side, we need to write the file to the system, read it back and then calculate the md5 value.
// We need to send this version of the warc file to the server as well.
window.requestFileSystem  = window.requestFileSystem || window.webkitRequestFileSystem;

function computeWARC_MD5(callback,formData) {
    window.requestFileSystem(window.TEMPORARY, b.size, onInitFs);
    function onInitFs(fs) {
        fs.root.getFile('warc.warc', {create: true}, function(fileEntry) {
            fileEntry.createWriter(function(fileWriter) {
                fileWriter.onwriteend = function(e) {
                  readAndMD5();
                };
                fileWriter.onerror = function(e) {
                  console.error('Write failed: ' + e.toString());
                };
                fileWriter.write(b);
            });
        });

        function readAndMD5() {
            fs.root.getFile('warc.warc', {}, function(fileEntry) {
                fileEntry.file( function(file) {
                    var reader = new FileReader();
                    reader.onloadend = function(e) {
                        var warcMD5 = faultylabs.MD5( this.result );
                        console.log(warcMD5);
                        var g = new Blob([this.result],{type: "text/plain;charset=utf-8"});
                        saveAs(g, o_request.file);
                        formData.append('warc_file', g)
                        formData.append('warc_checksum_md5', warcMD5.toLowerCase());
                        callback(formData);
                    };
                    reader.readAsBinaryString(file);
                });
            });
        }
    }
}

function uploadData(formData) {
    // upload
    $.ajax({
        type: 'POST',
        url: server_URL_upload,
        data: formData,
        processData: false,
        contentType: false,
        // [SPECS] fire a progress event named progress at the XMLHttpRequestUpload object about every 50ms or for every byte transmitted, whichever is least frequent
        xhrFields: {
            onprogress: function (e) {
                if (e.lengthComputable) {
                    console.log(e.loaded / e.total * 100 + '%');
                }
            }
        }
    }).done(function(data) {
       console.log('done uploading!');
       //displayMessage(port_to_page, 'Upload finished!', 'normal')
       //port_to_page.postMessage( { method:"doneUpload" } );
    });
}
computeWARC_MD5(uploadData, fd);
saveAs(b, 'warc.warc');

Could anybody explain why there is this discrepancy? What am I missing in treating all the objects I am dealing with as binary data (store, read)?

Recommended answer

Basically I tried another route: I converted the blob back to an ArrayBuffer and computed the MD5 on that. At that point, the file's MD5 and the ArrayBuffer's MD5 are the same.

var b = new Blob(arrayBuffers, {type: "text/plain;charset=utf-8"});
var blobHtml = new Blob([str2ab(o_request.main_page_html)], {type: "text/plain;charset=utf-8"});

var f = new FileReader();
// Attach the handler before starting the read.
f.onloadend = function () {
  // this.result is an ArrayBuffer holding the blob's raw bytes.
  var warcMD5 = faultylabs.MD5(this.result);
  var fd = new FormData();
  fd.append('warc_file', b);
  fd.append('warc_checksum_md5', warcMD5.toLowerCase());

  uploadData(fd);
};
f.readAsArrayBuffer(b);
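
The same idea can be sketched outside the browser. This is only an illustration, not the setup used in this post: it assumes Node.js 18 or later (where `Blob` is a global) and uses `node:crypto` for MD5 in place of the faultylabs library. The helper name `md5OfBlob` is hypothetical. It shows that hashing the ArrayBuffer read back from a Blob hashes exactly the bytes that were put in.

```javascript
// Sketch only: assumes Node.js 18+ (global Blob) and node:crypto for MD5.
const { createHash } = require("node:crypto");

// Hypothetical helper: hash the exact bytes held by a Blob.
async function md5OfBlob(blob) {
  // arrayBuffer() returns the raw bytes, with no text decoding involved.
  const buffer = await blob.arrayBuffer();
  return createHash("md5").update(new Uint8Array(buffer)).digest("hex");
}

// Two binary parts, similar in spirit to the arrayBuffers list above.
// Together they hold the ASCII bytes of "abc".
const parts = [new Uint8Array([0x61, 0x62]), new Uint8Array([0x63])];
const blob = new Blob(parts, { type: "application/octet-stream" });

const md5Promise = md5OfBlob(blob);
md5Promise.then((hex) => console.log("MD5 of blob bytes:", hex));
```

In a browser, `blob.arrayBuffer()` or `FileReader.readAsArrayBuffer` would supply the bytes, and an MD5 library would replace `createHash`, since the Web Crypto API does not offer MD5.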

I guess the result from a binary string and from an array buffer is different, which is why the MD5 is also inconsistent.
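
One plausible explanation, offered as a sketch rather than a confirmed diagnosis: a binary string holds one character per byte, and as soon as that string is treated as text and encoded as UTF-8 (which is what `new Blob([string])` does with string parts), every character above 0x7f becomes two bytes. The snippet below assumes only the standard `TextEncoder` global; the variable names are illustrative. Whether a given MD5 library hashes a string's char codes as bytes or encodes the string first depends on the library and is not confirmed here.

```javascript
// Sketch only: TextEncoder is a global in browsers and in Node.js 11+.

// Original bytes, including values above 0x7f as found in image data.
const original = new Uint8Array([0x41, 0x80, 0xff]);

// A "binary string" in the style of readAsBinaryString: one char per byte.
const binaryString = String.fromCharCode(...original);

// Encoding that string as UTF-8 turns each char above 0x7f into two bytes,
// so the byte sequence (and therefore any hash of it) changes.
const reencoded = new TextEncoder().encode(binaryString);

console.log("original length:", original.length);
console.log("re-encoded length:", reencoded.length);

// Mapping each char code back to a single byte recovers the original data.
const restored = Uint8Array.from(binaryString, (c) => c.charCodeAt(0));
console.log("restored matches original:",
  restored.every((byte, i) => byte === original[i]));
```

This would also be consistent with the "refreshed" WARC file failing on replay: it was rebuilt from a binary string, so its image bytes may no longer match the originals.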
