Retrieving binary file content using JavaScript, base64-encoding it, and decoding it using Python

Question

I'm trying to download a binary file using XMLHttpRequest (using a recent Webkit) and base64-encode its contents using this simple function:

function getBinary(file){
    var xhr = new XMLHttpRequest();  
    xhr.open("GET", file, false);  
    xhr.overrideMimeType("text/plain; charset=x-user-defined");  
    xhr.send(null);
    return xhr.responseText;
}

function base64encode(binary) {
    return btoa(unescape(encodeURIComponent(binary)));
}

var binary = getBinary('http://some.tld/sample.pdf');
var base64encoded = base64encode(binary);

As a side note, everything above is standard JavaScript, including btoa() and encodeURIComponent(): https://developer.mozilla.org/en/DOM/window.btoa

This works pretty smoothly, and I can even decode the base64 contents using Javascript:

function base64decode(base64) {
    return decodeURIComponent(escape(atob(base64)));
}

var decodedBinary = base64decode(base64encoded);
decodedBinary === binary // true

Now, I want to decode the base64-encoded contents using Python, which consumes a JSON string to get the base64-encoded value. Naively, this is what I do:

import urllib
import base64
# ... retrieving of base64 encoded string through JSON
# Named `encoded` so the variable does not shadow the base64 module
encoded = "77+9UE5HDQ……………oaCgA="
source_contents = urllib.unquote(base64.b64decode(encoded))
destination_file = open(destination, 'wb')
destination_file.write(source_contents)
destination_file.close()

But the resulting file is invalid; it looks like the operation got messed up by UTF-8 encoding or something similar, which is still unclear to me.

If I try to decode UTF-8 contents before putting them in the destination file, an error is raised:

import urllib
import base64
# ... retrieving of base64 encoded string through JSON
# Named `encoded` so the variable does not shadow the base64 module
encoded = "77+9UE5HDQ……………oaCgA="
source_contents = urllib.unquote(base64.b64decode(encoded)).decode('utf-8')
destination_file = open(destination, 'wb')
destination_file.write(source_contents)
destination_file.close()

$ python test.py
// ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
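
The visible prefix of the string above already hints at what went wrong. Here is a minimal Python 3 sketch, assuming only the first ten characters shown (`77+9UE5HDQ`), padded with `==` so the fragment decodes on its own; the elided remainder is not reconstructed:

```python
import base64

# Only the visible prefix of the base64 string from the question; the
# "==" padding is added here so that this fragment decodes on its own.
prefix = "77+9UE5HDQ=="
raw = base64.b64decode(prefix)

print(raw)  # b'\xef\xbf\xbdPNG\r'

# A real PNG file starts with the byte 0x89 followed by b"PNG\r\n".
png_signature_start = b"\x89PNG\r"

# The first three decoded bytes are the UTF-8 encoding of U+FFFD, the
# Unicode replacement character, not the original 0x89 byte.
replacement = "\ufffd".encode("utf-8")
print(raw[:3] == replacement)      # True
print(raw == png_signature_start)  # False
```

This suggests the 0x89 byte had already been replaced by U+FFFD on the JavaScript side, before base64 encoding, which would also explain the `u'\ufffd'` at position 0 in the error message. In that case no decoding step in Python can recover the original byte.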

As a side note, here's a screenshot of two textual representations of the same file; left: the original; right: the one created from the base64-decoded string: http://cl.ly/0U3G34110z3c132O2e2x

Is there a known trick to circumvent these encoding problems when attempting to recreate the file? How would you achieve this yourself?

Any help or hint much appreciated :)

Recommended answer

So I'm answering my own question (sorry for that), but I think it might be useful for someone as lost as I was ;)

So you have to use ArrayBuffer and set the responseType property of your XMLHttpRequest object instance to arraybuffer to retrieve a native array of bytes, which can be converted to base64 using the following convenient function (found there; the author may be blessed here):

function base64ArrayBuffer(arrayBuffer) {
  var base64    = ''
  var encodings = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

  var bytes         = new Uint8Array(arrayBuffer)
  var byteLength    = bytes.byteLength
  var byteRemainder = byteLength % 3
  var mainLength    = byteLength - byteRemainder

  var a, b, c, d
  var chunk

  // Main loop deals with bytes in chunks of 3
  for (var i = 0; i < mainLength; i = i + 3) {
    // Combine the three bytes into a single integer
    chunk = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2]

    // Use bitmasks to extract 6-bit segments from the triplet
    a = (chunk & 16515072) >> 18 // 16515072 = (2^6 - 1) << 18
    b = (chunk & 258048)   >> 12 // 258048   = (2^6 - 1) << 12
    c = (chunk & 4032)     >>  6 // 4032     = (2^6 - 1) << 6
    d = chunk & 63               // 63       = 2^6 - 1

    // Convert the raw binary segments to the appropriate ASCII encoding
    base64 += encodings[a] + encodings[b] + encodings[c] + encodings[d]
  }

  // Deal with the remaining bytes and padding
  if (byteRemainder == 1) {
    chunk = bytes[mainLength]

    a = (chunk & 252) >> 2 // 252 = (2^6 - 1) << 2

    // Set the 4 least significant bits to zero
    b = (chunk & 3)   << 4 // 3   = 2^2 - 1

    base64 += encodings[a] + encodings[b] + '=='
  } else if (byteRemainder == 2) {
    chunk = (bytes[mainLength] << 8) | bytes[mainLength + 1]

    a = (chunk & 64512) >> 10 // 64512 = (2^6 - 1) << 10
    b = (chunk & 1008)  >>  4 // 1008  = (2^6 - 1) << 4

    // Set the 2 least significant bits to zero
    c = (chunk & 15)    <<  2 // 15    = 2^4 - 1

    base64 += encodings[a] + encodings[b] + encodings[c] + '='
  }

  return base64
}

So here's working code:

var xhr = new XMLHttpRequest();
// Asynchronous request: browsers reject responseType on synchronous
// requests made from a document.
xhr.open('GET', 'http://some.tld/favicon.png', true);
xhr.responseType = 'arraybuffer';
xhr.onload = function(e) {
    console.log(base64ArrayBuffer(e.currentTarget.response));
};
xhr.send();

This will log a valid base64 encoded string representing the binary file contents.
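
On the Python side, a string produced this way can then be decoded directly, with no `unquote` or UTF-8 step. Here is a minimal Python 3 sketch, assuming a hypothetical helper named `save_base64` and a temporary output path; the sample payload is just the 8-byte PNG signature, not a real image:

```python
import base64
import os
import tempfile


def save_base64(encoded: str, destination: str) -> bytes:
    """Decode a base64 string and write the raw bytes to `destination`."""
    data = base64.b64decode(encoded)
    # Binary mode: the bytes are written untouched, with no text encoding.
    with open(destination, "wb") as destination_file:
        destination_file.write(data)
    return data


# Hypothetical payload: the 8-byte PNG signature, base64-encoded.
sample = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")

# Hypothetical destination inside a fresh temporary directory.
destination = os.path.join(tempfile.mkdtemp(), "sample.bin")
decoded = save_base64(sample, destination)

# Reading the file back gives exactly the original bytes.
with open(destination, "rb") as f:
    assert f.read() == b"\x89PNG\r\n\x1a\n"
```

In a real setup the `sample` string would come from the JSON payload instead; the point is only that `b64decode` followed by a binary-mode write is enough once the base64 string was built from the raw bytes.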

Edit: For older browsers that lack ArrayBuffer and where btoa() fails on some characters, here's another way to get a base64-encoded version of any binary:

function getBinary(file){
    var xhr = new XMLHttpRequest();
    xhr.open("GET", file, false);
    xhr.overrideMimeType("text/plain; charset=x-user-defined");
    xhr.send(null);
    return xhr.responseText;
}

function base64Encode(str) {
    var CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    var out = "", i = 0, len = str.length, c1, c2, c3;
    while (i < len) {
        c1 = str.charCodeAt(i++) & 0xff;
        if (i == len) {
            out += CHARS.charAt(c1 >> 2);
            out += CHARS.charAt((c1 & 0x3) << 4);
            out += "==";
            break;
        }
        c2 = str.charCodeAt(i++);
        if (i == len) {
            out += CHARS.charAt(c1 >> 2);
            out += CHARS.charAt(((c1 & 0x3)<< 4) | ((c2 & 0xF0) >> 4));
            out += CHARS.charAt((c2 & 0xF) << 2);
            out += "=";
            break;
        }
        c3 = str.charCodeAt(i++);
        out += CHARS.charAt(c1 >> 2);
        out += CHARS.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
        out += CHARS.charAt(((c2 & 0xF) << 2) | ((c3 & 0xC0) >> 6));
        out += CHARS.charAt(c3 & 0x3F);
    }
    return out;
}

console.log(base64Encode(getBinary('http://www.google.fr/images/srpr/logo3w.png')));
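
The `& 0xff` mask in `base64Encode` relies on how the `x-user-defined` charset decodes bytes: ASCII bytes map to themselves, and bytes 0x80 to 0xFF map to code points U+F780 to U+F7FF, so the low eight bits of each character code are the original byte. Here is a small Python 3 sketch of that round trip, assuming two hypothetical helpers, `x_user_defined_decode` and `recover_bytes`, that mimic the browser's decoding and the masking step:

```python
import base64


def x_user_defined_decode(data: bytes) -> str:
    """Mimic the x-user-defined decoder: ASCII is kept as-is, and bytes
    0x80-0xFF are shifted into the private-use range U+F780-U+F7FF."""
    return "".join(chr(b) if b < 0x80 else chr(0xF780 + b - 0x80) for b in data)


def recover_bytes(text: str) -> bytes:
    """Keep only the low 8 bits of each code point, like charCodeAt(i) & 0xff."""
    return bytes(ord(ch) & 0xFF for ch in text)


# Every possible byte value, to cover both the ASCII and the shifted range.
original = bytes(range(256))
as_text = x_user_defined_decode(original)
recovered = recover_bytes(as_text)

print(recovered == original)
print(base64.b64encode(recovered) == base64.b64encode(original))
```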

Hope this helps others as it did for me.
