Using JavaScript's atob to decode base64 doesn't properly decode UTF-8 strings


Question

I'm using the JavaScript window.atob() function to decode a base64-encoded string (specifically, the base64-encoded content from the GitHub API). The problem is that I'm getting ASCII-encoded characters back (like â¢ instead of ™). How can I properly handle the incoming base64-encoded stream so that it's decoded as UTF-8?

Recommended answer

There's a great article on Mozilla MDN that describes exactly this issue:


The "Unicode Problem"
Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of an 8-bit ASCII-encoded character. There are two possible methods to solve this problem:

  • the first is to escape the whole string and then encode it;

  • the second is to convert the UTF-16 DOMString to a UTF-8 array of characters and then encode it.
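
The second method in the quote (convert the string to UTF-8 bytes, then base64-encode those bytes) can be sketched with the standard TextEncoder API. This is a minimal sketch and not code from the MDN article: `utf8ToBase64` is a hypothetical helper name, and the sketch assumes an environment where `TextEncoder` and `btoa` are available as globals.

```javascript
// Sketch: base64-encode the UTF-8 bytes of a JavaScript (UTF-16) string.
// `utf8ToBase64` is a hypothetical helper name, not a browser or MDN API.
function utf8ToBase64(str) {
    // TextEncoder always produces UTF-8 bytes (a Uint8Array).
    const bytes = new TextEncoder().encode(str);

    // btoa expects a "binary string": one character per byte,
    // each with a code in the range 0x00-0xFF.
    let binary = '';
    for (const byte of bytes) {
        binary += String.fromCharCode(byte);
    }
    return btoa(binary);
}

console.log(utf8ToBase64('✓ à la mode')); // "4pyTIMOgIGxhIG1vZGU="
```

Because every character handed to btoa is a single byte value, the Character Out Of Range exception cannot occur, whatever the input string contains.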

A note on the original answer: previously, the MDN article suggested using unescape and escape to solve the Character Out Of Range exception problem, but both have since been deprecated. Some other answers here suggest working around this with decodeURIComponent and encodeURIComponent, but this has proven to be unreliable and unpredictable.

Finally, you can save yourself some grief by using a library:

  • js-base64 (NPM, great for Node.js)
  • base64-js

Here is the current recommendation, direct from MDN:

Encoding UTF-8 ⇢ base64: use a regular-expression replace in place of the deprecated unescape function

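function b64EncodeUnicode(str) {
    // encodeURIComponent percent-encodes the UTF-8 bytes of str;
    // the replace turns each %XX escape into the single character with that byte value.
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
        return String.fromCharCode('0x' + p1);
    }));
}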

b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="

Decoding base64 ⇢ UTF-8: the MDN article didn't initially have an example for decoding, but one has now been added

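function b64DecodeUnicode(str) {
    // atob yields one character per byte; re-encode each byte as %XX
    // so that decodeURIComponent can interpret the sequence as UTF-8.
    return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}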

b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"






The original solution, using escape and unescape (both now deprecated, though this still works in all modern browsers):

function utf8_to_b64( str ) {
    return window.btoa(unescape(encodeURIComponent( str )));
}

function b64_to_utf8( str ) {
    return decodeURIComponent(escape(window.atob( str )));
}

// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"






One last thing: I first encountered this problem when calling the GitHub API. To get this to work properly on (Mobile) Safari, I had to strip all whitespace from the base64 source before I could even decode it:

function b64_to_utf8( str ) {
    str = str.replace(/\s/g, '');    
    return decodeURIComponent(escape(window.atob( str )));
}
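
For comparison, the same whitespace-stripping decode can be written without the deprecated escape function. This is only a sketch under assumptions: `base64ToUtf8` is a hypothetical helper name, and it relies on `TextDecoder` and `atob` being available as globals, which the original answer does not use.

```javascript
// Sketch: decode base64 whose payload is UTF-8 text.
// `base64ToUtf8` is a hypothetical helper name, not part of any API.
function base64ToUtf8(b64) {
    // Strip line breaks and other whitespace before decoding,
    // as the answer does for content from the GitHub API.
    const clean = b64.replace(/\s/g, '');

    // atob returns a binary string: one character per decoded byte.
    const binary = atob(clean);

    // Copy each character's code (0x00-0xFF) into a byte array.
    const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));

    // Interpret the bytes as UTF-8 rather than as one character per byte.
    return new TextDecoder('utf-8').decode(bytes);
}

console.log(base64ToUtf8('4pyTIMOg\nIGxhIG1vZGU=')); // "✓ à la mode"
```

By default, TextDecoder substitutes U+FFFD for malformed byte sequences instead of throwing, so invalid input degrades to replacement characters rather than raising the URIError that decodeURIComponent would.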
