Decoding a URL-encoded windows-1251 (cp1251) string with JavaScript


Question

I have run into a problem for which I have not found a proper solution: I need to decode a URL fragment that is percent-encoded using windows-1251 (cp1251).

I know about decodeURI() and decodeURIComponent(), but as far as I understand they work for UTF-8 only. The one solution I found relies on the deprecated escape() and unescape() functions.

For example, take this sequence:

%EF%F0%EE%E3%F0%E0%EC%EC%E8%F0%EE%E2%E0%ED%E8%E5 (программирование)

Both decodeURI() and decodeURIComponent() throw an exception on it.
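The failure can be reproduced with a short snippet. This is only a sketch: tryDecode is a hypothetical helper name, not part of any API. The windows-1251 octets 0xEF 0xF0 do not form a valid UTF-8 sequence, which is why decodeURIComponent() rejects them.

```javascript
// Hypothetical helper: returns the decoded string, or the name of the
// error that decodeURIComponent() threw while decoding.
function tryDecode(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (err) {
    return err.name;
  }
}

console.log(tryDecode("%D0%BF")); // → "п" (valid UTF-8 for U+043F)
console.log(tryDecode("%EF%F0")); // → "URIError" (windows-1251 octets, invalid as UTF-8)
```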

Any help would be much appreciated.

Recommended answer

As far as I can see, browsers have no built-in support for percent-encoding combined with legacy charsets. You'll have to:

  1. find the % escapes that represent win-1251 octets,
  2. decode those win-1251 octets into the corresponding characters (a JS String).
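The two steps can also be sketched without a lookup table, by collecting each run of escapes into octets and decoding the run in one call. This is a variation, not the approach taken in this answer; decodeWin1251Percent is a hypothetical name, and the sketch assumes TextDecoder is available with the "windows-1251" label.

```javascript
// Hypothetical helper: decodes a string whose %XY escapes (upper-case hex)
// are windows-1251 octets; everything else is passed through unchanged.
function decodeWin1251Percent(str) {
  var decoder = new TextDecoder("windows-1251");
  // Step 1: find each run of consecutive %XY escapes.
  return str.replace(/(?:%[0-9A-F]{2})+/g, function (run) {
    // Turn "%EF%F0" into the octets [0xEF, 0xF0].
    var octets = run.slice(1).split("%").map(function (hex) {
      return parseInt(hex, 16);
    });
    // Step 2: decode the octets into a JS string.
    return decoder.decode(Uint8Array.from(octets));
  });
}

console.log(decodeWin1251Percent("%EF%F0%EE%E3%F0%E0%EC%EC%E8%F0%EE%E2%E0%ED%E8%E5"));
// → "программирование"
```

Because windows-1251 is a single-byte encoding, decoding a whole run at once gives the same result as decoding each octet separately.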

Below is one way to do it. For #1, I assume that only 3-character upper-case escapes need decoding and that the rest of the string is already ASCII, so I just use inputStr.replace(/%([0-9A-F]{2})/g, replacerFunction) for this.
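To make the upper-case assumption concrete, the sketch below compares an upper-case hex pattern like the one in this answer's code with a lenient variant. The lenient pattern and the listEscapes helper are assumptions added for illustration; they are only needed if the input may contain lower-case escapes such as %ef.

```javascript
var strict = /%([0-9A-F]{2})/g;     // upper-case hex digits only
var lenient = /%([0-9A-Fa-f]{2})/g; // assumption: lower-case escapes should match too

// Hypothetical helper: lists the hex codes a pattern finds, normalized to upper case.
function listEscapes(str, pattern) {
  var found = [];
  str.replace(pattern, function (match, hex) {
    found.push(hex.toUpperCase());
    return match; // leave the input unchanged; we only collect matches
  });
  return found;
}

console.log(listEscapes("a%ef%F0%!b", strict));  // ["F0"]
console.log(listEscapes("a%ef%F0%!b", lenient)); // ["EF", "F0"]
```

With the lenient pattern, the replacer has to upper-case the captured hex before looking it up in a map keyed by upper-case codes.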

For the actual decoding you need a mapping from win-1251 octets to JS characters. In the example below I build that mapping with the TextDecoder.decode() API, just for fun (and in case someone finds this answer while trying to convert between different charsets in JS). Note: at the time of writing it was not universally supported; only Gecko and Blink implemented it.
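Where TextDecoder is unavailable, a partial map can be built by arithmetic instead. The sketch below is an assumption-laden fallback, not part of the original answer: buildBasicCyrillicMap and toHex are hypothetical names, and the map covers only ASCII, the letters А..я (octets 0xC0..0xFF, which map in order to U+0410..U+044F), and Ё/ё. The remaining windows-1251 octets are deliberately left out.

```javascript
// Hypothetical helper: two-digit upper-case hex for an octet.
function toHex(octet) {
  return (octet <= 0x0F ? "0" : "") + octet.toString(16).toUpperCase();
}

// Hypothetical fallback: partial windows-1251 decode map built without TextDecoder.
function buildBasicCyrillicMap() {
  var map = {};
  var i;
  // 0x00..0x7F are identical to ASCII.
  for (i = 0x00; i <= 0x7F; i++) {
    map[toHex(i)] = String.fromCharCode(i);
  }
  // 0xC0..0xFF are А..я, contiguous from U+0410.
  for (i = 0xC0; i <= 0xFF; i++) {
    map[toHex(i)] = String.fromCharCode(0x0410 + (i - 0xC0));
  }
  map["A8"] = "\u0401"; // Ё
  map["B8"] = "\u0451"; // ё
  return map;
}

var basicMap = buildBasicCyrillicMap();
console.log(basicMap["EF"] + basicMap["F0"]); // → "пр"
```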

There's also https://github.com/mathiasbynens/windows-1251 , which I initially wanted to use for this answer, but it turned out to be easier to just build the decoding map by hand.

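// Map from upper-case hex octet ("00".."FF") to its windows-1251 character.
var decodeMap = {};
var win1251 = new TextDecoder("windows-1251");
for (var i = 0x00; i <= 0xFF; i++) {      // <= so that 0xFF ("я") is included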
  var hex = (i <= 0x0F ? "0" : "") +      // zero-padded
            i.toString(16).toUpperCase();
  decodeMap[hex] = win1251.decode(Uint8Array.from([i]));
}
// console.log(decodeMap);
// {"10":"\u0010", ... "40":"@","41":"A","42":"B", ... "C0":"А","C1":"Б", ...


// Decodes a windows-1251 encoded string, additionally
// encoded as an ASCII string where each non-ASCII character of the original
// windows-1251 string is encoded as %XY where XY (uppercase!) is a
// hexadecimal representation of that character's code in windows-1251.
function percentEncodedWin1251ToDOMString(str) {
  return str.replace(/%([0-9A-F]{2})/g,
    (match, hex) => decodeMap[hex]);
}

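// The stray "%!" is not a valid escape, so it passes through unchanged.
console.log(percentEncodedWin1251ToDOMString("%EF%F0%EE%E3%F0%E0%EC%EC%!%E8%F0%EE%E2%E0%ED%E8%E5a"));
// → "программ%!ированиеa"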
