How do I convert special UTF-8 chars to their iso-8859-1 equivalent using javascript?

Problem description

I'm making a javascript app which retrieves .json files with jquery and injects data into the webpage it is embedded in.

The .json files are encoded as UTF-8 and contain accented chars like é, ö and å.

The problem is that I don't control the charset on the pages that are going to use the app.

Some will be using UTF-8, but others will be using the iso-8859-1 charset. This will of course garble the special chars from the .json files.

How do I convert special UTF-8 chars to their iso-8859-1 equivalent using javascript?

Recommended answer

Actually, everything is typically stored as Unicode of some kind internally, but let's not go into that. I'm assuming you're getting the iconic "åäö" type strings because you're using ISO-8859 as your character encoding. There's a trick you can use to convert those characters. The escape and unescape functions used for encoding and decoding query strings are defined for ISO characters, whereas the newer encodeURIComponent and decodeURIComponent, which do the same thing, are defined for UTF-8 characters.

escape encodes extended ISO-8859-1 characters (Unicode code points U+0080 to U+00FF) as %xx (two-digit hex), whereas it encodes code points U+0100 and above as %uxxxx (%u followed by four-digit hex). For example, escape("å") == "%E5" and escape("あ") == "%u3042".

encodeURIComponent percent-encodes extended characters as a UTF-8 byte sequence. For example, encodeURIComponent("å") == "%C3%A5" and encodeURIComponent("あ") == "%E3%81%82".
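The difference between the two encoders can be checked directly. The following is a minimal sketch; the variable names are made up for illustration, and the characters are written as \u escapes so the result does not depend on the encoding of the script file itself.

```javascript
// "\u00e5" is å (inside the ISO-8859-1 range), "\u3042" is あ (outside it).
const latinChar = "\u00e5";
const kanaChar = "\u3042";

// escape() emits %xx for U+0080..U+00FF and %uxxxx for U+0100 and above.
const escapedLatin = escape(latinChar);
const escapedKana = escape(kanaChar);

// encodeURIComponent() emits the percent-encoded UTF-8 bytes instead.
const uriLatin = encodeURIComponent(latinChar);
const uriKana = encodeURIComponent(kanaChar);

console.log(escapedLatin, escapedKana); // %E5 %u3042
console.log(uriLatin, uriKana);         // %C3%A5 %E3%81%82
```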

So you can do:

fixedstring = decodeURIComponent(escape(utfstring));

For example, a UTF-8 encoded "å" that is misread as ISO-8859-1 shows up as "Ã¥". The call first evaluates escape("Ã¥") == "%C3%A5", which is the two incorrect ISO characters each encoded as a single byte. Then decodeURIComponent("%C3%A5") == "å", where the two percent-encoded bytes are interpreted as a UTF-8 sequence.
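The two steps of that conversion can be traced one at a time. This is only an illustrative sketch: the variable names are assumptions, and the garbled input is spelled with \u escapes ("\u00c3\u00a5" is "Ã¥") to keep the example independent of the file's own charset.

```javascript
// The UTF-8 bytes C3 A5, misread as two ISO-8859-1 characters.
const garbled = "\u00c3\u00a5";

// Step 1: escape() turns each character back into a single percent-encoded byte.
const asBytes = escape(garbled);

// Step 2: decodeURIComponent() reads those bytes as one UTF-8 sequence.
const repairedChar = decodeURIComponent(asBytes);

console.log(asBytes);                   // %C3%A5
console.log(repairedChar === "\u00e5"); // true (å)
```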

If you need to do the reverse for some reason, that works too:

utfstring = unescape(encodeURIComponent(originalstring));
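As a quick check that the two expressions really are inverses, the sketch below runs a string through the reverse conversion and back again. The sample word and variable names are assumptions chosen for illustration.

```javascript
// "smörgås", written with \u escapes.
const originalWord = "sm\u00f6rg\u00e5s";

// Reverse direction: each character becomes its UTF-8 bytes,
// with every byte stored as one character in the range 0x00..0xFF.
const utfBytesAsChars = unescape(encodeURIComponent(originalWord));

// Forward direction again: the byte-characters are decoded as UTF-8.
const roundTripped = decodeURIComponent(escape(utfBytesAsChars));

console.log(utfBytesAsChars.length);        // 9: ö and å take two bytes each
console.log(roundTripped === originalWord); // true
```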

Is there a way to differentiate between bad UTF-8 strings and ISO strings? It turns out there is. The decodeURIComponent function used above will throw an error if given a malformed encoded sequence. We can use this to detect, with high probability, whether our string is UTF-8 or ISO.

var fixedstring;

try {
    // If the string is UTF-8, this will work and not throw an error.
    fixedstring = decodeURIComponent(escape(badstring));
} catch (e) {
    // If it isn't, an error will be thrown, and we can assume that we have an ISO string.
    fixedstring = badstring;
}
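The same detection can be wrapped in a small function so it can be applied to each value read from the .json files. This is a sketch under stated assumptions: the name repairIfUtf8 is hypothetical, and rethrowing anything other than a URIError is a design choice added here, not something the answer prescribes.

```javascript
// Hypothetical helper: returns the repaired string when the input looks like
// UTF-8 bytes misread as ISO-8859-1, otherwise returns the input unchanged.
function repairIfUtf8(input) {
  try {
    return decodeURIComponent(escape(input));
  } catch (e) {
    // decodeURIComponent signals a malformed sequence with a URIError.
    if (e instanceof URIError) {
      return input;
    }
    throw e;
  }
}

const repairedWord = repairIfUtf8("sm\u00c3\u00b6rg\u00c3\u00a5s"); // garbled "smörgås"
const untouchedIso = repairIfUtf8("\u00e5");                        // a plain ISO "å"
const untouchedAscii = repairIfUtf8("abc");

console.log(repairedWord === "sm\u00f6rg\u00e5s"); // true
console.log(untouchedIso === "\u00e5");            // true
console.log(untouchedAscii);                       // abc
```

The check is probabilistic, as noted above: a genuine ISO string that happens to form a valid UTF-8 byte sequence (for example a literal "Ã¥") would still be converted. Also note that escape and unescape are legacy functions, so it may be worth isolating them in one helper like this.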
