Is it possible to convert a string containing "high" Unicode chars to an array consisting of dec values derived from UTF-32 ("real") codes?


Problem description

Please look at this script, which operates on a (theoretically possible) string:

<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title></title>
<script src="jquery.js"></script>
<script>
    $(function () {
        $("#click").click(function () {
            var txt = $('#high-unicode').text();
            var codes = '';
            for (var i = 0; i < txt.length; i++) {
                if (i > 0) codes += ',';
                codes += txt.charCodeAt(i);
            }
            alert(codes);
        });
    });
</script>
</head>
<body>
<span id="click">click</span><br />
<span id="high-unicode">&#x1D465;<!-- mathematical italic small x -->&#xF31E0;<!-- some char from Supplementary Private Use Area-A -->A<!-- char A -->&#x108171;<!-- some char from Supplementary Private Use Area-B --></span>
</body>
</html>

Instead of "55349,56421,56204,56800,65,56288,56689", is it possible to get "119909,995808,65,1081713"? I've read more-utf-32-aware-javascript-string and two entries from unicode.org/faq/utf_bom ("Q: What's the algorithm to convert from UTF-16 to character codes?" and "Q: Isn't there a simpler way to do this?"), but I'm not sure how to use this info.

Solution

It looks like you have to decode surrogate pairs manually. For example:

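function decodeUnicode(str) {
    var r = [], i = 0;
    while (i < str.length) {
        var chr = str.charCodeAt(i++);
        if (chr >= 0xD800 && chr <= 0xDBFF && i < str.length) {
            var low = str.charCodeAt(i);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // valid surrogate pair: combine both halves into one code point
                i++;
                r.push(0x10000 + (((chr - 0xD800) << 10) | (low - 0xDC00)));
                continue;
            }
        }
        // ordinary (BMP) character, or an unpaired surrogate kept as-is
        r.push(chr);
    }
    return r;
}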
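
As a worked check of the arithmetic, here is a minimal sketch that combines the three surrogate pairs from the question by hand. The helper name combineSurrogates is hypothetical and exists only for this illustration; it assumes its inputs are already a valid high/low pair.

```javascript
// combineSurrogates is a hypothetical helper name used only for this illustration.
// It assumes `high` is in 0xD800..0xDBFF and `low` is in 0xDC00..0xDFFF.
function combineSurrogates(high, low) {
    // Each surrogate carries 10 bits of the code point's offset above 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

console.log(combineSurrogates(55349, 56421)); // 119909  (U+1D465)
console.log(combineSurrogates(56204, 56800)); // 995808  (U+F31E0)
console.log(combineSurrogates(56288, 56689)); // 1081713 (U+108171)
```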

Complete code: http://jsfiddle.net/twQWU/
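
On engines that support ES2015 or later, the built-in String.prototype.codePointAt and Array.from can do the same decoding without manual surrogate arithmetic. The following is a sketch under that assumption, not part of the original answer; toCodePoints is a hypothetical name.

```javascript
// Assumes an ES2015+ engine; toCodePoints is a hypothetical helper name.
function toCodePoints(str) {
    // Array.from iterates a string by code point, so each element `ch`
    // is one full character (a surrogate pair stays together).
    return Array.from(str, function (ch) {
        return ch.codePointAt(0);
    });
}

// The string from the question, written with code point escapes.
var txt = "\u{1D465}\u{F31E0}A\u{108171}";
console.log(toCodePoints(txt).join(",")); // 119909,995808,65,1081713
```

An unpaired surrogate is returned as its own code unit value, which matches how the manual loop treats ordinary characters.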
