JavaScript and string manipulation w/ UTF-16 surrogate pairs
Problem description
I'm working on a Twitter app and just stumbled into the world of UTF-8 and UTF-16. It seems the majority of JavaScript string functions are as blind to surrogate pairs as I was. I've got to recode some stuff to make it wide-character aware.
I've got this function to parse strings into arrays while preserving the surrogate pairs. Then I'll recode several functions to deal with the arrays rather than strings.
function sortSurrogates(str) {
  var cp = [];                        // array to hold code points
  var i = 0;                          // index of the current UTF-16 code unit
  while (i < str.length) {            // loop till we've done the whole string
    var hi = str.charCodeAt(i);
    var lo = str.charCodeAt(i + 1);   // NaN when i is the last code unit
    if (hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF) {
      // high surrogate followed by a low surrogate: keep the pair together
      cp.push(str.slice(i, i + 2));   // push the two onto the array
      i += 2;                         // skip past both code units
    } else {                          // BMP code point (or a lone surrogate)
      cp.push(str.charAt(i));         // push one onto the array
      i += 1;                         // skip past one code unit
    }
  }                                   // loop
  return cp;                          // return the array
}
My question is: is there something simpler that I'm missing? I see so many people reiterating that JavaScript deals with UTF-16 natively, yet my testing leads me to believe that UTF-16 may be the data format, but the functions don't know it yet. Am I missing something simple?
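For comparison, here is a minimal sketch of a shorter route, assuming an ES2015 or later engine. The built-in string iterator walks a string by code point rather than by UTF-16 code unit, so `Array.from` (or the spread syntax) keeps surrogate pairs together. The helper name `toCodePoints` and the sample string are made up for illustration.

```javascript
// Assumes an ES2015+ engine; toCodePoints is a hypothetical helper name.
function toCodePoints(str) {
  // The string iterator yields one element per code point,
  // so a surrogate pair stays together as a single array entry.
  return Array.from(str);
}

const sample = 'a\uD83D\uDE00b'; // "a", U+1F600 (outside the BMP), "b"
const points = toCodePoints(sample);

console.log(sample.length);  // 4 UTF-16 code units
console.log(points.length);  // 3 code points
console.log(points[1].codePointAt(0).toString(16)); // 1f600
console.log(points.slice().reverse().join('') === 'b\uD83D\uDE00a'); // true
```

Note that this splits on code points only. Sequences made of several code points, such as a base letter plus a combining mark, still end up in separate array entries.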
Edit:
To help illustrate the issue:
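The snippet below is not the example from the original post. It is only a rough sketch of the kind of behavior described above, using a made-up sample string that contains one character outside the BMP.

```javascript
// Illustrative only: not the example from the original post.
const s = '\uD83D\uDE00!'; // U+1F600 followed by "!"

console.log(s.length);                  // 3, counted in UTF-16 code units
console.log(s.charAt(0) === '\uD83D');  // true: only the high surrogate
console.log(s.substr(0, 1).length);     // 1: the pair has been cut in half
console.log(s.split('').length);        // 3: the pair is split into two entries

// Reversing code units swaps the surrogates, producing an ill-formed string.
const reversed = s.split('').reverse().join('');
console.log(reversed === '!\uDE00\uD83D'); // true
```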