JavaScript and string manipulation w/ UTF-16 surrogate pairs


Problem description

I'm working on a Twitter app and just stumbled into the world of UTF-8(16). It seems the majority of JavaScript string functions are as blind to surrogate pairs as I was. I've got to recode some stuff to make it wide-character aware.

I've got this function to parse strings into arrays while preserving the surrogate pairs. Then I'll recode several functions to deal with the arrays rather than strings.

function sortSurrogates(str){
  var cp = [];                 // array to hold code points
  while(str.length){           // loop till we've done the whole string
    // a pair is a high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF)
    if(/^[\uD800-\uDBFF][\uDC00-\uDFFF]/.test(str)){
      cp.push(str.slice(0,2)); // push the two code units onto array
      str = str.slice(2);      // clip the two off the string
    }else{                     // else BMP code point (or an unpaired surrogate)
      cp.push(str.slice(0,1)); // push one onto array
      str = str.slice(1);      // clip one from string
    }
  }                            // loop
  return cp;                   // return the array
}

My question is: is there something simpler I'm missing? I see so many people reiterating that JavaScript deals with UTF-16 natively, yet my testing leads me to believe that UTF-16 may be the data format, but the string functions don't know it yet. Am I missing something simple?
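
For concreteness, here is a minimal sketch of the behavior described above. The sample string is my own assumption, not taken from the app, and the last two calls assume an ES2015 or later runtime, where the built-in string iterator (used by Array.from) and codePointAt work on code points rather than code units. It is an illustration, not necessarily the best way to handle this.

```javascript
// Hypothetical sample: U+1F600 lies outside the BMP,
// so UTF-16 stores it as the surrogate pair D83D DE00.
var s = "a\uD83D\uDE00b";            // "a", U+1F600, "b"

// Code-unit based functions see four units, not three characters.
var unitCount = s.length;                    // counts UTF-16 code units
var firstHalf = s.charCodeAt(1).toString(16); // high surrogate only
var splitCount = s.split("").length;         // split("") cuts the pair in half

// Assumption: ES2015+ runtime. The string iterator walks code points,
// so Array.from keeps each surrogate pair together.
var points = Array.from(s);
var whole = s.codePointAt(1).toString(16);   // full code point at index 1

console.log(unitCount, firstHalf, splitCount, points.length, whole);
```

On runtimes without ES2015 support, a hand-written loop such as sortSurrogates is still needed.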

Edit:
To help illustrate the issue:
