JavaScript strings - UTF-16 vs UCS-2?


Question

I've read in some places that JavaScript strings are UTF-16, and in other places that they're UCS-2. I did some searching around to try to figure out the difference and found this:


Q: What is the difference between UCS-2 and UTF-16?

A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

UCS-2 does not define a distinct data format, because UTF-16 and UCS-2 are identical for purposes of data exchange. Both are 16-bit, and have exactly the same code unit representation.

Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters.

via: http://www.unicode.org/faq/utf_bom.html#utf16-11
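
To make "pairs of surrogate code points" concrete, here is a minimal sketch of the arithmetic UTF-16 uses to split a supplementary code point into two 16-bit code units. The helper name `toSurrogatePair` is hypothetical (it is not from the FAQ or from any library), and U+1F4A9 is just an arbitrary supplementary character chosen for illustration.

```javascript
// Hypothetical helper: split a supplementary code point
// (U+10000..U+10FFFF) into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xD800 + (offset >> 10);  // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits -> low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F4A9);
console.log(high.toString(16), low.toString(16)); // d83d dca9
```

An implementation that only understands UCS-2 would see these two values as two separate, meaningless characters rather than as one code point.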

So my question is: is it the fact that the JavaScript string object's methods and indexes act on 16-bit data values instead of characters that makes some people consider it UCS-2? And if so, would a JavaScript string object oriented around characters instead of 16-bit data chunks be considered UTF-16? Or is there something else I'm missing?
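
The behavior being asked about can be observed directly. The following is a small sketch, assuming an ES2015 or later engine (for `codePointAt`, `Array.from`, and the `\u{...}` escape); the sample string is made up for illustration.

```javascript
// "a", then U+1D306 (a supplementary character), then "b".
const s = "a\u{1D306}b";

// length and indexes count 16-bit code units, not characters.
console.log(s.length);                     // 4
console.log(s.charCodeAt(1).toString(16)); // d834 (high surrogate)
console.log(s.charCodeAt(2).toString(16)); // df06 (low surrogate)

// Code-point-aware APIs see the surrogate pair as one unit.
console.log(s.codePointAt(1).toString(16)); // 1d306
console.log(Array.from(s).length);          // 3
```

The first group is the "UCS-2-like" view the question describes; the second group reads the same underlying data as UTF-16.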

EDIT: As requested, here are some sources saying JavaScript strings are UCS-2:

http://blog.mozilla.com/nnethercote/2011/07/01/faster-javascript-parsing/
http://terenceyim.wordpress.com/tag/ucs2/

EDIT: For anyone who may come across this, be sure to check out this link:

http://mathiasbynens.be/notes/javascript-encoding

Recommended answer

JavaScript (strictly speaking, ECMAScript) predates Unicode 2.0, so in some cases you may find references to UCS-2 simply because that was correct at the time the reference was written. Can you point us to specific citations of JavaScript being "UCS-2"?

The specifications for at least ECMAScript versions 3 and 5 both explicitly declare a String to be a collection of unsigned 16-bit integers, and state that if those integer values are meant to represent textual data, then they are UTF-16 code units. See section 8.4 of the ECMAScript Language Specification.
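
As an illustration of "a collection of unsigned 16-bit integers", the sketch below builds strings from raw numeric values. It is only a demonstration under the assumption of a standard engine; the helper `codeUnits` is a hypothetical name introduced here, not part of the specification.

```javascript
// Hypothetical helper: list the raw 16-bit values stored in a string.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i));
  }
  return units;
}

// A string may hold an unpaired surrogate, which is not valid UTF-16 text.
const lone = String.fromCharCode(0xD800);
console.log(lone.length, codeUnits(lone)); // 1 [ 55296 ]

// fromCharCode keeps only the low 16 bits of each argument.
const truncated = String.fromCharCode(0x1F4A9);
console.log(truncated.charCodeAt(0).toString(16)); // f4a9
```

Nothing in the string type itself enforces well-formed UTF-16; the values only become UTF-16 code units when they are interpreted as text.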

EDIT: I'm no longer sure my answer is entirely correct. See the excellent article mentioned above, http://mathiasbynens.be/notes/javascript-encoding, which in essence says that while a JavaScript engine may use UTF-16 internally (and most do), the language itself effectively exposes those characters as if they were UCS-2.
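
One practical consequence of that "exposed as UCS-2" view is that code-unit-based string operations can tear a surrogate pair apart. The sketch below is a made-up example (it is not taken from the linked article) and assumes an ES2015 or later engine for `Array.from`.

```javascript
const text = "x\u{1F4A9}y"; // "x", one supplementary character, "y"

// split("") cuts between code units, so the surrogate pair is reversed too.
const naive = text.split("").reverse().join("");

// Array.from iterates by code point, so the pair stays intact.
const byCodePoint = Array.from(text).reverse().join("");

console.log(naive === "y\uDCA9\uD83Dx");     // true: low surrogate now precedes high
console.log(byCodePoint === "y\u{1F4A9}x");  // true
```

Reversing by code point still does not account for combining marks or other multi-code-point sequences; it only addresses the surrogate-pair issue discussed here.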
