Default JavaScript Character Encoding?

Question

After some frantic Googling, I can't seem to find a conclusive answer to a simple question. I apologize if this question is answered somewhere, but if so I couldn't find it.

While writing an encryption method in JavaScript, I started wondering what character encoding my strings were using, and why.

So: what determines character encoding in JavaScript? Is it a standard? The browser? The header of the HTTP request? The <META> tag of the HTML that contains it? The server that feeds the page?

By my empirical testing (changing different settings, then calling charCodeAt on a sufficiently strange character and seeing which encoding the value matches), it appears to always be UTF-8 or UTF-16, but I'm not sure why.

Thanks for your help!

Recommended answer

Section 8.4 of ECMA-262:


The String type is the set of all finite ordered sequences of zero or more 16-bit unsigned integer values ("elements"). The String type is generally used to represent textual data in a running ECMAScript program, in which case each element in the String is treated as a code unit value (see Clause 6). Each element is regarded as occupying a position within the sequence. These positions are indexed with nonnegative integers. The first element (if any) is at position 0, the next element (if any) at position 1, and so on. The length of a String is the number of elements (i.e., 16-bit values) within it. The empty String has length zero and therefore contains no elements.

When a String contains actual textual data, each element is considered to be a single UTF-16 code unit. Whether or not this is the actual storage format of a String, the characters within a String are numbered by their initial code unit element position as though they were represented using UTF-16. All operations on Strings (except as otherwise stated) treat them as sequences of undifferentiated 16-bit unsigned integers; they do not ensure the resulting String is in normalised form, nor do they ensure language-sensitive results.

That wording is kind of weaselly; it seems to mean that everything that counts treats strings as if each element were a UTF-16 code unit, but at the same time nothing ensures that the sequence as a whole is valid UTF-16.
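As a rough illustration of the quoted passage, the sketch below shows that `length` and `charCodeAt` count 16-bit elements rather than characters. The sample string and variable names are illustrative assumptions, not part of the specification text; `codePointAt` and string iteration by code point require ES2015 or later.

```javascript
// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs
// two 16-bit code units (a surrogate pair) to represent it.
const text = "a\u{1F600}";

// length counts 16-bit elements, not user-visible characters.
const codeUnitCount = text.length;

// Iterating a string (here via Array.from) walks code points instead.
const codePointCount = Array.from(text).length;

// charCodeAt returns the raw 16-bit element at each position.
const highSurrogate = text.charCodeAt(1);
const lowSurrogate = text.charCodeAt(2);

// codePointAt combines a valid surrogate pair into one code point.
const codePoint = text.codePointAt(1);

console.log(codeUnitCount, codePointCount);
console.log(
  highSurrogate.toString(16),
  lowSurrogate.toString(16),
  codePoint.toString(16)
);
```

This matches the question's observation: `charCodeAt` values line up with UTF-16 regardless of how the page, the server, or the HTTP headers declare their encoding, because those settings only affect how the source text is decoded before it becomes a String.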

Edit: to be clear, the intention is that strings consist of UTF-16 code units. In ES2015, the definition of "string value" includes this note:


A String value is a member of the String type. Each integer value in the sequence usually represents a single 16-bit unit of UTF-16 text. However, ECMAScript does not place any restrictions or requirements on the values except that they must be 16-bit unsigned integers.

So a string is still a string even when it contains values that don't work as correct Unicode characters.
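A minimal sketch of that last point, assuming an environment that provides `TextEncoder` (defined by the WHATWG Encoding Standard rather than by ECMAScript; available in modern browsers and Node.js). The variable names are illustrative. A lone surrogate is a perfectly legal String element, but it cannot be encoded as UTF-8, which matters if the string is turned into bytes before encryption.

```javascript
// A high surrogate with no low surrogate after it: not valid UTF-16 text,
// but still a valid ECMAScript String of length 1.
const lone = "\uD83D";

const isString = typeof lone === "string";
const unitCount = lone.length;
const unit = lone.charCodeAt(0);

// TextEncoder always produces UTF-8. A lone surrogate has no UTF-8 form,
// so it is replaced with U+FFFD (bytes EF BF BD) during encoding.
const bytes = Array.from(new TextEncoder().encode(lone));

console.log(isString, unitCount, unit.toString(16));
console.log(bytes.map((b) => b.toString(16)).join(" "));
```

Because the replacement is lossy, a string containing unpaired surrogates will not survive a round trip through UTF-8 bytes unchanged.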
