What is the default JavaScript character encoding?


Question

While writing an encryption method in JavaScript, I came to wondering what character encoding my strings were using, and why.

What determines character encoding in JavaScript? Is it a standard? By the browser? Determined by the header of the HTTP request? In the <META> tag of HTML that encompasses it? The server that feeds the page?

By my empirical testing (changing different settings, then using charCodeAt on a sufficiently strange character and seeing which encoding the value matches up with) it appears to always be UTF-8 or UTF-16, but I'm not sure why.
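That empirical test can be reproduced directly: `charCodeAt` always reports UTF-16 code units, regardless of the encoding the page or script file was served in. A quick sketch, runnable in Node or any browser console:

```javascript
// charCodeAt returns UTF-16 code units, independent of how the
// source file or page was encoded (e.g. served as UTF-8).
const euro = "€";                // U+20AC, fits in a single 16-bit unit
console.log(euro.length);        // 1
console.log(euro.charCodeAt(0)); // 8364 (0x20AC)

const emoji = "😀";              // U+1F600, outside the BMP
console.log(emoji.length);                      // 2 -- two code units, one character
console.log(emoji.charCodeAt(0).toString(16));  // "d83d" (high surrogate)
console.log(emoji.charCodeAt(1).toString(16));  // "de00" (low surrogate)
console.log(emoji.codePointAt(0).toString(16)); // "1f600" (the full code point)
```

Characters outside the Basic Multilingual Plane occupy two code units (a surrogate pair), which is why `length` counts 2 for a single emoji.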

After much frantic Googling, I can't seem to find a conclusive answer to this simple question.

Answer

Section 8.4 of E262:

The String type is the set of all finite ordered sequences of zero or more 16-bit unsigned integer values ("elements"). The String type is generally used to represent textual data in a running ECMAScript program, in which case each element in the String is treated as a code unit value (see Clause 6). Each element is regarded as occupying a position within the sequence. These positions are indexed with nonnegative integers. The first element (if any) is at position 0, the next element (if any) at position 1, and so on. The length of a String is the number of elements (i.e., 16-bit values) within it. The empty String has length zero and therefore contains no elements.

When a String contains actual textual data, each element is considered to be a single UTF-16 code unit. Whether or not this is the actual storage format of a String, the characters within a String are numbered by their initial code unit element position as though they were represented using UTF-16. All operations on Strings (except as otherwise stated) treat them as sequences of undifferentiated 16-bit unsigned integers; they do not ensure the resulting String is in normalised form, nor do they ensure language-sensitive results.
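That last sentence has a concrete consequence: a string may hold a 16-bit value that is not valid UTF-16 on its own, such as an unpaired surrogate, and ordinary string operations carry it along without complaint. A minimal illustration:

```javascript
// A lone high surrogate is not a valid character in UTF-16,
// but the String type places no restriction on the value.
const lone = "\uD800";           // unpaired high surrogate
console.log(lone.length);        // 1
console.log(lone.charCodeAt(0)); // 55296 (0xD800)

// Concatenation, slicing, and comparison all treat it as an
// opaque sequence of 16-bit integers:
console.log(lone + lone === "\uD800\uD800"); // true
```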

That wording is kind of weaselly; it seems to mean that everything that counts treats strings as if each character is a UTF-16 character, but at the same time nothing ensures that it'll all be valid.

To be clear, the intention is that strings consist of UTF-16 code units. In ES2015, the definition of "string value" includes this note:

A String value is a member of the String type. Each integer value in the sequence usually represents a single 16-bit unit of UTF-16 text. However, ECMAScript does not place any restrictions or requirements on the values except that they must be 16-bit unsigned integers.

So a string is still a string even when it contains values that don't work as correct Unicode characters.
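The distinction does surface, though, when such a string has to leave the engine as real encoded text. For instance, `encodeURIComponent` must produce valid UTF-8 and therefore throws on a lone surrogate, a sketch of where "not correct Unicode" actually bites:

```javascript
const malformed = String.fromCharCode(0xD800); // lone high surrogate

// The malformed value is a perfectly legal string...
console.log(malformed.length); // 1

// ...until an operation needs to encode it as actual Unicode text:
try {
  encodeURIComponent(malformed);
} catch (e) {
  console.log(e.name); // "URIError"
}
```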
