How much RAM does each character in an ECMAScript/JavaScript string consume?


Question

The question is pretty simple: how much RAM, in bytes, does each character in an ECMAScript/JavaScript string consume?

My guess is two bytes, since the standard says strings are stored as 16-bit unsigned integers.

Does this mean each character is always two bytes?

Recommended answer

Yes, I believe that is the case, at least as far as the language specification is concerned. The characters are probably stored as wide strings or UCS-2 strings. They may be UTF-16, in which case characters outside the BMP (Basic Multilingual Plane) take up two words (16-bit integers) each, but I believe these characters are not fully supported: properties such as length and index-based access count 16-bit units, so one such character counts as two. Read this blog post about problems in ECMA's UTF-16 implementation.
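To make the difference between BMP and non-BMP characters concrete, here is a small sketch. The helper name `describe` is made up for this example; it only uses standard string operations, and the byte figure is the size implied by the specification's 16-bit units, not a measurement of any engine's real memory use.

```javascript
// Hypothetical helper: counts 16-bit units versus code points in a string.
function describe(s) {
  return {
    codeUnits: s.length,               // number of 16-bit units
    codePoints: Array.from(s).length,  // iteration walks whole code points
    bytesAsUtf16: s.length * 2,        // size implied by the spec model only
  };
}

const bmp = "\u00E9";       // U+00E9, inside the BMP: one unit
const astral = "\u{1F600}"; // U+1F600, outside the BMP: a surrogate pair

console.log(describe(bmp));    // { codeUnits: 1, codePoints: 1, bytesAsUtf16: 2 }
console.log(describe(astral)); // { codeUnits: 2, codePoints: 1, bytesAsUtf16: 4 }

// The two halves of the surrogate pair, in hex.
console.log(astral.charCodeAt(0).toString(16), astral.charCodeAt(1).toString(16)); // d83d de00
```

So a single character outside the BMP reports a length of 2 and, in the 16-bit model, takes four bytes.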

Many modern languages and runtimes store their strings as two-byte (16-bit) units. This covers the characters of nearly all languages in common use with a single unit each. It costs a little extra memory, but that's peanuts for any modern computer with multiple gigabytes of RAM. Storing the string in the often more compact UTF-8 makes processing more complex and slower, because characters have variable width and finding the n-th character requires a scan. UTF-8 is therefore mostly used for transport. ASCII supports only the Latin alphabet without diacritics. ANSI code pages are still limited and need a specified code page to make sense.
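A rough way to compare the two encodings is to count bytes for the same text. This sketch assumes an environment where `TextEncoder` is available as a global (current browsers and recent Node.js versions); `compareSizes` is a name invented for this example, and the UTF-16 figure is simply two bytes per 16-bit unit.

```javascript
// TextEncoder always produces UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: byte counts for the same text in both encodings.
function compareSizes(s) {
  return {
    utf16Bytes: s.length * 2,            // two bytes per 16-bit unit
    utf8Bytes: encoder.encode(s).length, // actual UTF-8 byte count
  };
}

console.log(compareSizes("hello"));              // { utf16Bytes: 10, utf8Bytes: 5 }
console.log(compareSizes("\u65E5\u672C\u8A9E")); // { utf16Bytes: 6, utf8Bytes: 9 }
```

UTF-8 is smaller for ASCII text, but for many CJK characters it needs three bytes where UTF-16 needs two.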

Section 4.3.16 of ECMA-262 (section number as in the 5.1 edition) explicitly defines "String value" as a "primitive value that is a finite ordered sequence of zero or more 16-bit unsigned integers". It suggests that programs use these 16-bit values as UTF-16 text, but it is legal simply to use a string to store any immutable array of unsigned shorts.
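As an illustration of the "array of unsigned shorts" view, the following sketch round-trips arbitrary 16-bit values through a string, including a lone surrogate that is not valid UTF-16 text. The function names `packUint16` and `unpackUint16` are invented for this example.

```javascript
// Hypothetical helper: builds a string whose units are the given 16-bit values.
function packUint16(values) {
  let s = "";
  for (const v of values) {
    s += String.fromCharCode(v); // fromCharCode keeps only the low 16 bits
  }
  return s;
}

// Hypothetical helper: reads the 16-bit units back out as numbers.
function unpackUint16(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    out.push(s.charCodeAt(i));
  }
  return out;
}

// 0xD800 is an unpaired surrogate; the string stores it anyway.
const data = [0, 1, 0xD800, 0xFFFF, 65];
const packed = packUint16(data);

console.log(packed.length);        // 5
console.log(unpackUint16(packed)); // [ 0, 1, 55296, 65535, 65 ]
```

Such a string is fine inside the program, but it may not survive being encoded as UTF-8 or passed through APIs that expect well-formed text.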

Note that character size isn't the only thing that makes up the string size. I don't know the exact implementation (and it differs between engines; some, V8 for example, store strings that contain only Latin-1 characters with one byte per character), but strings tend to have a 0x00 terminator to make them compatible with PChars (null-terminated C-style strings). They probably also have a header that contains the string size, and maybe some reference count and even encoding information. A string with one character can easily consume 10 bytes or more (yes, that's 80 bits).
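The arithmetic behind that estimate can be sketched as follows. The header and terminator sizes here are made-up illustrative defaults, not values taken from any real engine, and `estimateStringBytes` is a hypothetical helper.

```javascript
// Back-of-the-envelope estimate only.
// headerBytes and terminatorBytes are assumed values for illustration;
// real engines use different layouts and may use one byte per character.
function estimateStringBytes(s, headerBytes = 8, terminatorBytes = 2) {
  return headerBytes + s.length * 2 + terminatorBytes;
}

console.log(estimateStringBytes(""));  // 10
console.log(estimateStringBytes("a")); // 12
```

Under these assumptions the fixed overhead dominates for short strings, which is why a one-character string costs far more than two bytes.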
