String length in bytes in JavaScript

Question

In my JavaScript code I need to compose a message to the server in this format:

<size in bytes>CRLF
<data>CRLF

Example:

3
foo

The data may contain Unicode characters. I need to send them as UTF-8.

I'm looking for the most cross-browser way to calculate the length of the string in bytes in JavaScript.

I've tried this to compose my payload:

return unescape(encodeURIComponent(str)).length + "\n" + str + "\n"

But it does not give me accurate results in older browsers (or maybe the strings in those browsers are UTF-16?).

Any clues?

Update:

Example: the length of the string ЭЭХ! Naïve? in UTF-8 is 15 bytes, but some browsers report 23 bytes instead.

Recommended answer

In the older browsers the question targets, there is no way to do it natively in JavaScript.

If you know the character encoding, though, you can calculate it yourself.
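For comparison with the manual calculations, runtimes that provide the standard TextEncoder API can report the UTF-8 byte count directly, because TextEncoder always encodes to UTF-8. The sketch below assumes such a runtime (current browsers and recent Node.js); it does not help with the older browsers the question is about, and the function name is made up for illustration.

```javascript
// Sketch only: assumes the runtime exposes the standard TextEncoder API.
// utf8ByteLengthWithTextEncoder is a hypothetical name, not from the answer.
function utf8ByteLengthWithTextEncoder(str) {
  // encode() returns a Uint8Array of UTF-8 bytes; its length is the byte count.
  return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLengthWithTextEncoder('foo'));               // 3
console.log(utf8ByteLengthWithTextEncoder('ЭЭХ! Na\u00EFve?'));  // 15
```

A lone surrogate is encoded as U+FFFD by TextEncoder, so it counts as three bytes rather than raising an error.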

encodeURIComponent assumes UTF-8 as the character encoding, so if that is the encoding you need, you can do:

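function lengthInUtf8Bytes(str) {
  var encoded = encodeURIComponent(str);
  // Matches only the 10.. bytes that are non-initial bytes in a multi-byte sequence.
  var m = encoded.match(/%[89ABab]/g);
  // Matches the F0..F4 lead byte of a 4-byte sequence: a surrogate pair, which str.length counts twice.
  var lead4 = encoded.match(/%[Ff][0-4]/g);
  return str.length + (m ? m.length : 0) - (lead4 ? lead4.length : 0);
}

This works because of the way UTF-8 encodes multi-byte sequences. The first encoded byte either has a high bit of zero (a single-byte sequence) or has C, D, E, or F as its first hex digit. The second and subsequent bytes are the ones whose first two bits are 10, that is, whose first hex digit is 8, 9, A, or B. Those are the extra bytes to count on top of str.length. One correction is needed for characters above U+FFFF: each is two UTF-16 code units in the string (a surrogate pair) but four UTF-8 bytes, so str.length plus its three continuation bytes would overcount by one, and the 4-byte lead bytes are subtracted to compensate. Note that encodeURIComponent throws a URIError if the string contains a lone surrogate.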
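To connect the byte count back to the message format in the question, here is a minimal sketch of the framing step. composeMessage is a hypothetical helper, and the block repeats a lengthInUtf8Bytes that subtracts 4-byte lead bytes (so surrogate pairs are not counted twice) only so that it runs on its own. It assumes the transport really sends the string as UTF-8.

```javascript
// Byte count via encodeURIComponent, with surrogate pairs counted once.
function lengthInUtf8Bytes(str) {
  var encoded = encodeURIComponent(str);
  var m = encoded.match(/%[89ABab]/g);       // continuation bytes (10xxxxxx)
  var lead4 = encoded.match(/%[Ff][0-4]/g);  // lead bytes of 4-byte sequences
  return str.length + (m ? m.length : 0) - (lead4 ? lead4.length : 0);
}

// Hypothetical helper: builds "<size in bytes>CRLF<data>CRLF".
function composeMessage(data) {
  return lengthInUtf8Bytes(data) + '\r\n' + data + '\r\n';
}

console.log(JSON.stringify(composeMessage('foo')));   // "3\r\nfoo\r\n"
console.log(lengthInUtf8Bytes('ЭЭХ! Na\u00EFve?'));   // 15
```

The size prefix covers only the data, not the trailing CRLF, which matches the `3` / `foo` example in the question.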

The table in Wikipedia makes this clearer:

Bits        Last code point Byte 1          Byte 2          Byte 3
  7         U+007F          0xxxxxxx
 11         U+07FF          110xxxxx        10xxxxxx
 16         U+FFFF          1110xxxx        10xxxxxx        10xxxxxx
...
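The ranges in the table can also be applied directly, without encodeURIComponent, by walking the string's UTF-16 code units. The following is a sketch under stated assumptions: the function name is hypothetical, the 4-byte case for code points above U+FFFF goes beyond the rows shown, and a lone surrogate is counted as three bytes (as if replaced by U+FFFD), which is a choice made here rather than something the answer specifies.

```javascript
// Sketch: UTF-8 byte length computed from code point ranges.
// utf8LengthFromCodeUnits is a hypothetical name.
function utf8LengthFromCodeUnits(str) {
  var bytes = 0;
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit <= 0x7F) {
      bytes += 1;                       // up to U+007F: one byte
    } else if (unit <= 0x7FF) {
      bytes += 2;                       // up to U+07FF: two bytes
    } else if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length &&
               str.charCodeAt(i + 1) >= 0xDC00 && str.charCodeAt(i + 1) <= 0xDFFF) {
      bytes += 4;                       // surrogate pair: one code point above U+FFFF
      i++;                              // skip the low surrogate
    } else {
      bytes += 3;                       // rest of the BMP; lone surrogates land here too
    }
  }
  return bytes;
}

console.log(utf8LengthFromCodeUnits('ЭЭХ! Na\u00EFve?'));  // 15
```

Unlike the encodeURIComponent approach, this version does not throw on malformed input, at the cost of a loop over every code unit.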

If instead you need the length in the page's encoding, you can use this trick:

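function lengthInPageEncoding(s) {
  // Let the browser percent-encode the string by assigning it to a link fragment.
  var a = document.createElement('A');
  a.href = '#' + s;
  var sEncoded = a.href;
  // Keep only the fragment, without the leading '#'.
  sEncoded = sEncoded.substring(sEncoded.indexOf('#') + 1);
  // Each %XX escape is three characters but one byte; the i flag also matches uppercase hex.
  var m = sEncoded.match(/%[0-9a-f]{2}/gi);
  return sEncoded.length - (m ? m.length * 2 : 0);
}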
