WebSockets and text encoding

Question

I have read:

The WebSocket API accepts a DOMString object, which is encoded as UTF-8 on the wire, or one of ArrayBuffer, ArrayBufferView, or Blob objects for binary transfers.

A DOMString is a UTF-16 encoded string. So is it correct that UTF-8 encoding is used over the wire?

Recommended answer

Yes, it is correct.

UTF-16 may or may not be used in memory; that is just an implementation detail of whatever framework you are using. In the case of JavaScript, strings are UTF-16.
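To make the difference between the in-memory form and the wire form concrete, here is a minimal sketch that compares a string's length in UTF-16 code units with the number of bytes its UTF-8 encoding occupies. It assumes a runtime that provides the standard `TextEncoder` global (browsers, Node.js); the helper name `utf8ByteLength` is made up for this illustration.

```typescript
// TextEncoder always produces UTF-8; it takes no encoding argument.
const lengthEncoder = new TextEncoder();

// Hypothetical helper: number of bytes the text would occupy as UTF-8.
function utf8ByteLength(text: string): number {
  return lengthEncoder.encode(text).length;
}

// "hello", "héllo", "€" and an emoji (written as a surrogate pair).
const samples = ["hello", "h\u00e9llo", "\u20ac", "\uD83D\uDE00"];
for (const sample of samples) {
  // String.length counts UTF-16 code units, not bytes and not characters.
  console.log(sample, "UTF-16 code units:", sample.length, "UTF-8 bytes:", utf8ByteLength(sample));
}
```

ASCII text is the same length in both counts, while the euro sign is one UTF-16 code unit but three UTF-8 bytes, and the emoji is two code units but four bytes.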

For WebSocket communications, UTF-8 must be used over the wire for textual data (most Internet protocols use UTF-8 nowadays). That is dictated by the WebSocket protocol specification:

After a successful handshake, clients and servers transfer data back and forth in conceptual units referred to in this specification as "messages". On the wire, a message is composed of one or more frames. The WebSocket message does not necessarily correspond to a particular network layer framing, as a fragmented message may be coalesced or split by an intermediary.

A frame has an associated type. Each frame belonging to the same message contains the same type of data. Broadly speaking, there are types for textual data (which is interpreted as UTF-8 [RFC3629] text), binary data (whose interpretation is left up to the application), and control frames (which are not intended to carry data for the application but instead for protocol-level signaling, such as to signal that the connection should be closed). This version of the protocol defines six frame types and leaves ten reserved for future use.

...

Data frames (e.g., non-control frames) are identified by opcodes where the most significant bit of the opcode is 0. Currently defined opcodes for data frames include 0x1 (Text), 0x2 (Binary). Opcodes 0x3-0x7 are reserved for further non-control frames yet to be defined.

Data frames carry application-layer and/or extension-layer data. The opcode determines the interpretation of the data:

Text

The "Payload data" is text data encoded as UTF-8. Note that a particular text frame might include a partial UTF-8 sequence; however, the whole message MUST contain valid UTF-8. Invalid UTF-8 in reassembled messages is handled as described in Section 8.1.

Binary

The "Payload data" is arbitrary binary data whose interpretation is solely up to the application layer.
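The quoted rules can be sketched in a few lines. This is an illustration only, not a frame parser: it assumes the frame headers have already been parsed and the payloads unmasked, and the names `isDataOpcode` and `decodeTextMessage` are invented for the example. It shows the opcode check (most significant of the four opcode bits is 0 for data frames) and how a text message whose UTF-8 sequence is split across two frames can still be decoded, using the standard `TextDecoder` in streaming mode.

```typescript
const OPCODE_TEXT = 0x1;
const OPCODE_BINARY = 0x2;

// Opcodes are 4 bits wide; a set most significant bit (0x8) marks a control frame.
function isDataOpcode(opcode: number): boolean {
  return (opcode & 0x8) === 0;
}

// Decode the payload fragments of ONE text message into a string.
// fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD;
// ignoreBOM: true keeps a leading U+FEFF as ordinary content.
function decodeTextMessage(fragments: Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });
  let text = "";
  for (const fragment of fragments) {
    // stream: true buffers an incomplete trailing sequence for the next fragment.
    text += decoder.decode(fragment, { stream: true });
  }
  // Final call ends the stream; it throws if a sequence was left incomplete.
  text += decoder.decode();
  return text;
}

// The euro sign is E2 82 AC in UTF-8; here it is split across two fragments.
const demoFragments = [
  new Uint8Array([0x68, 0x69, 0xe2]),
  new Uint8Array([0x82, 0xac]),
];
console.log(isDataOpcode(OPCODE_TEXT), isDataOpcode(OPCODE_BINARY), isDataOpcode(0x8));
console.log(decodeTextMessage(demoFragments));
```

Validation happens over the whole message rather than per frame, which matches the note that a single text frame may end in the middle of a UTF-8 sequence. What a real endpoint does when decoding fails is governed by Section 8.1 of the specification; the sketch simply lets the exception propagate.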

You will incur a small amount of overhead converting from UTF-16 to UTF-8 and back to UTF-16, but the overhead is minimal on modern machines, and conversions between UTFs are lossless.
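The round trip can be checked directly. The sketch below uses the standard `TextEncoder` and `TextDecoder` globals; `roundTrip` is a name chosen for this example. It also shows one caveat worth knowing: a JavaScript string can hold an unpaired surrogate, which is not well-formed UTF-16 and has no UTF-8 encoding, so `TextEncoder` substitutes U+FFFD for it.

```typescript
const wireEncoder = new TextEncoder();
// ignoreBOM: true so that a leading U+FEFF survives decoding unchanged.
const wireDecoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });

// Hypothetical helper: UTF-16 string -> UTF-8 bytes -> UTF-16 string.
function roundTrip(text: string): string {
  return wireDecoder.decode(wireEncoder.encode(text));
}

const original = "h\u00e9llo \u20ac \uD83D\uDE00";
console.log(roundTrip(original) === original); // well-formed text survives intact

// An unpaired high surrogate cannot be represented in UTF-8.
const lone = "\uD800";
console.log(roundTrip(lone) === "\uFFFD"); // replaced by U+FFFD
```

For well-formed text the conversion is exact in both directions, which is the case the answer has in mind.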
