Storing binary data in a UTF-8 string
Question
I want to use a WebSocket to transfer binary data, but you can only use WebSockets to transfer UTF-8 strings.
Encoding it using base64 is one option, but my understanding is that base64 is most desirable when your text might be converted from one format to another. In this case, I know the data will always be UTF-8, so is there a better way of encoding binary data in a UTF-8 string without paying base64's 33% size premium?
This question is mostly academic, as binary support will probably be added to WebSocket eventually, and base64 is a perfectly cromulent alternative in the meantime.
Recommended answer
You could use a Base-128 encoding instead of a Base-64 encoding. That results in an overhead of only 1/7, as opposed to 1/3.
The idea is to use all Unicode code points that can be represented in a single byte in UTF-8 (0–127). That means every byte begins with a 0 bit, so there are seven bits left for the data:
0xxxxxxx
That results in an encoding where 7 input bytes are encoded using 8 output bytes:
input: aaaaaaaa bbbbbbbb cccccccc dddddddd eeeeeeee ffffffff gggggggg
output: 0aaaaaaa 0abbbbbb 0bbccccc 0cccdddd 0ddddeee 0eeeeeff 0ffffffg 0ggggggg
So the output-to-input ratio is 8/7.
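The bit layout above can be sketched in Python as follows. This is a minimal illustration, not a standard codec: the function names `encode_base128` and `decode_base128` are hypothetical, and the sketch assumes that a final partial 7-bit group is zero-padded on the right when the input length is not a multiple of 7.

```python
def encode_base128(data: bytes) -> str:
    """Pack 8-bit bytes into 7-bit groups, one code point (0-127) per group."""
    out = bytearray()
    acc = 0    # bit accumulator
    nbits = 0  # number of unread bits currently held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 7:
            nbits -= 7
            out.append((acc >> nbits) & 0x7F)
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        # Assumption: left-align the leftover bits and pad with zeros.
        out.append((acc << (7 - nbits)) & 0x7F)
    return out.decode("ascii")


def decode_base128(text: str) -> bytes:
    """Inverse of encode_base128; trailing padding bits are discarded."""
    out = bytearray()
    acc = 0
    nbits = 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("code point outside the 0-127 range")
        acc = (acc << 7) | code
        nbits += 7
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out)


payload = bytes(range(256))
encoded = encode_base128(payload)
assert decode_base128(encoded) == payload
```

Because the padding is always shorter than 7 bits, the decoder can drop any leftover bits without knowing the original length. Note that the output uses every code point from 0 to 127, including NUL and other control characters, so it is only suitable where the transport and the receiving code pass those through unchanged.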