Binary Data in JSON String. Something better than Base64

Question

The JSON format natively doesn't support binary data. The binary data has to be escaped so that it can be placed into a string element (i.e. zero or more Unicode chars in double quotes using backslash escapes) in JSON.

An obvious method to escape binary data is to use Base64. However, Base64 has a high processing overhead. Also it expands 3 bytes into 4 characters which leads to an increased data size by around 33%.
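The 3-to-4 expansion can be checked with a minimal Python sketch. The helper names `to_json_with_base64` and `from_json_with_base64`, and the field names in the document, are illustrative assumptions and not part of any specification.

```python
import base64
import json


def to_json_with_base64(payload: bytes) -> str:
    """Wrap raw bytes in a JSON document, Base64-encoding the value (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return json.dumps({"mimetype": "application/octet-stream", "value": encoded})


def from_json_with_base64(document: str) -> bytes:
    """Recover the raw bytes from a document built by to_json_with_base64."""
    return base64.b64decode(json.loads(document)["value"])


payload = bytes(range(256)) * 3          # 768 bytes covering every byte value
document = to_json_with_base64(payload)
encoded_length = len(json.loads(document)["value"])

assert from_json_with_base64(document) == payload  # lossless round trip
print(encoded_length, len(payload))      # 1024 characters for 768 bytes
print(encoded_length / len(payload))     # 4/3, i.e. about 33% larger
```

Inputs whose length is not a multiple of 3 are padded with `=`, so the ratio is slightly above 4/3 for very short payloads.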

One use case for this is the v0.8 draft of the CDMI cloud storage API specification. You create data objects via a REST-Webservice using JSON, e.g.

PUT /MyContainer/BinaryObject HTTP/1.1
Host: cloud.example.com
Accept: application/vnd.org.snia.cdmi.dataobject+json
Content-Type: application/vnd.org.snia.cdmi.dataobject+json
X-CDMI-Specification-Version: 1.0

{
    "mimetype" : "application/octet-stream",
    "metadata" : [ ],
    "value" :   "TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
    IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
    dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
    dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
    ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4="
}

Are there better ways and standard methods to encode binary data into JSON strings?

Recommended answer

There are 94 Unicode characters which can be represented as one byte according to the JSON spec (if your JSON is transmitted as UTF-8). With that in mind, I think the best you can do space-wise is base85 which represents four bytes as five characters. However, this is only a 7% improvement over base64, it's more expensive to compute, and implementations are less common than for base64 so it's probably not a win.
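As a rough check of the size difference, the sketch below compares the two encodings using Python's standard `base64` module. It assumes the `b85encode` variant, whose alphabet contains neither `"` nor `\`, so the output needs no JSON escaping; the Ascii85 variant (`a85encode`) uses both characters and would have to be escaped. The sample data is arbitrary.

```python
import base64
import json

# 3000 bytes: divisible by both 3 and 4, so neither encoding needs padding.
data = bytes(i % 256 for i in range(3000))

b64 = base64.b64encode(data).decode("ascii")
b85 = base64.b85encode(data).decode("ascii")

print(len(b64))             # 4000 characters: 4 per 3 input bytes
print(len(b85))             # 3750 characters: 5 per 4 input bytes
print(len(b85) / len(b64))  # 0.9375, i.e. roughly 6% smaller than base64

# Both strings pass through JSON untouched: only the two quotes are added.
assert len(json.dumps(b64)) == len(b64) + 2
assert len(json.dumps(b85)) == len(b85) + 2
assert base64.b85decode(json.loads(json.dumps(b85))) == data
```

Measured this way the saving is in the same range as the figure quoted above, which is small compared with the cost of requiring a less widely available decoder on the receiving side.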

You could also simply map every input byte to the corresponding character in U+0000-U+00FF, then do the minimum encoding required by the JSON standard to pass those characters; the advantage here is that the required decoding is nil beyond builtin functions, but the space efficiency is bad -- a 105% expansion (if all input bytes are equally likely) vs. 25% for base85 or 33% for base64.
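The 105% figure can be reproduced by counting, for each of the 256 byte values, how many UTF-8 bytes its minimally escaped JSON form occupies. The sketch below relies on Python's `json.dumps` with `ensure_ascii=False`, which escapes only the quote, the backslash, and control characters below U+0020; the helper name `escaped_utf8_size` is an illustrative assumption.

```python
import json


def escaped_utf8_size(byte_value: int) -> int:
    """UTF-8 size of one byte mapped to U+0000-U+00FF, after minimal JSON escaping."""
    text = json.dumps(chr(byte_value), ensure_ascii=False)
    return len(text.encode("utf-8")) - 2   # drop the surrounding quotes


sizes = [escaped_utf8_size(b) for b in range(256)]
total = sum(sizes)

print(sizes.count(1))    # 94 values fit in a single byte
print(total)             # 526 bytes to carry 256 equally likely input bytes
print(total / 256 - 1)   # about 1.05, i.e. the 105% expansion

# Decoding needs nothing beyond built-ins: parse the JSON, then encode as Latin-1.
data = bytes(range(256))
wire = json.dumps(data.decode("latin-1"), ensure_ascii=False)
assert json.loads(wire).encode("latin-1") == data
```

The breakdown is 94 one-byte characters, 128 two-byte characters for U+0080-U+00FF, 7 two-character escapes (`\"`, `\\`, `\b`, `\f`, `\n`, `\r`, `\t`), and 27 six-character `\u00XX` escapes for the remaining control characters.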

Final verdict: base64 wins, in my opinion, on the grounds that it's common, easy, and not bad enough to warrant replacement.

See also: Base91 and Base122
