Performance of gzipped json vs efficient binary serialization


Question

JSON plus gzip is a simple way to serialize data. Both are widely implemented across programming languages, and the representation is portable across systems (is it?).

My question is whether json+gzip is good enough (less than 2x the cost) compared to very efficient binary serialization methods. I'm looking at space and time costs when serializing various kinds of data.
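
For context, the json+gzip approach is only a few lines in most languages. A minimal Python sketch (the helper names and file path are illustrative, not from the question):

import gzip
import json

def dump_json_gz(obj, path):
    # Serialize to JSON text, then gzip-compress the UTF-8 bytes.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(obj, f)

def load_json_gz(path):
    # Decompress and parse back into Python objects.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

dump_json_gz({"values": [0.1234, 1234000]}, "data.json.gz")
print(load_json_gz("data.json.gz"))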

Answer

Serialising with json+gzip uses 25% more space than rawbytes+gzip for numbers and objects. For limited-precision numbers (4 significant digits) the serialised size is the same. For small-scale applications, json+gzip seems good enough in terms of data size. This holds even when sending an array of records where each record fully spells out its field names and values (the common way of storing data in JavaScript).

Source for the experiments below: https://github.com/csiz/gzip-json

Performance

I picked a million floating point (64 bit) numbers. I assume these numbers come from some natural source, so I used an exponential distribution to generate them and then rounded them to 4 significant digits. Because JSON writes down the whole representation, I thought storing large numbers might incur a bigger cost (e.g. storing 123456.000000 vs 0.123456), so I checked both cases. I also checked serialising numbers that haven't been rounded.
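
A minimal sketch of this setup in Python (assuming numpy; the rounding helper is illustrative, not taken from the linked repo):

import gzip
import json
import numpy as np

rng = np.random.default_rng(0)
# "Small" numbers: exponential distribution with magnitude around 1.0.
values = rng.exponential(scale=1.0, size=1_000_000)

def round_sig(x, sig=4):
    # Round each (positive) value to `sig` significant digits.
    exponent = np.floor(np.log10(np.abs(x)))
    factor = 10.0 ** (sig - 1 - exponent)
    return np.round(x * factor) / factor

rounded = round_sig(values)

json_gz = gzip.compress(json.dumps(rounded.tolist()).encode("utf-8"))
binary_gz = gzip.compress(rounded.astype(np.float64).tobytes())

print(f"json+gzip:   {len(json_gz) / 1e6:.2f} mb")
print(f"binary+gzip: {len(binary_gz) / 1e6:.2f} mb")
print(f"json/binary: {len(json_gz) / len(binary_gz):.2f}")

Scaling by 1000000 instead of 1.0 reproduces the large-number case, and skipping round_sig reproduces the full-precision case.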

Size used by compressed json is 9% larger vs compressed binary when serialising small numbers (order of magnitude around 1.0, so only a few digits to write down):

json 3.29mb json/raw 43%
binary 3.03mb binary/raw 40%
json/binary 1.09

Size used by compressed json is 17% smaller vs compressed binary when serialising large numbers (order of magnitude around 1000000, so more digits to write down):

json 2.58mb json/raw 34%
binary 3.10mb binary/raw 41%
json/binary 0.83

Size used by compressed json is 22% larger vs compressed binary when serialising full-precision doubles:

json 8.90mb json/raw 117%
binary 7.27mb binary/raw 95%
json/binary 1.22

Objects

For objects I'm serialising them the usual lazy way in JSON. Each object is stored as a complete record with the field names and values. The "choice" enumeration has its value fully spelled out:

[
  {
    "small number": 0.1234,
    "large number": 1234000,
    "choice": "two"
  },
  ...
]

For the efficient binary representation I vectorise the objects: I store the number of objects, then a contiguous vector for each field (the small numbers, the large numbers, and the choices). In this case I assume the enum values are known and fixed, so I store only the index into the enum:

n = 1e6
small number = binary([0.1234, ...])
large number = binary([1234000, ...])
choice = binary([2, ...]) # indexes to the enum ["zero", "one", ..., "four"]
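
A runnable sketch of this layout in Python (assuming numpy; the field names match the JSON example above, and the enum table is the illustrative one from the comment):

import gzip
import numpy as np

choices = ["zero", "one", "two", "three", "four"]  # assumed known and fixed

# Example records, shaped like the JSON array above.
records = [{"small number": 0.1234, "large number": 1234000, "choice": "two"}] * 1_000_000

# Vectorise: one contiguous array per field instead of one dict per record.
n = np.uint64(len(records))
small = np.array([r["small number"] for r in records], dtype=np.float64)
large = np.array([r["large number"] for r in records], dtype=np.float64)
choice = np.array([choices.index(r["choice"]) for r in records], dtype=np.uint8)

# Concatenate the count header and the field vectors, then gzip the bytes.
payload = n.tobytes() + small.tobytes() + large.tobytes() + choice.tobytes()
print(f"binary+gzip: {len(gzip.compress(payload)) / 1e6:.2f} mb")

Storing the enum as a uint8 index (rather than the spelled-out string) keeps the choice column to one byte per record before compression.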

Size used by compressed json is 27% larger vs compressed binary when storing objects:

json 8.36mb json/raw 44%
binary 6.59mb binary/raw 35%
json/binary 1.27
