How is it that json serialization is so much faster than yaml serialization in Python?

Question

I have code that relies heavily on yaml for cross-language serialization and while working on speeding some stuff up I noticed that yaml was insanely slow compared to other serialization methods (e.g., pickle, json).

So what really blows my mind is that json is so much faster than yaml when the output is nearly identical.

>>> import yaml, cjson; d={'foo': {'bar': 1}}
>>> yaml.dump(d, Dumper=yaml.SafeDumper)
'foo: {bar: 1}\n'
>>> cjson.encode(d)
'{"foo": {"bar": 1}}'
>>> from timeit import timeit
>>> timeit("yaml.dump(d, Dumper=yaml.SafeDumper)", setup="import yaml; d={'foo': {'bar': 1}}", number=10000)
44.506911039352417
>>> timeit("yaml.dump(d, Dumper=yaml.CSafeDumper)", setup="import yaml; d={'foo': {'bar': 1}}", number=10000)
16.852826118469238
>>> timeit("cjson.encode(d)", setup="import cjson; d={'foo': {'bar': 1}}", number=10000)
0.073784112930297852

PyYaml's CSafeDumper and cjson are both written in C so it's not like this is a C vs Python speed issue. I've even added some random data to it to see if cjson is doing any caching, but it's still way faster than PyYaml. I realize that yaml is a superset of json, but how could the yaml serializer be 2 orders of magnitude slower with such simple input?
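For readers who want to repeat the comparison on Python 3, where cjson is generally not available, the following is a minimal sketch that swaps in the standard-library json module. That substitution, the `bench` helper, and the call count of 1000 are assumptions of this sketch and not part of the original measurement, so the absolute numbers will not match the ones above.

```python
import json
from timeit import timeit

import yaml  # PyYAML

d = {'foo': {'bar': 1}}

# CSafeDumper is only present when PyYAML was built against libyaml,
# so look it up defensively instead of assuming it exists.
c_dumper = getattr(yaml, 'CSafeDumper', None)


def bench(func, number=1000):
    """Return the total seconds taken by `number` calls of func()."""
    return timeit(func, number=number)


results = {
    'yaml SafeDumper': bench(lambda: yaml.dump(d, Dumper=yaml.SafeDumper)),
    'json.dumps': bench(lambda: json.dumps(d)),
}
if c_dumper is not None:
    results['yaml CSafeDumper'] = bench(lambda: yaml.dump(d, Dumper=c_dumper))

# Print the fastest serializer first.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f'{name:>18}: {seconds:.4f} s')
```

Timings depend heavily on the machine and library versions, so treat a single run on a tiny dictionary as a rough indication only.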

Recommended answer

In general, it's not the complexity of the output that determines the speed of parsing, but the complexity of the accepted input. The JSON grammar is very concise. The YAML parsers are comparatively complex, leading to increased overheads.

JSON’s foremost design goal is simplicity and universality. Thus, JSON is trivial to generate and parse, at the cost of reduced human readability. It also uses a lowest common denominator information model, ensuring any JSON data can be easily processed by every modern programming environment.

In contrast, YAML’s foremost design goals are human readability and support for serializing arbitrary native data structures. Thus, YAML allows for extremely readable files, but is more complex to generate and parse. In addition, YAML ventures beyond the lowest common denominator data types, requiring more complex processing when crossing between different programming environments.
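As a rough illustration of the "complexity of the accepted input" point, the sketch below feeds several spellings of the same data to PyYAML's `yaml.safe_load` and to the standard-library `json.loads`. The sample strings and the `json_accepts` helper are made up for this example.

```python
import json

import yaml  # PyYAML

expected = {'foo': {'bar': 1}}

# Different surface syntaxes that a YAML parser has to recognise
# for one and the same nested mapping.
spellings = [
    'foo: {bar: 1}\n',        # block mapping with a flow-style value
    'foo:\n  bar: 1\n',       # pure block style
    '{"foo": {"bar": 1}}',    # JSON-style flow mapping
    "{'foo': {'bar': 1}}",    # flow mapping with single-quoted keys
]

# PyYAML accepts every spelling and produces the same dictionary.
yaml_results = [yaml.safe_load(text) == expected for text in spellings]


def json_accepts(text):
    """Return True if json.loads parses text into the expected dict."""
    try:
        return json.loads(text) == expected
    except json.JSONDecodeError:
        return False


# The JSON parser only accepts the one JSON-style spelling.
json_results = [json_accepts(text) for text in spellings]

print('yaml:', yaml_results)
print('json:', json_results)
```

The wider set of accepted forms is what makes a YAML parser more involved than a JSON parser; it says nothing yet about the cost of dumping, which the question is actually about.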

I'm not a YAML parser implementor, so I can't speak specifically to the orders of magnitude without some profiling data and a big corpus of examples. In any case, be sure to test over a large body of inputs before feeling confident in benchmark numbers.

Update: Whoops, I misread the question. :-( Serialization can still be blazingly fast despite the large input grammar. However, browsing the source, it looks like PyYAML's Python-level serialization constructs a representation graph, whereas simplejson encodes builtin Python datatypes directly into text chunks.
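The intermediate representation graph can be inspected directly. The sketch below calls `represent_data` on a `yaml.SafeDumper`, which is the representer step that `yaml.dump` runs before serializing and emitting text; the `describe` helper is hypothetical and exists only to print the node tree. The standard-library `json.dumps` is used here as a stand-in for simplejson.

```python
import io
import json

import yaml  # PyYAML

d = {'foo': {'bar': 1}}

# A dumper needs a stream, but represent_data never writes to it:
# it only converts Python objects into PyYAML node objects.
dumper = yaml.SafeDumper(io.StringIO())
node = dumper.represent_data(d)


def describe(node, depth=0):
    """Return indented lines describing a PyYAML node tree."""
    pad = '  ' * depth
    if isinstance(node, yaml.MappingNode):
        lines = [f'{pad}MappingNode {node.tag}']
        for key, value in node.value:      # list of (key, value) node pairs
            lines += describe(key, depth + 1)
            lines += describe(value, depth + 1)
        return lines
    if isinstance(node, yaml.SequenceNode):
        lines = [f'{pad}SequenceNode {node.tag}']
        for item in node.value:            # list of child nodes
            lines += describe(item, depth + 1)
        return lines
    # Scalars carry their value as a string plus a resolved tag.
    return [f'{pad}{type(node).__name__} {node.tag} {node.value!r}']


graph = describe(node)
print('\n'.join(graph))

# The JSON encoder has no comparable intermediate tree to build.
print(json.dumps(d))
```

Even for this two-key input, PyYAML allocates five node objects before any text is produced, and the later serialize and emit stages then walk that tree again. That extra work is consistent with the explanation above, though it does not by itself account for the full gap measured in the question.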
