Can I speed up YAML?
Problem description
I wrote a small test case to compare YAML and JSON speed:
# Python 2 script (print statements, xrange)
import json
import yaml
from datetime import datetime

NB_ROW=1024

print 'Does yaml is using libyaml ? ',yaml.__with_libyaml__ and 'yes' or 'no'

dummy_data = [ { 'dummy_key_A_%s' % i: i, 'dummy_key_B_%s' % i: i } for i in xrange(NB_ROW) ]

# Time dumping to YAML
with open('perf_json_yaml.yaml','w') as fh:
    t1 = datetime.now()
    yaml.safe_dump(dummy_data, fh, encoding='utf-8', default_flow_style=False)
    t2 = datetime.now()
    dty = (t2 - t1).total_seconds()
    print 'Dumping %s row into a yaml file : %s' % (NB_ROW,dty)

# Time dumping to JSON
with open('perf_json_yaml.json','w') as fh:
    t1 = datetime.now()
    json.dump(dummy_data,fh)
    t2 = datetime.now()
    dtj = (t2 - t1).total_seconds()
    print 'Dumping %s row into a json file : %s' % (NB_ROW,dtj)

print "json is %dx faster for dumping" % (dty/dtj)

# Time loading from YAML
with open('perf_json_yaml.yaml') as fh:
    t1 = datetime.now()
    data = yaml.safe_load(fh)
    t2 = datetime.now()
    dty = (t2 - t1).total_seconds()
    print 'Loading %s row from a yaml file : %s' % (NB_ROW,dty)

# Time loading from JSON
with open('perf_json_yaml.json') as fh:
    t1 = datetime.now()
    data = json.load(fh)
    t2 = datetime.now()
    dtj = (t2 - t1).total_seconds()
    print 'Loading %s row into from json file : %s' % (NB_ROW,dtj)

print "json is %dx faster for loading" % (dty/dtj)
The result is:
Does yaml is using libyaml ? yes
Dumping 1024 row into a yaml file : 0.251139
Dumping 1024 row into a json file : 0.007725
json is 32x faster for dumping
Loading 1024 row from a yaml file : 0.401224
Loading 1024 row into from json file : 0.001793
json is 223x faster for loading
I am using PyYAML 3.11 with the libyaml C library on Ubuntu 12.04. I know that JSON is much simpler than YAML, but with a 223x ratio between JSON and YAML I am wondering whether my configuration is correct.
Do you get the same speed ratio?
How can I speed up yaml.load()?
Recommended answer
You've probably noticed that Python's syntax for data structures is very similar to JSON's syntax.
What's happening is that Python's json library encodes Python's built-in data types directly into text chunks, replacing ' with " and deleting , here and there (to oversimplify a bit).
pyyaml, on the other hand, has to construct a whole representation graph before serialising it into a string.
The same kind of work has to happen in reverse when loading.
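To make the graph point concrete, here is a small sketch (written for Python 3, assuming PyYAML is installed; the variable names are only for illustration). When the same list object is referenced twice, json simply writes it out twice, whereas yaml has to notice the shared reference and emit an anchor and an alias, and it restores the sharing when loading:

```python
import json
import yaml

# The same list object is referenced from two keys.
shared = [1, 2]
data = {'a': shared, 'b': shared}

# json writes the list out twice and forgets it was one object.
json_text = json.dumps(data)
json_copy = json.loads(json_text)

# yaml tracks object identity: it emits an anchor (&...) on the first
# occurrence and an alias (*...) on the second.
yaml_text = yaml.safe_dump(data)
loaded = yaml.safe_load(yaml_text)

print(json_text)
print(yaml_text)
print(json_copy['a'] is json_copy['b'])  # two separate lists after json
print(loaded['a'] is loaded['b'])        # one shared list after yaml
```

That bookkeeping is part of what json never has to pay for.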
The only way to speed up yaml.load() would be to write a new Loader, but I doubt it would bring a huge leap in performance, unless you're willing to write your own single-purpose, sort-of-YAML parser, taking the following comment into consideration:
YAML builds a graph because it is a general-purpose serialisation format that is able to represent multiple references to the same object. If you know that no object is repeated and only basic types appear, you can use a JSON serialiser; the output will still be valid YAML.
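As a rough illustration of that comment, the sketch below (Python 3, assuming PyYAML is installed) dumps data shaped like the question's dummy_data with json and reads it back with a YAML loader. Treat it as a sketch for plain data only: PyYAML implements YAML 1.1, so some JSON corner cases may not load identically, and it is worth checking against your own data.

```python
import json
import yaml

# Plain data: only dicts, lists, strings and ints, with no object repeated.
dummy_data = [{'dummy_key_A_%s' % i: i, 'dummy_key_B_%s' % i: i} for i in range(3)]

# Serialise with the fast json encoder...
text = json.dumps(dummy_data)

# ...and the result can still be read back by a YAML parser.
from_yaml = yaml.safe_load(text)
from_json = json.loads(text)

print(from_yaml == dummy_data)
print(from_json == dummy_data)
```

So if the files only need to be written quickly and stay readable by YAML tools, writing them with json.dump is an option.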
Update
What I said before remains true, but if you're running Linux there's a way to speed up YAML parsing. By default, Python's yaml uses the pure-Python parser. You have to tell it explicitly that you want to use PyYAML's C parser.
You can do it this way:
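import yaml
# Use the LibYAML-based classes when available, else the pure-Python ones
try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
    from yaml import Loader, Dumper
yaml.dump(dummy_data, fh, encoding='utf-8', default_flow_style=False, Dumper=Dumper)
data = yaml.load(fh, Loader=Loader)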
In order to do so, you need the LibYAML development package installed (libyaml-dev on Debian and Ubuntu; not yaml-cpp-dev, later renamed libyaml-cpp-dev, which packages the separate yaml-cpp C++ library), for instance with apt-get:
$ apt-get install libyaml-dev
You also need PyYAML built with LibYAML. But that's already the case based on your output.
I can't test it right now because I'm running OS X and brew has some trouble installing the library, but if you follow the PyYAML documentation, it is pretty clear that performance will be much better.
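One more thing about the script in the question: it calls yaml.safe_dump and yaml.safe_load, and those helpers use the pure-Python SafeDumper and SafeLoader whatever yaml.__with_libyaml__ reports. Below is a minimal sketch of safe equivalents backed by the C classes (Python 3, assuming PyYAML is installed; it falls back to the pure-Python classes when the LibYAML bindings are missing). The names fast_safe_dump and fast_safe_load are made up for this example and are not part of PyYAML:

```python
import yaml

# Prefer the LibYAML-backed safe classes; fall back to pure Python.
try:
    from yaml import CSafeLoader as SafeLoader, CSafeDumper as SafeDumper
except ImportError:
    from yaml import SafeLoader, SafeDumper


def fast_safe_dump(data, stream=None, **kwargs):
    # Hypothetical helper: like yaml.safe_dump, but with the C dumper if present.
    return yaml.dump(data, stream, Dumper=SafeDumper, **kwargs)


def fast_safe_load(stream):
    # Hypothetical helper: like yaml.safe_load, but with the C loader if present.
    return yaml.load(stream, Loader=SafeLoader)


dummy_data = [{'dummy_key_A_%s' % i: i, 'dummy_key_B_%s' % i: i} for i in range(4)]
text = fast_safe_dump(dummy_data, default_flow_style=False)
print(fast_safe_load(text) == dummy_data)
```

I have not benchmarked this here, so rerun the timing script with these two helpers to see what ratio you get on your machine.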