Can I speed up YAML?

Question

I made a little test case to compare YAML and JSON speed:

import json
import yaml
from datetime import datetime
from random import randint

NB_ROW=1024

print 'Does yaml is using libyaml ? ',yaml.__with_libyaml__ and 'yes' or 'no'

dummy_data = [ { 'dummy_key_A_%s' % i: i, 'dummy_key_B_%s' % i: i } for i in xrange(NB_ROW) ]


with open('perf_json_yaml.yaml','w') as fh:
    t1 = datetime.now()
    yaml.safe_dump(dummy_data, fh, encoding='utf-8', default_flow_style=False)
    t2 = datetime.now()
    dty = (t2 - t1).total_seconds()
    print 'Dumping %s row into a yaml file : %s' % (NB_ROW,dty)

with open('perf_json_yaml.json','w') as fh:
    t1 = datetime.now()
    json.dump(dummy_data,fh)
    t2 = datetime.now()
    dtj = (t2 - t1).total_seconds()
    print 'Dumping %s row into a json file : %s' % (NB_ROW,dtj)

print "json is %dx faster for dumping" % (dty/dtj)

with open('perf_json_yaml.yaml') as fh:
    t1 = datetime.now()
    data = yaml.safe_load(fh)
    t2 = datetime.now()
    dty = (t2 - t1).total_seconds()
    print 'Loading %s row from a yaml file : %s' % (NB_ROW,dty)

with open('perf_json_yaml.json') as fh:
    t1 = datetime.now()
    data = json.load(fh)
    t2 = datetime.now()
    dtj = (t2 - t1).total_seconds()
    print 'Loading %s row into from json file : %s' % (NB_ROW,dtj)

print "json is %dx faster for loading" % (dty/dtj)

The results are:

Does yaml is using libyaml ?  yes
Dumping 1024 row into a yaml file : 0.251139
Dumping 1024 row into a json file : 0.007725
json is 32x faster for dumping
Loading 1024 row from a yaml file : 0.401224
Loading 1024 row into from json file : 0.001793
json is 223x faster for loading

I am using PyYAML 3.11 with the LibYAML C library on Ubuntu 12.04. I know that JSON is much simpler than YAML, but with a 223x ratio between JSON and YAML I am wondering whether my configuration is correct.

Do you get the same speed ratio?
How can I speed up yaml.load()?

Recommended answer

You've probably noticed that Python's syntax for data structures is very similar to JSON's syntax.

What's happening is that Python's json library encodes Python's built-in datatypes directly into text chunks, replacing ' with " and deleting , here and there (to oversimplify a bit).
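A quick way to see the similarity, using only the standard library: for a flat dict of strings and ints, `json.dumps` output matches Python's own `repr` once the quote characters are swapped. This only illustrates how close the two syntaxes are; the real encoder also handles escaping, non-string keys, floats and so on.

```python
import json

data = {'dummy_key_A_0': 0, 'dummy_key_B_0': 0}

# repr() uses single quotes, json.dumps() uses double quotes
print(repr(data))        # {'dummy_key_A_0': 0, 'dummy_key_B_0': 0}
print(json.dumps(data))  # {"dummy_key_A_0": 0, "dummy_key_B_0": 0}

# For this simple case, swapping the quote characters turns one into the other
assert repr(data).replace("'", '"') == json.dumps(data)
```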

On the other hand, PyYAML has to construct a whole representation graph before serialising it into a string.
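One thing that graph buys is shared references. The sketch below (assuming PyYAML is installed) dumps a list that contains the same dict twice: `json` simply writes the dict out twice, while PyYAML notices the repeated object and writes the second occurrence as an alias (`*id001`) to an anchor (`&id001`), so loading restores a single shared object.

```python
import json
import yaml

shared = {'value': 1}
data = [shared, shared]  # the same dict object referenced twice

# json writes the dict out twice; it has no notion of shared references
from_json = json.loads(json.dumps(data))
print(json.dumps(data))              # [{"value": 1}, {"value": 1}]
print(from_json[0] is from_json[1])  # False

# PyYAML tracks object identity while building its graph and emits an anchor/alias pair
text = yaml.safe_dump(data, default_flow_style=False)
from_yaml = yaml.safe_load(text)
print(text)
print(from_yaml[0] is from_yaml[1])  # True
```

Tracking object identity like this is part of the extra bookkeeping that a plain JSON encoder never does.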

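The same kind of work has to happen in reverse when loading: PyYAML parses the text into events, composes them into a node graph, and only then constructs Python objects from that graph.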
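PyYAML exposes the intermediate node graph through `yaml.compose`, which makes the extra loading step visible. A small sketch, assuming PyYAML is installed; the sample text is a single record in the shape the question's dump produces.

```python
import yaml

# One record in the shape produced by the question's yaml.safe_dump() call
text = "- dummy_key_A_0: 0\n  dummy_key_B_0: 0\n"

# Intermediate step: the representation graph (nodes), not Python objects yet
node = yaml.compose(text, Loader=yaml.SafeLoader)
print(type(node).__name__)           # SequenceNode
print(type(node.value[0]).__name__)  # MappingNode

# A mapping node holds (key node, value node) pairs; scalars are still strings plus a resolved tag
key_node, value_node = node.value[0].value[0]
print(repr(value_node.value), value_node.tag)  # '0' tag:yaml.org,2002:int

# Final step: constructing Python objects from the graph
print(yaml.safe_load(text))  # [{'dummy_key_A_0': 0, 'dummy_key_B_0': 0}]
```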

The only way to speed up yaml.load() would be to write a new Loader, but I doubt that would bring a huge leap in performance, unless you're willing to write your own single-purpose, sort-of-YAML parser, taking the following comment into consideration:

YAML builds a graph because it is a general-purpose serialisation format that is able to represent multiple references to the same object. If you know no object is repeated and only basic types appear, you can use a json serialiser, it will still be valid YAML.
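In the spirit of that comment, if your files only ever contain one fixed, simple shape, a single-purpose reader can skip the graph entirely. A minimal sketch, assuming every file looks exactly like the block-style output of the question's `yaml.safe_dump(..., default_flow_style=False)` call: a list of flat mappings with integer values and two-space indentation. `parse_flat_records` is a hypothetical helper, not part of PyYAML, and it accepts only this tiny subset of YAML; anything else raises `ValueError`. Whether it actually beats the LibYAML loader on your data is something to measure, not assume.

```python
def parse_flat_records(text):
    """Hypothetical single-purpose reader for one fixed shape:

        - dummy_key_A_0: 0
          dummy_key_B_0: 0

    i.e. a block list of flat mappings with integer values and two-space indentation.
    This is NOT a YAML parser: no quoting, nesting, comments, anchors or other types.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith('- '):
            records.append({})  # "- " starts a new list item
            line = line[2:]
        elif line.startswith('  ') and records:
            line = line[2:]  # further key of the current item
        else:
            raise ValueError('unsupported line: %r' % line)
        key, sep, value = line.partition(': ')
        if not sep or not key or key != key.strip():
            raise ValueError('expected "key: value", got %r' % line)
        records[-1][key] = int(value)  # raises ValueError if the value is not an int
    return records


sample = (
    "- dummy_key_A_0: 0\n"
    "  dummy_key_B_0: 0\n"
    "- dummy_key_A_1: 1\n"
    "  dummy_key_B_1: 1\n"
)
print(parse_flat_records(sample))
# [{'dummy_key_A_0': 0, 'dummy_key_B_0': 0}, {'dummy_key_A_1': 1, 'dummy_key_B_1': 1}]
```

The trade-off is the one the comment describes: the reader is fast to write and does very little work per line, but it silently depends on the writer never producing anything outside that shape.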

-- Update

What I said before remains true, but if you're running Linux there's a way to speed up YAML parsing. By default, PyYAML's yaml.load() and yaml.safe_load() use the pure-Python parser, even when LibYAML is available. You have to tell it explicitly that you want to use the LibYAML-based C parser.

You can do it like this:

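import yaml
from yaml import CLoader as Loader, CDumper as Dumper

# dummy_data is the list built in the question's benchmark; the C classes must be passed explicitly
with open('perf_json_yaml.yaml', 'wb') as fh:
    yaml.dump(dummy_data, fh, encoding='utf-8', default_flow_style=False, Dumper=Dumper)
with open('perf_json_yaml.yaml', 'rb') as fh:
    data = yaml.load(fh, Loader=Loader)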
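If the same code may run where PyYAML was built without LibYAML, the import above fails. A sketch with a fallback, following the `try`/`except ImportError` pattern the PyYAML documentation uses for `CLoader`/`CDumper`, here applied to the safe variants `CSafeLoader`/`CSafeDumper` because the question uses `safe_load`/`safe_dump`. It assumes Python 3 and an installed PyYAML.

```python
import yaml

# Prefer the LibYAML-backed safe classes when PyYAML was built with them,
# otherwise fall back to the pure-Python ones.
try:
    from yaml import CSafeLoader as SafeLoader, CSafeDumper as SafeDumper
except ImportError:
    from yaml import SafeLoader, SafeDumper

print('libyaml available:', yaml.__with_libyaml__)
print('using', SafeLoader.__name__, 'and', SafeDumper.__name__)

# yaml.safe_load()/safe_dump() would silently use the pure-Python classes;
# passing Loader=/Dumper= explicitly is what selects the C implementation.
text = yaml.dump([{'a': 1}], Dumper=SafeDumper, default_flow_style=False)
data = yaml.load(text, Loader=SafeLoader)
print(data)  # [{'a': 1}]
```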

In order to do so, you need the LibYAML development package (libyaml-dev on Debian/Ubuntu) installed, for instance with apt-get:

$ sudo apt-get install libyaml-dev

And PyYAML built with LibYAML support as well, but judging from your output that's already the case.

I can't test it right now because I'm running OS X and brew has some trouble installing the library, but the PyYAML documentation is pretty clear that the LibYAML-based parser and emitter perform much better than the pure-Python ones.
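To check that claim on your own machine, here is a rough Python 3 timing sketch that loads the question's data with the pure-Python `yaml.SafeLoader` and, when available, `yaml.CSafeLoader`. `time_load` is a hypothetical helper; the data shape and `NB_ROW` are copied from the question. No figures are claimed here, since the results depend on hardware and on how PyYAML was built.

```python
import time
import yaml

# Same data shape as the question's benchmark
NB_ROW = 1024
dummy_data = [{'dummy_key_A_%s' % i: i, 'dummy_key_B_%s' % i: i} for i in range(NB_ROW)]
text = yaml.safe_dump(dummy_data, default_flow_style=False)


def time_load(loader_cls, text, repeat=3):
    """Best-of-`repeat` wall-clock time, in seconds, for yaml.load(text, Loader=loader_cls)."""
    best = float('inf')
    for _ in range(repeat):
        start = time.perf_counter()
        yaml.load(text, Loader=loader_cls)
        best = min(best, time.perf_counter() - start)
    return best


loaders = [yaml.SafeLoader]
if yaml.__with_libyaml__:
    loaders.append(yaml.CSafeLoader)  # only present when PyYAML was built with LibYAML

for loader_cls in loaders:
    # Sanity check: every loader must produce the same objects
    assert yaml.load(text, Loader=loader_cls) == dummy_data
    print('%-12s %.4f s' % (loader_cls.__name__, time_load(loader_cls, text)))
```

Taking the best of several runs with `time.perf_counter()` is less sensitive to one-off noise than a single `datetime.now()` difference.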
