Why is json.loads an order of magnitude faster than ast.literal_eval?

Question

After answering a question about how to parse a text file containing arrays of floats, I ran the following benchmark:

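import timeit
import random

# One line of input: a list of 1000 random floats, embedded below via its repr.
line = [random.random() for x in range(1000)]
n = 10000  # number of timed parses per parser

# The list's repr is pasted into the setup statement as a string literal.
json_setup = 'line = "{}"; import json'.format(line)
json_work = 'json.loads(line)'
json_time = timeit.timeit(json_work, json_setup, number=n)
print("json: {}".format(json_time))

ast_setup = 'line = "{}"; import ast'.format(line)
ast_work = 'ast.literal_eval(line)'
ast_time = timeit.timeit(ast_work, ast_setup, number=n)
print("ast: {}".format(ast_time))

# print() with str.format runs unchanged on both Python 2 and Python 3.
print("time ratio ast/json: {}".format(ast_time / json_time))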

I ran this code several times and consistently got this kind of results:

$ python json-ast-bench.py 
json: 4.3199338913
ast: 28.4827561378
time ratio ast/json:  6.59333148483

So it appears that json is almost an order of magnitude faster than ast for this use case.

I had the same results with both Python 2.7.5+ and Python 3.3.2+.

Questions:

  1. Why is json.loads so much faster? This question seems to imply that ast is more flexible regarding the input data (double or single quotes)
  2. Are there use cases where I would prefer to use ast.literal_eval over json.loads although it's slower?

Anyway if performance matters, I would recommend using UltraJSON (just what I use at work, ~4 times faster than json using the same mini-benchmark).

Recommended answer

The two functions are parsing entirely different languages: JSON, and Python literal syntax.* As the literal_eval documentation says:

The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.

JSON, by contrast, only handles double-quoted JavaScript string literals (not quite identical to Python's**), JavaScript numbers (only int and float***), objects (roughly equivalent to dicts), arrays (roughly equivalent to lists), JavaScript booleans (which are different from Python's), and null.
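
As a rough illustration of how the two syntaxes diverge, the sketch below feeds equivalent data to each parser, then swaps the inputs. The sample strings and the helper name `parses` are made up for this example.

```python
import ast
import json

# The same data written once as JSON and once as a Python literal.
json_text = '[1, 2.5, "a", true, false, null]'
python_text = "[1, 2.5, 'a', True, False, None]"

from_json = json.loads(json_text)
from_python = ast.literal_eval(python_text)
print(from_json == from_python)  # both produce the same Python list


def parses(parser, text):
    """Return True if parser(text) succeeds (hypothetical helper)."""
    try:
        parser(text)
        return True
    except (ValueError, SyntaxError):
        return False


# Each parser rejects the other language's spelling of the same data.
print(parses(json.loads, python_text))      # single quotes, True/None
print(parses(ast.literal_eval, json_text))  # true/false/null are bare names
```

The overlap is real (plain lists of numbers parse the same way in both), which is why the benchmark in the question works with either function.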

The fact that these two languages happen to have some overlap doesn't mean they're the same language.

Why is json.loads so much faster?

Because Python literal syntax is a more complex and powerful language than JSON, it's likely to be slower to parse. And, probably more importantly, because Python literal syntax is not intended to be used as a data interchange format (in fact, it's specifically not supposed to be used for that), nobody is likely to put much effort into making it fast for data interchange.****

This question seems to imply that ast is more flexible regarding the input data (double or single quotes)

That, and raw string literals, and Unicode vs. bytes string literals, and complex numbers, and sets, and all kinds of other things JSON doesn't handle.
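
A small sketch of a few of those Python-only forms, assuming Python 3. The sample inputs and the helper name `json_accepts` are illustrative only.

```python
import ast
import json

# Python literal forms that have no JSON equivalent.
python_only = [
    "'single quotes'",  # single-quoted string
    r"r'raw\n'",        # raw string literal (backslash kept as-is)
    "b'bytes'",         # bytes literal
    "1+2j",             # complex number
    "{1, 2, 3}",        # set display
    "(1, 2)",           # tuple
]

values = [ast.literal_eval(text) for text in python_only]
for text, value in zip(python_only, values):
    print(text, "->", repr(value), type(value).__name__)


def json_accepts(text):
    """Return True if json.loads parses text (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False


# None of these inputs is valid JSON.
print(any(json_accepts(text) for text in python_only))
```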

Are there use cases where I would prefer to use ast.literal_eval over json.loads although it's slower?

Yes. When you want to parse Python literals, you should use ast.literal_eval. (Or, better yet, re-think your design so you don't want to parse Python literals…)

* This is a bit of a vague term. For example, -2 is not a literal in Python, but an operator expression, but literal_eval can handle it. And of course tuple/list/dict/set displays are not literals, but literal_eval can handle them—except that comprehensions are also displays, and literal_eval cannot handle them. Other functions in the ast module can help you find out what really is and isn't a literal—e.g., ast.dump(ast.parse("expr")).
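
The point about -2 and comprehensions can be checked directly. This is a minimal sketch; the exact text printed by ast.dump varies between Python versions, so no expected output is shown for it.

```python
import ast

# -2 parses as a unary minus applied to the literal 2, not as one literal.
tree = ast.parse("-2", mode="eval")
print(ast.dump(tree))
is_unary = isinstance(tree.body, ast.UnaryOp)

# literal_eval still accepts it and returns the integer.
value = ast.literal_eval("-2")

# A list comprehension is a display too, but literal_eval rejects it.
try:
    ast.literal_eval("[x for x in (1, 2)]")
    comprehension_ok = True
except ValueError:
    comprehension_ok = False

print(is_unary, value, comprehension_ok)
```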

** For example, "\q" is an error in JSON.
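
A quick check of the JSON side of this, as a sketch. The helper name `json_accepts` is made up, and the doubled backslashes are only there because the JSON text is written inside Python string literals.

```python
import json


def json_accepts(text):
    """Return True if json.loads parses text (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False


valid_escape = '"a\\nb"'  # JSON text: "a\nb", a recognised escape
invalid_escape = '"\\q"'  # JSON text: "\q", not a JSON escape

print(json_accepts(valid_escape))    # True
print(json_accepts(invalid_escape))  # False
```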

*** Technically, JSON only handles one "number" type, which is floating-point. But Python's json module parses numbers with no decimal point or exponent as integers, and the same is true in many other languages' JSON modules.
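
The int/float split in Python's json module can be seen with a short sketch like this one (the sample numbers are arbitrary):

```python
import json

# No decimal point or exponent -> int; otherwise -> float.
values = json.loads("[1, 1.0, 1e3, -7]")
types = [type(v).__name__ for v in values]

for value, type_name in zip(values, types):
    print(repr(value), type_name)
```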

**** If you missed Tim Peters's comment on the question: "ast.literal_eval is so lightly used that nobody felt it was worth the time to work (& work, & work) at speeding it. In contrast, the JSON libraries are routinely used to parse gigabytes of data."
