Why is json.loads an order of magnitude faster than ast.literal_eval?

Question

After answering a question about how to parse a text file containing arrays of floats, I ran the following benchmark:

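# Lets the print() calls below behave the same on Python 2 and Python 3.
from __future__ import print_function

import timeit
import random

# 1000 random floats; str() of this list is valid JSON and a valid Python literal.
line = [random.random() for x in range(1000)]
n = 10000

# Time json.loads on the textual form of the list.
json_setup = 'line = "{}"; import json'.format(line)
json_work = 'json.loads(line)'
json_time = timeit.timeit(json_work, json_setup, number=n)
print("json: ", json_time)

# Time ast.literal_eval on exactly the same text.
ast_setup = 'line = "{}"; import ast'.format(line)
ast_work = 'ast.literal_eval(line)'
ast_time = timeit.timeit(ast_work, ast_setup, number=n)
print("ast: ", ast_time)

print("time ratio ast/json: ", ast_time / json_time)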

I ran this code several times and consistently got this kind of results:

$ python json-ast-bench.py 
json: 4.3199338913
ast: 28.4827561378
time ratio ast/json:  6.59333148483

So it appears that json is almost an order of magnitude faster than ast for this use case.

I had the same results with both Python 2.7.5+ and Python 3.3.2+.

Questions:

  1. Why is json.loads so much faster? This question seems to imply that ast is more flexible regarding the input data (double or single quotes)
  2. Are there use cases where I would prefer to use ast.literal_eval over json.loads although it's slower?

Anyway if performance matters, I would recommend using UltraJSON (just what I use at work, ~4 times faster than json using the same mini-benchmark).

Recommended answer

The two functions are parsing entirely different languages: JSON and Python literal syntax.* As the literal_eval documentation says:

The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.

JSON, by contrast, only handles double-quoted JavaScript string literals (not quite identical to Python's**), JavaScript numbers (only int and float***), objects (roughly equivalent to dicts), arrays (roughly equivalent to lists), JavaScript booleans (which are different from Python's), and null.

The fact that these two languages happen to have some overlap doesn't mean they're the same language.
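As a rough illustration of that overlap (a minimal sketch, not part of the original answer; the sample strings and the `parses` helper are made up for the example), a double-quoted list of numbers parses identically under both functions, while each one rejects the other language's spelling of booleans and null:

```python
import ast
import json

# Text that is valid in both languages.
shared = '[1, 2.5, "text"]'
same_result = json.loads(shared) == ast.literal_eval(shared)

def parses(func, text):
    """Hypothetical helper: True if func(text) succeeds, False if it rejects the input."""
    try:
        func(text)
        return True
    except (ValueError, SyntaxError):
        return False

json_style = "[true, false, null]"    # JSON spelling
python_style = "[True, False, None]"  # Python spelling

print("same result on shared text:", same_result)
print("json.loads accepts JSON spelling:        ", parses(json.loads, json_style))
print("literal_eval accepts JSON spelling:      ", parses(ast.literal_eval, json_style))
print("json.loads accepts Python spelling:      ", parses(json.loads, python_style))
print("literal_eval accepts Python spelling:    ", parses(ast.literal_eval, python_style))
```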

Why is json.loads so much faster?

Because Python literal syntax is a more complex and powerful language than JSON, it's likely to be slower to parse. And, probably more importantly, because Python literal syntax is not intended to be used as a data interchange format (in fact, it's specifically not supposed to be used for that), nobody is likely to put much effort into making it fast for data interchange.****
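One way to get a feel for where the time goes is to split the work into two phases. The sketch below assumes CPython, where ast.literal_eval on a string first builds a full syntax tree with the general-purpose Python parser and then walks that tree to produce the value; the function names `parse_only` and `convert_only` are illustrative, and the timings will vary by machine and Python version:

```python
import ast
import json
import timeit

# A list of floats whose text form is valid JSON and a valid Python literal.
text = str([i / 7 for i in range(1000)])

def parse_only():
    # Phase 1: build the full Python AST for the expression.
    return ast.parse(text, mode="eval")

tree = parse_only()

def convert_only():
    # Phase 2: literal_eval also accepts an already-parsed node,
    # so this measures only the tree walk.
    return ast.literal_eval(tree)

n = 200
print("json.loads:            ", timeit.timeit(lambda: json.loads(text), number=n))
print("ast.literal_eval:      ", timeit.timeit(lambda: ast.literal_eval(text), number=n))
print("  ast.parse only:      ", timeit.timeit(parse_only, number=n))
print("  node conversion only:", timeit.timeit(convert_only, number=n))
```

Both paths return the same list here; the difference is only in how much machinery each one runs to get there.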

This question seems to imply that ast is more flexible regarding the input data (double or single quotes)

That, and raw string literals, and Unicode vs. bytes string literals, and complex numbers, and sets, and all kinds of other things JSON doesn't handle.
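A few of those cases, as a minimal sketch (the sample inputs are chosen for illustration only): each string below is accepted by ast.literal_eval and rejected by json.loads.

```python
import ast
import json

samples = {
    "single quotes": "'abc'",
    "raw string": r"r'a\nb'",   # backslash and n stay as two characters
    "bytes": "b'abc'",
    "complex number": "1+2j",
    "set": "{1, 2, 3}",
    "tuple": "(1, 2)",
}

values = {}
json_rejected = {}
for name, text in samples.items():
    values[name] = ast.literal_eval(text)
    try:
        json.loads(text)
        json_rejected[name] = False
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        json_rejected[name] = True
    print(name, "->", repr(values[name]), "| rejected by json:", json_rejected[name])
```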

Are there use cases where I would prefer to use ast.literal_eval over json.loads although it's slower?

Yes. When you want to parse Python literals, you should use ast.literal_eval. (Or, better yet, re-think your design so you don't want to parse Python literals…)

* This is a bit of a vague term. For example, -2 is not a literal in Python, but an operator expression, but literal_eval can handle it. And of course tuple/list/dict/set displays are not literals, but literal_eval can handle them—except that comprehensions are also displays, and literal_eval cannot handle them. Other functions in the ast module can help you find out what really is and isn't a literal—e.g., ast.dump(ast.parse("expr")).
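A short sketch of that kind of inspection, assuming Python 3 (the exact text printed by ast.dump differs between versions, so no output is shown):

```python
import ast

# -2 parses as a unary minus applied to the literal 2, not as a single literal.
negative = ast.parse("-2", mode="eval").body
print(type(negative).__name__)
print(ast.dump(negative))

# literal_eval still accepts it.
negative_value = ast.literal_eval("-2")

# A list display is fine, but a comprehension is rejected.
display_value = ast.literal_eval("[1, 2]")
comprehension = "[x for x in (1, 2)]"
print(ast.dump(ast.parse(comprehension, mode="eval")))
try:
    ast.literal_eval(comprehension)
    comprehension_rejected = False
except ValueError:
    comprehension_rejected = True
print("comprehension rejected:", comprehension_rejected)
```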

** For example, "\q" is an error in JSON.
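A small sketch of this difference, assuming the example string is a backslash followed by q inside double quotes. JSON defines no such escape, so json.loads rejects it; Python keeps the backslash, although recent Python versions warn about unrecognized escapes, which is why the warning is silenced here.

```python
import ast
import json
import warnings

text = r'"\q"'  # four characters: quote, backslash, q, quote

# Python keeps the backslash for an unrecognized escape; newer versions
# emit a warning for it (and a future version may reject it outright).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    python_value = ast.literal_eval(text)

try:
    json.loads(text)
    json_error = None
except json.JSONDecodeError as exc:
    json_error = exc

print("Python value:", repr(python_value))
print("JSON error:  ", json_error)
```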

*** Technically, JSON only handles one "number" type, which is floating-point. But Python's json module parses numbers with no decimal point or exponent as integers, and the same is true in many other languages' JSON modules.
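The behavior of Python's json module described here can be checked directly (a minimal sketch):

```python
import json

plain = json.loads("1")        # no decimal point or exponent
decimal = json.loads("1.0")    # decimal point
exponent = json.loads("1e2")   # exponent

for value in (plain, decimal, exponent):
    print(repr(value), type(value).__name__)
```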

**** If you missed Tim Peters's comment on the question: "ast.literal_eval is so lightly used that nobody felt it was worth the time to work (& work, & work) at speeding it. In contrast, the JSON libraries are routinely used to parse gigabytes of data."
