How expensive are Python dictionaries to handle?


Problem Description



As the title states, how expensive are Python dictionaries to handle? Creation, insertion, updating, deletion, all of it.

The asymptotic time complexities are interesting in themselves, but so is how they compare to, e.g., tuples or normal lists.

Solution

dicts (just like sets when you don't need to associate a value to each key but simply record if a key is present or absent) are pretty heavily optimized. Creating a dict from N keys or key/value pairs is O(N), fetching is O(1), putting is amortized O(1), and so forth. Can't really do anything substantially better for any non-tiny container!
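To put rough numbers on those asymptotics yourself, a minimal sketch along these lines (assuming Python 3; the absolute figures depend entirely on your interpreter and hardware) times building whole dicts of growing size plus a single get and put on each, and the per-operation cost should stay roughly flat while the build time grows with N:

import timeit

# Per-operation cost of dict get/put stays roughly constant as the dict grows,
# while building the whole dict scales with its size.
for n in (1_000, 100_000, 1_000_000):
    setup = f"d = {{i: None for i in range({n})}}"
    build = timeit.timeit(f"{{i: None for i in range({n})}}", number=3) / 3
    get = timeit.timeit(f"d[{n // 2}]", setup=setup, number=1_000_000)
    put = timeit.timeit("d[-1] = None", setup=setup, number=1_000_000)
    print(f"n={n:>9}: build {build:.3f} s, "
          f"get {get * 1000:.0f} ns/op, put {put * 1000:.0f} ns/op")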

For tiny containers, you can easily check the boundaries with timeit-based benchmarks. For example:

$ python -mtimeit -s'empty=()' '23 in empty'
10000000 loops, best of 3: 0.0709 usec per loop
$ python -mtimeit -s'empty=set()' '23 in empty'
10000000 loops, best of 3: 0.101 usec per loop
$ python -mtimeit -s'empty=[]' '23 in empty'
10000000 loops, best of 3: 0.0716 usec per loop
$ python -mtimeit -s'empty=dict()' '23 in empty'
10000000 loops, best of 3: 0.0926 usec per loop
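If you would rather drive the same comparison from a script than from the shell, the timeit module exposes the same machinery; a small sketch, again assuming Python 3, whose numbers will of course differ from the ones above:

import timeit

# The same empty-container membership checks, driven from a script:
# best of 3 repeats, one million '23 in empty' evaluations per repeat.
cases = {
    "tuple": "empty = ()",
    "list":  "empty = []",
    "set":   "empty = set()",
    "dict":  "empty = dict()",
}
for name, setup in cases.items():
    best = min(timeit.repeat("23 in empty", setup=setup, repeat=3, number=1_000_000))
    print(f"{name:5}: {best / 1_000_000 * 1e9:.1f} ns per loop")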

this shows that checking membership in empty lists or tuples is faster, by a whopping 20-30 nanoseconds, than checking membership in empty sets or dicts; when every nanosecond matters, this info might be relevant to you. Moving up a bit...:

$ python -mtimeit -s'empty=range(7)' '23 in empty'
1000000 loops, best of 3: 0.318 usec per loop
$ python -mtimeit -s'empty=tuple(range(7))' '23 in empty'
1000000 loops, best of 3: 0.311 usec per loop
$ python -mtimeit -s'empty=set(range(7))' '23 in empty'
10000000 loops, best of 3: 0.109 usec per loop
$ python -mtimeit -s'empty=dict.fromkeys(range(7))' '23 in empty'
10000000 loops, best of 3: 0.0933 usec per loop

you see that for 7-item containers (not including the one of interest) the balance of performance has shifted, and now dicts and sets have the advantage by HUNDREDS of nanoseconds. When the item of interest IS present:

$ python -mtimeit -s'empty=range(7)' '5 in empty'
1000000 loops, best of 3: 0.246 usec per loop
$ python -mtimeit -s'empty=tuple(range(7))' '5 in empty'
1000000 loops, best of 3: 0.25 usec per loop
$ python -mtimeit -s'empty=dict.fromkeys(range(7))' '5 in empty'
10000000 loops, best of 3: 0.0921 usec per loop
$ python -mtimeit -s'empty=set(range(7))' '5 in empty'
10000000 loops, best of 3: 0.112 usec per loop

dicts and sets don't gain much, but tuples and lists do, even though dicts and sets remain vastly faster.
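The gap only widens as containers grow, since list and tuple membership is a linear scan while set and dict membership is a hash lookup; a quick sketch (sizes picked purely for illustration, Python 3 assumed) makes the trend visible:

import timeit

# Membership cost as the container grows: lists scan linearly, sets hash
# straight to the right bucket, so the list's per-check time grows with n
# while the set's stays essentially flat.
for n in (10, 1_000, 100_000):
    stmt = f"{n - 1} in c"   # worst case for the list: the last element
    t_list = min(timeit.repeat(stmt, setup=f"c = list(range({n}))", repeat=3, number=1_000))
    t_set = min(timeit.repeat(stmt, setup=f"c = set(range({n}))", repeat=3, number=1_000))
    print(f"n={n:>7}: list {t_list / 1_000 * 1e9:>9.0f} ns, set {t_set / 1_000 * 1e9:>6.0f} ns")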

And so on, and so forth -- timeit makes it trivially easy to run micro-benchmarks (strictly speaking, warranted only for those exceedingly rare situations where nanoseconds DO matter, but, easy enough to do, that it's no big hardship to check for OTHER cases;-).
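For reproducing such measurements on today's Python, note that Timer.autorange() (available since Python 3.6) will pick the loop count for you, much as the command line does; a tiny sketch:

import timeit

# Let timeit choose a sensible number of loops, then report the per-loop time.
t = timeit.Timer("5 in d", setup="d = dict.fromkeys(range(7))")
number, total = t.autorange()
print(f"{total / number * 1e9:.1f} ns per loop ({number} loops)")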
