Dictionary vs Object - which is more efficient and why?
Question
Which is more efficient in Python in terms of memory usage and CPU consumption: a dictionary or an object?
Background: I have to load a huge amount of data into Python. I created an object that is just a field container. Creating 4M instances and putting them into a dictionary took about 10 minutes and ~6 GB of memory. Once the dictionary is ready, accessing it takes the blink of an eye.
Example: To check the performance I wrote two simple programs that do the same thing. One uses objects, the other uses dictionaries:
Object (execution time ~18 sec):
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
Dictionary (execution time ~12 sec):
all = {}
for i in range(1000000):
    o = {}
    o['i'] = i
    o['l'] = []
    all[i] = o
Question: Am I doing something wrong, or is a dictionary just faster than an object? If a dictionary does indeed perform better, can somebody explain why?
Recommended answer
Have you tried using __slots__?
From the documentation:
By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.
The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.
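To make the quoted behaviour concrete, here is a minimal sketch. The class names Plain and Slotted are made up for illustration and are not part of the original tests. It only checks that a slotted instance has no per-instance __dict__ and therefore rejects attributes that were not declared:

```python
class Plain(object):
    def __init__(self, i):
        self.i = i
        self.l = []


class Slotted(object):
    # Only these two attribute names may be set on instances.
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = []


p = Plain(1)
s = Slotted(1)

# A regular instance carries its own attribute dictionary; a slotted one does not.
has_dict_plain = hasattr(p, '__dict__')
has_dict_slotted = hasattr(s, '__dict__')

# Assigning an undeclared attribute fails on the slotted instance.
try:
    s.extra = 42
    rejected = False
except AttributeError:
    rejected = True

print(has_dict_plain, has_dict_slotted, rejected)
```

The trade-off is flexibility: the memory saving comes precisely from giving up the ability to attach arbitrary attributes at run time.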
So does this save time as well as memory?
Comparing the approaches on my computer:
test_slots.py:
class Obj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
test_obj.py:
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []

all = {}
for i in range(1000000):
    all[i] = Obj(i)
test_dict.py:
all = {}
for i in range(1000000):
    o = {}
    o['i'] = i
    o['l'] = []
    all[i] = o
test_namedtuple.py (supported in Python 2.6 and later):
import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
    all[i] = Obj(i, [])
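One caveat when considering the named tuple variant as a replacement for the field-container class: a named tuple is immutable, so fields cannot be reassigned the way self.i can. The short sketch below (variable names are illustrative only) shows what still works and what does not:

```python
import collections

Obj = collections.namedtuple('Obj', 'i l')

o = Obj(1, [])

# The list stored in the tuple can still be mutated in place.
o.l.append('x')

# Rebinding a field is not allowed on a named tuple.
try:
    o.i = 2
    reassigned = True
except AttributeError:
    reassigned = False

# _replace builds a new tuple with the changed field; the original is untouched.
o2 = o._replace(i=2)

print(reassigned, o, o2)
```

If the loaded records need their fields updated later, the __slots__ class is probably the closer match to the original object.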
Running the benchmark (using CPython 2.5):
$ lshw | grep product | head -n 1
product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py
real 0m27.398s (using 'normal' object)
real 0m16.747s (using __dict__)
real 0m11.777s (using __slots__)
Using CPython 2.6.2, including the named tuple test:
$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py
real 0m27.197s (using 'normal' object)
real 0m17.657s (using __dict__)
real 0m12.249s (using __slots__)
real 0m12.262s (using namedtuple)
So yes (not really a surprise), using __slots__ is a performance optimization: in these runs the __slots__ version took about 12 seconds, against about 27 seconds for the normal object and about 17 seconds for the plain dictionary. Using a named tuple has similar performance to __slots__.
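The timings above come from running four separate scripts under `time` on Python 2.5/2.6. As a rough way to repeat the comparison in a single file on a current interpreter, the sketch below assumes Python 3 and uses `time.perf_counter` and `tracemalloc` from the standard library. The class names, the `measure` helper and the smaller N are illustrative choices, not part of the original tests, and the numbers it prints will differ from those shown above.

```python
import collections
import time
import tracemalloc

N = 100000  # smaller than the 1,000,000 used above, to keep the run short


class PlainObj(object):
    def __init__(self, i):
        self.i = i
        self.l = []


class SlotsObj(object):
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = []


TupleObj = collections.namedtuple('TupleObj', 'i l')


def build_plain():
    return {i: PlainObj(i) for i in range(N)}


def build_slots():
    return {i: SlotsObj(i) for i in range(N)}


def build_dict():
    return {i: {'i': i, 'l': []} for i in range(N)}


def build_namedtuple():
    return {i: TupleObj(i, []) for i in range(N)}


def measure(build):
    """Return (seconds, peak_bytes, item_count) for one build function."""
    # Time the build without tracing, since tracemalloc slows allocation.
    start = time.perf_counter()
    data = build()
    elapsed = time.perf_counter() - start
    count = len(data)
    del data

    # Build again with tracing enabled to record peak allocated memory.
    tracemalloc.start()
    data = build()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data
    return elapsed, peak, count


results = {}
for name, build in [('object', build_plain),
                    ('dict', build_dict),
                    ('slots', build_slots),
                    ('namedtuple', build_namedtuple)]:
    results[name] = measure(build)
    elapsed, peak, _ = results[name]
    print('%-10s %.3f s  %.1f MiB peak' % (name, elapsed, peak / 2.0 ** 20))
```

Measuring time and memory in separate passes keeps the tracing overhead out of the timings; the relative ordering on a modern interpreter may not match the 2.5/2.6 results.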