Dictionary vs Object - which is more efficient and why?

Problem description

Which is more efficient in Python in terms of memory usage and CPU consumption: a dictionary or an object?

Background: I have to load a huge amount of data into Python. I created an object that is just a field container. Creating 4M instances and putting them into a dictionary took about 10 minutes and ~6GB of memory. Once the dictionary is ready, accessing it takes the blink of an eye.
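For scale, those figures imply a large average cost per record. Assuming 1 GB means 10^9 bytes (the question does not say which unit is meant), the arithmetic is roughly:

```latex
\frac{6 \times 10^{9}\ \text{bytes}}{4 \times 10^{6}\ \text{records}} = 1500\ \text{bytes per record}
```

That average covers everything the script keeps per record (the instance, its attribute storage, its field values and its entry in the outer dictionary), so the container object alone accounts for only part of it.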

Example: To check the performance, I wrote two simple programs that do the same thing, one using objects and the other using dictionaries:

Object (execution time ~18sec):

class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []
records = {}
for i in range(1000000):
    records[i] = Obj(i)

Dictionary (execution time ~12sec):

records = {}
for i in range(1000000):
    o = {}
    o['i'] = i
    o['l'] = []
    records[i] = o
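Timing whole scripts with wall-clock time also counts interpreter start-up and teardown. A minimal Python 3 sketch for timing only the two construction loops in-process with the standard timeit module; the function names build_objects, build_dicts and compare are hypothetical, chosen for this illustration:

```python
import functools
import timeit


class Obj:
    """Plain field container, as in the question."""

    def __init__(self, i):
        self.i = i
        self.l = []


def build_objects(n):
    # One Obj instance per key, mirroring the object-based script.
    records = {}
    for i in range(n):
        records[i] = Obj(i)
    return records


def build_dicts(n):
    # One small dict per key, filled item by item like the dictionary script.
    records = {}
    for i in range(n):
        o = {}
        o['i'] = i
        o['l'] = []
        records[i] = o
    return records


def compare(n=1_000_000, repeat=3):
    """Return the best-of-`repeat` time in seconds for each builder, keyed by name."""
    timings = {}
    for builder in (build_objects, build_dicts):
        # Each run calls the builder exactly once with n records.
        runs = timeit.repeat(functools.partial(builder, n), number=1, repeat=repeat)
        timings[builder.__name__] = min(runs)
    return timings
```

compare() returns a dict mapping each builder's name to its best run time in seconds. Keep in mind that timeit switches off garbage collection while it times a statement, so allocation-heavy loops like these may run faster than they would in a normal script.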

Question: Am I doing something wrong, or is a dictionary simply faster than an object? If the dictionary really does perform better, can somebody explain why?

Recommended answer

Have you tried using __slots__?

From the Python 2 documentation:

By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.

The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.
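A small Python 3 sketch of the two visible effects of __slots__: instances get no per-instance __dict__, and as a consequence only the declared names can be assigned. The class Point is a hypothetical example, not part of the documentation:

```python
class Point:
    # Hypothetical example class: only 'x' and 'y' can be set on instances.
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# No per-instance __dict__ is created for a slotted class.
print(hasattr(p, '__dict__'))  # False

# Assigning a name that is not listed in __slots__ fails.
try:
    p.z = 3
except AttributeError:
    print("AttributeError: 'z' is not a declared slot")
```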

So does this save time as well as memory?

Comparing the approaches on my computer:

test_slots.py:

class Obj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = []
records = {}
for i in range(1000000):
    records[i] = Obj(i)

test_obj.py:

class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []
records = {}
for i in range(1000000):
    records[i] = Obj(i)

test_dict.py:

records = {}
for i in range(1000000):
    o = {}
    o['i'] = i
    o['l'] = []
    records[i] = o

test_namedtuple.py (requires Python 2.6 or later, where collections.namedtuple was added):

import collections

Obj = collections.namedtuple('Obj', 'i l')

records = {}
for i in range(1000000):
    records[i] = Obj(i, [])
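One behavioural difference is worth knowing before swapping a class for a named tuple: named-tuple fields cannot be rebound, although a mutable field such as the list l can still be changed in place. A short Python 3 sketch (the variable names o and o2 are illustrative):

```python
import collections

Obj = collections.namedtuple('Obj', 'i l')

o = Obj(1, [])
o.l.append('x')          # the list stored in the field is still mutable
print(o)                 # Obj(i=1, l=['x'])

try:
    o.i = 2              # rebinding a field is not allowed
except AttributeError:
    print("fields are read-only")

o2 = o._replace(i=2)     # builds a new tuple; o itself is unchanged
print(o, o2)             # Obj(i=1, l=['x']) Obj(i=2, l=['x'])
```

Note that _replace copies references, not values, so o and o2 share the same list object.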

Running the benchmarks (using CPython 2.5):

$ lshw | grep product | head -n 1
          product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py 

real    0m27.398s (using 'normal' object)
real    0m16.747s (using plain dict)
real    0m11.777s (using __slots__)

Using CPython 2.6.2, including the named tuple test:

$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py 

real    0m27.197s (using 'normal' object)
real    0m17.657s (using plain dict)
real    0m12.249s (using __slots__)
real    0m12.262s (using namedtuple)

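So yes (not really a surprise), using __slots__ is a performance optimization: in both runs the __slots__ version took less than half the time of the regular class and about 30% less time than the plain dictionaries. A named tuple performs about the same as __slots__. The documentation quoted above describes the memory side of this: a regular instance carries its own per-instance __dict__, whereas a __slots__ class (and a named tuple, which is a tuple subclass) stores the values directly in the instance.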
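The timings above cover CPU time only, while the question also asked about memory. A rough way to compare memory in Python 3 is the standard tracemalloc module, which counts the bytes Python allocates while tracing is on. A sketch under those assumptions; the names PlainObj, SlotObj, make_dict and measure are hypothetical, and absolute numbers will vary with Python version and platform:

```python
import tracemalloc


class PlainObj:
    # Regular class: instances support a per-instance __dict__.
    def __init__(self, i):
        self.i = i
        self.l = []


class SlotObj:
    # Slotted class: values live in fixed slots, no per-instance __dict__.
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = []


def make_dict(i):
    # Plain-dictionary record with the same two fields.
    return {'i': i, 'l': []}


def measure(factory, n=100_000):
    """Return the bytes still allocated after building n records with factory."""
    tracemalloc.start()
    try:
        records = [factory(i) for i in range(n)]
        # Measured while `records` is still alive, so all n records are counted.
        current, _peak = tracemalloc.get_traced_memory()
        del records
    finally:
        tracemalloc.stop()
    return current
```

For example, measure(SlotObj) should come out smaller than measure(PlainObj), because both build the same list and int objects but only the regular class needs room for per-instance attribute storage and a weak-reference slot.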
