Dictionary vs Object - which is more efficient and why?


Question



What is more efficient in Python in terms of memory usage and CPU consumption - Dictionary or Object?

Background: I have to load a huge amount of data into Python. I created an object that is just a field container. Creating 4M instances and putting them into a dictionary took about 10 minutes and ~6GB of memory. Once the dictionary is ready, accessing it is a blink of an eye.

Example: To check the performance I wrote two simple programs that do the same thing - one using objects, the other using a dictionary:

Object (execution time ~18sec):

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

Dictionary (execution time ~12sec):

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

Question: Am I doing something wrong, or is a dictionary just faster than an object? If a dictionary indeed performs better, can somebody explain why?

Solution

Have you tried using __slots__?

From the documentation http://docs.python.org/reference/datamodel.html#slots:

"By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.

The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance."
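As a quick illustration of what the documentation describes (my own sketch with a hypothetical Point class, not part of the original benchmark): a class that defines __slots__ gets no per-instance __dict__, and assigning an attribute that was not declared raises an AttributeError:

class Point(object):
  __slots__ = ('x', 'y')   # only these attributes are allowed
  def __init__(self, x, y):
    self.x = x
    self.y = y

p = Point(1, 2)
print(p.x)       # 1 - declared attributes work as usual
# p.__dict__     # AttributeError - no per-instance dict exists
# p.z = 3        # AttributeError: 'Point' object has no attribute 'z'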

So does this save time as well as memory?

Comparing the three approaches on my computer:

test_slots.py:

class Obj(object):
  __slots__ = ('i', 'l')
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_obj.py:

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_dict.py:

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

test_namedtuple.py (supported in 2.6):

import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
  all[i] = Obj(i, [])
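One caveat to keep in mind (my note, not part of the original answer): namedtuple instances are immutable, so a field cannot be rebound after creation, although a mutable value stored in a field - such as the list in l - can still be modified in place:

o = Obj(1, [])
o.l.append('x')   # fine - the list object itself is mutable
# o.i = 2         # AttributeError: can't set attribute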

Running the benchmark (using CPython 2.5):

$ lshw | grep product | head -n 1
          product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py 

real    0m27.398s (using 'normal' object)
real    0m16.747s (using __dict__)
real    0m11.777s (using __slots__)

Using CPython 2.6.2, including the named tuple test:

$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py 

real    0m27.197s (using 'normal' object)
real    0m17.657s (using __dict__)
real    0m12.249s (using __slots__)
real    0m12.262s (using namedtuple)

So yes (not really a surprise), using __slots__ is a performance optimization. Using a named tuple has similar performance to __slots__.
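If you also want to verify the memory side, a rough sketch (my addition, requires Python 2.6+ for sys.getsizeof, and it only measures shallow sizes) is to compare a plain instance plus its attribute dict against a slotted instance:

import sys

class Plain(object):
  def __init__(self, i):
    self.i = i
    self.l = []

class Slotted(object):
  __slots__ = ('i', 'l')
  def __init__(self, i):
    self.i = i
    self.l = []

p, s = Plain(0), Slotted(0)
# shallow sizes only; the int and the list are not included
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))   # instance + __dict__
print(sys.getsizeof(s))                                # instance only, no __dict__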

