What is the advantage of iteritems?


Question

I am using Python 2.7.5 @ Mac OS X 10.9.3 with 8GB memory and 1.7GHz Core i5. I have tested time consumption as below.

d = {i:i*2 for i in xrange(10**7*3)} #WARNING: it takes time and consumes a lot of RAM

%time for k in d: k,d[k]
CPU times: user 6.22 s, sys: 10.1 ms, total: 6.23 s
Wall time: 6.23 s

%time for k,v in d.iteritems(): k, v
CPU times: user 7.67 s, sys: 27.1 ms, total: 7.7 s
Wall time: 7.69 s

It seems iteritems is slower. I am wondering what is the advantage of iteritems over directly accessing the dict.

Update: a more accurate timing profile:

In [23]: %timeit -n 5 for k in d: v=d[k]
5 loops, best of 3: 2.32 s per loop

In [24]: %timeit -n 5 for k,v in d.iteritems(): v
5 loops, best of 3: 2.33 s per loop

Answer

To answer your question, we should first dig up some information about how and when iteritems() was added to the API.

The iteritems() method was added in Python 2.2, following the introduction of iterators and generators in the language (see also: Python: What is the difference between dict.items() and dict.iteritems()?). In fact, the method is explicitly mentioned in PEP 234. So it was introduced as a lazy alternative to the already existing items().

This follows the same pattern as file.xreadlines() versus file.readlines(): xreadlines() was introduced in Python 2.1 (and, by the way, was already deprecated in Python 2.3).

In Python 2.3 the itertools module was added, which introduced lazy counterparts to map, filter, etc. (imap, ifilter).

In other words, at the time there was (and there still is) a strong trend towards laziness of operations. One of the reasons is to improve memory efficiency. Another is to avoid unneeded computation.

I cannot find any reference that says it was introduced to improve the speed of looping over a dictionary. It was simply used to replace calls to items() that didn't actually have to return a list. Note that this includes more use cases than just a simple for loop.

For example in the code:

function(dictionary.iteritems())

you cannot simply use a for loop to replace iteritems() as in your example. You'd have to write a function (or use a genexp, even though they weren't available when iteritems() was introduced, and they wouldn't be DRY...).
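A minimal sketch of that case. Both names are assumptions made for illustration: `total_value` is a hypothetical consumer that accepts any iterable of (key, value) pairs, and `lazy_items` is a hypothetical helper that exists only so the snippet runs on both Python 2 and Python 3 (where iteritems() is gone).

```python
def lazy_items(d):
    """Return a lazy iterator over (key, value) pairs.

    Hypothetical helper: uses iteritems() on Python 2 and falls back to
    iter(d.items()) on Python 3, where items() is already lazy.
    """
    if hasattr(d, "iteritems"):
        return d.iteritems()
    return iter(d.items())


def total_value(pairs):
    """Hypothetical consumer: sums the values of an iterable of pairs."""
    return sum(value for _key, value in pairs)


dictionary = {"a": 1, "b": 2, "c": 3}

# The pairs are handed to the function directly: no intermediate list is
# built and no explicit for loop is needed at the call site.
result = total_value(lazy_items(dictionary))
print(result)
```

The call site stays a single expression, which is the point being made about `function(dictionary.iteritems())`.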

Retrieving the items from a dict is done pretty often so it does make sense to provide a built-in method and, in fact, there was one: items(). The problem with items() is that:

  • it isn't lazy, meaning that calling it on a big dict can take quite some time;
  • it takes a lot of memory: it can almost double the memory usage of a program if called on a very big dict that contains most of the objects being manipulated;
  • most of the time the resulting list is iterated only once.

So, when iterators and generators were introduced, the obvious step was to add a lazy counterpart. If you need a list of items because you want to index it or iterate over it more than once, use items(); otherwise you can just use iteritems() and avoid the problems cited above.
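The trade-off can be sketched roughly as follows. This is an illustration rather than a measurement of the original setup: `list(d.items())` stands in for Python 2's eager items(), `iter(d.items())` stands in for iteritems(), and the dict size is an arbitrary assumption. Note that `sys.getsizeof` reports only the container itself, not the tuples it refers to.

```python
import sys

d = {i: i * 2 for i in range(100000)}

# Eager: materialise every (key, value) pair, as Python 2's items() does.
eager = list(d.items())

# Lazy: an iterator that yields one pair at a time.
lazy = iter(d.items())

eager_size = sys.getsizeof(eager)  # grows with the number of items
lazy_size = sys.getsizeof(lazy)    # small and independent of len(d)
print(eager_size > lazy_size)

# A list can be indexed and iterated repeatedly...
first_pass = sum(1 for _ in eager)
second_pass = sum(1 for _ in eager)

# ...whereas an iterator is exhausted after a single pass.
lazy_first = sum(1 for _ in lazy)
lazy_second = sum(1 for _ in lazy)
print(first_pass, second_pass, lazy_first, lazy_second)
```

The second pass over the iterator yields nothing, which is why items() remains the right choice when the pairs are needed more than once.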

The advantages of using iteritems() are the same as those of using items() over manually looking up each value:

  • You write less code, which makes it more DRY and reduces the chances of errors
  • Code is more readable.

Plus the advantages of laziness.
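A small side-by-side sketch of the two styles; the `prices` dict and the variable names are made up for illustration. On Python 2 the second loop would use prices.iteritems() for the lazy form.

```python
prices = {"apple": 3, "pear": 5}

# Manual lookup: the dict name appears twice and each key is looked up again.
by_key = []
for name in prices:
    by_key.append((name, prices[name]))

# Iterating over pairs: key and value arrive together.
by_pair = []
for name, price in prices.items():
    by_pair.append((name, price))

print(sorted(by_key) == sorted(by_pair))
```

Both loops produce the same pairs; the second simply leaves less room for mistakes such as indexing the wrong dict.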


As I already stated, I cannot reproduce your performance results. On my machine iteritems() is always faster than iterating and looking up by key. The difference is negligible anyway, and it's probably due to how the OS handles caching and memory in general. In other words, efficiency isn't a strong argument for or against either alternative.
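For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The dict is far smaller than the one in the question, the function names are made up, and the absolute numbers will vary from machine to machine.

```python
import timeit

d = {i: i * 2 for i in range(100000)}  # much smaller than the dict in the question


def by_key():
    # Iterate over the keys and look each value up again.
    for k in d:
        v = d[k]


def by_items():
    # d.iteritems() on Python 2; d.items() is the lazy equivalent on Python 3.
    items = d.iteritems() if hasattr(d, "iteritems") else d.items()
    for k, v in items:
        pass


# Take the best of several repeats to reduce noise from the OS and caches.
t_key = min(timeit.repeat(by_key, number=10, repeat=3))
t_items = min(timeit.repeat(by_items, number=10, repeat=3))
print("by key:   %.4f s" % t_key)
print("by items: %.4f s" % t_items)
```

Taking the minimum over repeats is the usual way to read timeit results, since the slower runs mostly reflect interference from other processes.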

Given equal performance on average, use the most readable and concise alternative: iteritems(). This discussion is similar to asking "why use a foreach when you can just loop by index with the same performance?". The importance of foreach isn't that you iterate faster but that you avoid writing boilerplate code and improve readability.


I'd like to point out that iteritems() was in fact removed in Python 3. This was part of the "cleanup" in that version. Python 3's items() method is (mostly) equivalent to Python 2's viewitems() method (which is actually a backport, if I'm not mistaken...).

This version is lazy (and thus provides a replacement for iteritems()) and also has further functionality, such as "set-like" operations (for example, finding the common items between dicts in an efficient way). So in Python 3 the reasons to use items() instead of manually retrieving the values are even more compelling.
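A short sketch of those set-like operations. It assumes Python 3; on Python 2.7 the same operations are spelled viewitems() and viewkeys(). The two example dicts are made up.

```python
a = {"x": 1, "y": 2, "z": 3}
b = {"y": 2, "z": 30, "w": 4}

# Python 3: items() and keys() return views that support set operations.
common_items = a.items() & b.items()  # pairs present in both dicts
common_keys = a.keys() & b.keys()     # keys present in both dicts
only_in_a = a.keys() - b.keys()       # keys that b lacks

print(common_items)
print(sorted(common_keys))
print(sorted(only_in_a))
```

Set operations on the items view require the values to be hashable, which holds for the integers used here.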
