When is not a good time to use python generators?


Question

This is rather the inverse of What can you use Python generator functions for?: Python generators, generator expressions, and the itertools module are some of my favorite features of Python these days. They're especially useful when setting up chains of operations to perform on a big pile of data--I often use them when processing DSV files.

So when is it not a good time to use a generator, or a generator expression, or an itertools function?

  • When should I prefer zip() over itertools.izip(), or
  • range() over xrange(), or
  • [x for x in foo] over (x for x in foo)?

Obviously, we eventually need to "resolve" a generator into actual data, usually by creating a list or iterating over it with a non-generator loop. Sometimes we just need to know the length. This isn't what I'm asking.
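As a quick illustration of that "resolving" step (a minimal sketch; the variable names are just for illustration):

gen = (x * x for x in range(5))   # lazy: nothing computed yet

data = list(gen)                  # resolve the generator into an actual list
print(data)                       # [0, 1, 4, 9, 16]
print(len(data))                  # length is only knowable once it's materialized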

We use generators so that we're not assigning new lists into memory for interim data. This especially makes sense for large datasets. Does it make sense for small datasets too? Is there a noticeable memory/CPU trade-off?
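A small sketch of how one might measure that trade-off (assumes CPython; sys.getsizeof and timeit are standard library, but the exact numbers vary by machine and version):

import sys
import timeit

squares_list = [x * x for x in range(10)]   # materializes all 10 results up front
squares_gen  = (x * x for x in range(10))   # stores only iterator state

print(sys.getsizeof(squares_list))          # grows with the number of elements
print(sys.getsizeof(squares_gen))           # roughly constant, however big the input

# For tiny inputs the generator's per-item overhead can make it slightly slower:
print(timeit.timeit('sum([x * x for x in range(10)])'))
print(timeit.timeit('sum(x * x for x in range(10))'))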

Especially given the performance comparison of list comprehensions versus map() and filter(). (alternate link)

Accepted answer

Use a list instead of a generator when:

1) You need to access the data multiple times (i.e. cache the results instead of recomputing them):

for i in outer:           # used once, okay to be a generator or return a list
    for j in inner:       # used multiple times, reusing a list is better
         ...
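A minimal sketch of why the reused inner sequence should be a list: a generator is exhausted after one pass, so later passes silently see nothing (hypothetical data, just for illustration):

inner_gen  = (x * x for x in range(3))
inner_list = [x * x for x in range(3)]

for _ in range(2):
    print(list(inner_gen))    # first pass: [0, 1, 4]; second pass: [] -- exhausted
for _ in range(2):
    print(inner_list)         # [0, 1, 4] both times -- a list can be re-iterated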

2) You need random access (or any access other than forward sequential order):

for i in reversed(data): ...     # generators aren't reversible

s[i], s[j] = s[j], s[i]          # generators aren't indexable
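If the data arrives as a generator but you need indexing or reversal, one option is to materialize it into a list first (a simple sketch, not the only way to do it):

data = (ord(c) for c in "abcd")   # a generator: no indexing, no reversed(), no len()

s = list(data)                    # materialize once
s[0], s[-1] = s[-1], s[0]         # now swapping by index works
print(s)                          # [100, 98, 99, 97]
print(list(reversed(s)))          # and so does reversed()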

3) You need to join strings (which requires two passes over the data):

s = ''.join(data)                # lists are faster than generators in this use case
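A rough micro-benchmark of that claim (in CPython, str.join has to size its output, so feeding it a generator makes it build a temporary list internally anyway; the numbers below are indicative, not definitive):

import timeit

setup = "data = [str(n) for n in range(1000)]"

# joining an existing list vs. joining a generator over the same strings
print(timeit.timeit("''.join(data)", setup=setup, number=10000))
print(timeit.timeit("''.join(s for s in data)", setup=setup, number=10000))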

4) You are using PyPy, which sometimes can't optimize generator code as much as it can normal function calls and list manipulations.

