如何使用itertools模块构建“矢量化"构建基块? [英] How to build “vectorized” building blocks using itertools module?

查看:58
本文介绍了如何使用itertools模块构建“矢量化"构建基块?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

itertools文档的食谱部分始于此文本:

扩展工具提供与基础工具相同的高性能 工具集.通过处理保持卓越的内存性能 一次只包含一个元素,而不是将整个可迭代对象放入 一次全部存储.通过链接工具使代码量保持较小 以实用的样式一起使用,有助于消除暂时性 变量. 通过保留矢量化"建筑来保持高速 阻止使用for循环和生成器 解释器的开销.

问题是,应如何构造生成器以避免开销?您能否提供一些具有这种开销的构造不良的块的示例?

我决定问我何时回答这问题在哪里我不能确切地说chain(sequence, [obj])是否比chain(sequence, repeat(obj,1))有开销,我是否应该选择后者?

解决方案

文档文本不是关于如何构造生成器以避免开销的.它说明了正确编写的itertools-使用代码(例如示例中提供的代码)完全避免了for循环和生成器,而将其留给itertools或内置收集器(例如list)以使用迭代器. /p>

例如,以tabulate为例:

def tabulate(function, start=0):
    "Return function(0), function(1), ..."
    return imap(function, count(start))

写此代码的非矢量化方式应该是:

def tabulate(function, start=0):
    i = start
    while True:
        yield function(i)
        i += 1

此版本导致解释器开销",因为循环和函数调用是在Python中完成的.

关于链接单个元素,可以安全地假设chain(sequence, [obj])将会(平凡)更快,因为使用专用语法和操作码在Python中对定长列表的构造进行了优化.同样,chain(sequence, (obj,))甚至会更快,因为元组共享列表的优化,并且引导起来更小.与基准测试一样,用python -m timeit进行度量要比猜测更好.

文档引用并非本身与迭代器创建方面的差异有关,例如选择repeat(foo, 1)[foo]之一所显示的差异.由于迭代器只产生一个元素,因此它的使用方式没有任何区别.在处理可能产生数百万个元素的迭代器时,文档谈到了处理器和内存的效率.与此相比,选择更快地创建 的迭代器是微不足道的,因为可以随时更改创建策略.另一方面,一旦代码被设计为使用不进行向量化的显式循环,则在以后不进行完全重写的情况下很难更改.

The recipe section of itertools docs begins with this text:

The extended tools offer the same high performance as the underlying toolset. The superior memory performance is kept by processing elements one at a time rather than bringing the whole iterable into memory all at once. Code volume is kept small by linking the tools together in a functional style which helps eliminate temporary variables. High speed is retained by preferring "vectorized" building blocks over the use of for-loops and generators which incur interpreter overhead.

The question is, how should generators be constructed to avoid the overhead? Could you provide some examples of poor-constructed blocks which have this overhead?

I decided to ask when I was answering this question where I couldn't say exactly if chain(sequence, [obj]) has overhead over chain(sequence, repeat(obj,1)), and should I prefer the latter.

解决方案

The doc text is not about how to construct generators to avoid the overhead. It explains that correctly written itertools-using code, such as code provided in the examples, eschews for-loops and generators altogether, leaving it to itertools or built-in collectors (such as list) to consume the iterators.

Take, for example, the tabulate example:

def tabulate(function, start=0):
    "Return function(0), function(1), ..."
    return imap(function, count(start))

The non-vectorized way to write this would have been:

def tabulate(function, start=0):
    i = start
    while True:
        yield function(i)
        i += 1

This version "incurs interpreter overhead" because the looping and the function call is done in Python.

Regarding chaining a single element, it is safe to assume that chain(sequence, [obj]) will be (trivially) faster because fixed-length list construction is well-optimized in Python using specialized syntax and op-codes. In the same vein, chain(sequence, (obj,)) would be even faster, since tuples share the optimizations for lists, and are smaller to boot. As always with benchmarking, it is much better to measure with python -m timeit than to guess.

The documentation quote does not concern itself with differences in iterator creation, such as those exhibited by choosing one of repeat(foo, 1) and [foo]. Since the iterator only ever produces a single element, it makes no difference how it is consumed. The docs speak of processor and memory efficiency when dealing with iterators that can produce millions of elements. Compared to that, choosing the iterator that creates faster is trivial because creation strategy can be changed at any time. On the other hand, once the code is designed to use explicit loops that don't vectorize, it can be very hard to change later without a full rewrite.

这篇关于如何使用itertools模块构建“矢量化"构建基块?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆