懒惰评估地图 [英] Lazy evaluation of map
问题描述
我最近读到Python 3中map
的一个好处是它很懒.也就是说,最好这样做
map(lambda x: x**2, range(10**100))
而不是
[x**2 for x in range(10**100)]
我很好奇的是如何使用这种懒惰.如果生成映射对象,例如,如何访问生成的操作/列表中的特定元素.在我所见过的map
的几乎所有文档中,它们都会做类似print(map(...))
或for i in map(...)
的事情(据我所知),它们放弃了惰性概念,因为它隐式地将地图转换为列表.>
我想我正在寻找的是能够以与range
类似的懒惰方式使用地图对象的能力,其中我可以做x = range(10**100)
并懒惰地生成x[10000]
而无需巨大的计算量.
如果这个概念不存在,那么懒惰map
有什么好处?如果您始终需要将其转换为列表等非惰性对象,为什么map
是惰性的呢?
您在这里将苹果与橙子进行比较. range
不是不是,只是一个懒惰的迭代.它是一个特定的对象,其内容满足特定的法律,从而可以支持许多操作,而无需在内存中实际构建巨大的序列.这是因为range
的第n个元素基本上只是start + n*step
(取模stop
,符号等)
map
是必需的,可以与 any 函数f
一起使用.特别地,函数可能具有共享/全局状态,这已经使不执行100次函数调用就无法执行map(f, something)[100]
的任何可能性.不这样做会破坏结果的正确性.
map
是惰性的,只是意味着它不会立即构建完整的结果列表,而是等待您要求下一个结果,然后再调用f
并生成它.这样可以避免在代码中构建不必要的列表,例如:
for x in map(f, iterable):
# do something with x
如果map
渴望,它将消耗iterable
的两倍的内存来进行循环,而懒惰的map
基本上只需要x
的空间即可.
此外,它允许在无限可迭代对象()上调用map
,例如count()
.显然,这会导致程序永无止境地执行某些操作,或者在某些时候您可以停止查看map
.渴望的map
无法处理这种情况.
如果您想使用仅适用于纯功能且允许随机访问的受限映射,则可以编写自己的类:
class PureMap:
def __init__(self, function, sequence):
self._f = function
self._sequence = sequence
def __iter__(self):
return map(self._f, self._sequence)
def __getitem__(self, i):
return self._f(self._sequence[i])
# etc.
但是即使在这种情况下,您仍然有一些问题:
-
如果
sequence
实际上是iterable
以获得第n个元素,则必须消耗前n个元素.之后,您必须将它们作为序列存储在类中,以备将来使用.但这已经违反了整个目的,因为执行PureMap(f, sequence)[1000]
要求您仍然将1000
元素存储在内存中,即使可以避免999
调用f
. -
您要避免在同一元素上多次调用
f
.这意味着您还必须跟踪已计算出哪个元素,而未计算出哪个元素.
唯一可以实现所需目标的情况是:
- 所调用的函数是纯函数
- 可迭代的参数类似于
range
,它允许随机访问而不必产生其他元素 - 您调用的函数很快,因此您可以在各种元素上重新计算它,而不必太担心性能.
当所有这些假设都满足时,您可以拥有一个像range
一样工作"的地图对象.
I recently read that one benefit of map
in Python 3 was that it is lazy. That means, it is better to do
map(lambda x: x**2, range(10**100))
rather than
[x**2 for x in range(10**100)]
What I'm curious about, is how I can put this laziness to use. If I generate the map object, how can I, for example, access a particular element in the resulting operation/list. In almost every documentation on map
I've seen, they will do something like print(map(...))
or for i in map(...)
which (as far as I understand it) relinquishes the lazy concept as it implicitly converts the map to a list.
I suppose what I'm looking for is the ability to use map objects in a similarly lazy fashion as range
where I can do x = range(10**100)
and lazily generate x[10000]
without huge computational loads.
If this concept doesn't exist, what is the benefit then of having map
be lazy? If you always need to convert it to some non-lazy object like a list, why does it matter that map
is lazy?
You are comparing apples to oranges here. range
is not just a lazy iterable. It is a specific object whose contents satisfy specific laws that allow to support many operations without actually building a huge sequence in memory. That's because the nth element of range
is basically just start + n*step
(modulo stop
, signs etc.)
However map
is meant to work with any function f
. In particular functions may have shared/global state which already defeats any chance of being able to do map(f, something)[100]
without performing 100 function calls. Not doing so breaks the correctness of the result.
map
is lazy simply means it doesn't immediately build a complete list of results but waits for you to require the next result before doing the call to f
and produce it. This avoid building unneccessary lists in code like:
for x in map(f, iterable):
# do something with x
where if map
was eager it would consume twice the memory of iterable
to do the loop, with a lazy map
the only space required is that of x
basically.
Moreover it allows to call map
on infinite iterables like count()
. This obviously result in a never ending program doing something, or at some point you can just stop looking into map
. An eager map
cannot handle this case.
If you want to use your own restricted map that works only on pure fuctions and that allow random access you could write your own class:
class PureMap:
def __init__(self, function, sequence):
self._f = function
self._sequence = sequence
def __iter__(self):
return map(self._f, self._sequence)
def __getitem__(self, i):
return self._f(self._sequence[i])
# etc.
However even in this case you have some problems:
If
sequence
is actually aniterable
to obtain the nth element you have to consume the first n elements. After that you'd have to store them as a sequence in your class for future use. But this already defeats the purpose of the whole thing since doingPureMap(f, sequence)[1000]
requires you to store1000
elements in memory anyway, even though it avoids999
calls tof
.You want to avoid calling
f
multiple times on the same element. This means you'd also have to keep track of which element was already computed and which not.
The only situation where you could achieve what you want is the following:
- The function being called is pure
- The iterable argument is something like
range
that allows random access without having to produce other elements - The function you call is fast so that you can recompute it on the various elements without worrying too much about performance.
When all those assumptions are met you can have a map object that "works like range
".
这篇关于懒惰评估地图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!