懒惰评估地图 [英] Lazy evaluation of map

查看:73
本文介绍了懒惰评估地图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我最近读到Python 3中map的一个好处是它很懒.也就是说,最好这样做

map(lambda x: x**2, range(10**100))

而不是

[x**2 for x in range(10**100)]

我很好奇的是如何使用这种懒惰.如果生成映射对象,例如,如何访问生成的操作/列表中的特定元素.在我所见过的map的几乎所有文档中,它们都会做类似print(map(...))for i in map(...)的事情(据我所知),它们放弃了惰性概念,因为它隐式地将地图转换为列表.

我想我正在寻找的是能够以与range类似的懒惰方式使用地图对象的能力,其中我可以做x = range(10**100)并懒惰地生成x[10000]而无需巨大的计算量.

如果这个概念不存在,那么懒惰map有什么好处?如果您始终需要将其转换为列表等非惰性对象,为什么map是惰性的呢?

解决方案

您在这里将苹果与橙子进行比较. range不是不是,只是一个懒惰的迭代.它是一个特定的对象,其内容满足特定的法律,从而可以支持许多操作,而无需在内存中实际构建巨大的序列.这是因为range的第n个元素基本上只是start + n*step(取模stop,符号等)

map必需的,可以与 any 函数f一起使用.特别地,函数可能具有共享/全局状态,这已经使不执行100次函数调用就无法执行map(f, something)[100]的任何可能性.不这样做会破坏结果的正确性.

map是惰性的,只是意味着它不会立即构建完整的结果列表,而是等待您要求下一个结果,然后再调用f并生成它.这样可以避免在代码中构建不必要的列表,例如:

for x in map(f, iterable):
    # do something with x

如果map渴望,它将消耗iterable的两倍的内存来进行循环,而懒惰的map基本上只需要x的空间即可.

此外,它允许在无限可迭代对象()上调用map,例如count().显然,这会导致程序永无止境地执行某些操作,或者在某些时候您可以停止查看map.渴望的map无法处理这种情况.

如果您想使用仅适用于纯功能且允许随机访问的受限映射,则可以编写自己的类:

class PureMap:
    def __init__(self, function, sequence):
        self._f = function
        self._sequence = sequence

    def __iter__(self):
        return map(self._f, self._sequence)
    def __getitem__(self, i):
        return self._f(self._sequence[i])
    # etc.

但是即使在这种情况下,您仍然有一些问题:

  1. 如果sequence实际上是iterable以获得第n个元素,则必须消耗前n个元素.之后,您必须将它们作为序列存储在类中,以备将来使用.但这已经违反了整个目的,因为执行PureMap(f, sequence)[1000]要求您仍然将1000元素存储在内存中,即使可以避免999调用f.

  2. 您要避免在同一元素上多次调用f.这意味着您还必须跟踪已计算出哪个元素,而未计算出哪个元素.

唯一可以实现所需目标的情况是:

  • 所调用的函数是纯函数
  • 可迭代的参数类似于range,它允许随机访问而不必产生其他元素
  • 您调用的函数很快,因此您可以在各种元素上重新计算它,而不必太担心性能.

当所有这些假设都满足时,您可以拥有一个像range一样工作"的地图对象.

I recently read that one benefit of map in Python 3 was that it is lazy. That means, it is better to do

map(lambda x: x**2, range(10**100))

rather than

[x**2 for x in range(10**100)]

What I'm curious about, is how I can put this laziness to use. If I generate the map object, how can I, for example, access a particular element in the resulting operation/list. In almost every documentation on map I've seen, they will do something like print(map(...)) or for i in map(...) which (as far as I understand it) relinquishes the lazy concept as it implicitly converts the map to a list.

I suppose what I'm looking for is the ability to use map objects in a similarly lazy fashion as range where I can do x = range(10**100) and lazily generate x[10000] without huge computational loads.

If this concept doesn't exist, what is the benefit then of having map be lazy? If you always need to convert it to some non-lazy object like a list, why does it matter that map is lazy?

解决方案

You are comparing apples to oranges here. range is not just a lazy iterable. It is a specific object whose contents satisfy specific laws that allow to support many operations without actually building a huge sequence in memory. That's because the nth element of range is basically just start + n*step (modulo stop, signs etc.)

However map is meant to work with any function f. In particular functions may have shared/global state which already defeats any chance of being able to do map(f, something)[100] without performing 100 function calls. Not doing so breaks the correctness of the result.

map is lazy simply means it doesn't immediately build a complete list of results but waits for you to require the next result before doing the call to f and produce it. This avoid building unneccessary lists in code like:

for x in map(f, iterable):
    # do something with x

where if map was eager it would consume twice the memory of iterable to do the loop, with a lazy map the only space required is that of x basically.

Moreover it allows to call map on infinite iterables like count(). This obviously result in a never ending program doing something, or at some point you can just stop looking into map. An eager map cannot handle this case.

If you want to use your own restricted map that works only on pure fuctions and that allow random access you could write your own class:

class PureMap:
    def __init__(self, function, sequence):
        self._f = function
        self._sequence = sequence

    def __iter__(self):
        return map(self._f, self._sequence)
    def __getitem__(self, i):
        return self._f(self._sequence[i])
    # etc.

However even in this case you have some problems:

  1. If sequence is actually an iterable to obtain the nth element you have to consume the first n elements. After that you'd have to store them as a sequence in your class for future use. But this already defeats the purpose of the whole thing since doing PureMap(f, sequence)[1000] requires you to store 1000 elements in memory anyway, even though it avoids 999 calls to f.

  2. You want to avoid calling f multiple times on the same element. This means you'd also have to keep track of which element was already computed and which not.

The only situation where you could achieve what you want is the following:

  • The function being called is pure
  • The iterable argument is something like range that allows random access without having to produce other elements
  • The function you call is fast so that you can recompute it on the various elements without worrying too much about performance.

When all those assumptions are met you can have a map object that "works like range".

这篇关于懒惰评估地图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆