Lazy Evaluation: Why is it faster, advantages vs disadvantages, mechanics (why it uses less cpu; examples?) and simple proof of concept examples


Problem Description


    Lazy evaluation is said to be a way of delaying a process until the first time it is needed. This tends to avoid repeated evaluations, and that's why I would imagine it performs a lot faster. Functional languages like Haskell (and JavaScript..?) have this functionality built-in.

    However, I don't understand how and why other 'normal' approaches (that is, same functionality but not using lazy evaluation) are slower. How and why do these other approaches end up doing repeated evaluations? Can someone elaborate on this by giving simple examples and explaining the mechanics of each approach?

    Also, according to Wikipedia page about lazy evaluation these are said to be the advantages of this approach:

    1. Performance increases by avoiding needless calculations, and error conditions in evaluating compound expressions
    2. The ability to construct potentially infinite data structures
    3. The ability to define control flow (structures) as abstractions instead of primitives

    However, can't we just control which calculations are needed and avoid repeating the same ones (1)? We can use, for example, a linked list to create an infinite data structure (2). And can we do (3) already...? We can define classes/templates/objects and use those instead of primitives (e.g. in JavaScript).
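    For concreteness, here is the kind of thing point (3) seems to refer to, as a minimal Haskell sketch (my own illustration, not from the Wikipedia article): because a lazy language only evaluates arguments when they are needed, an ordinary user-defined function can behave like a built-in control structure.

    -- A user-defined conditional as a plain function. Under lazy
    -- evaluation only the chosen branch is ever evaluated, so the
    -- 'error' below never fires.
    myIf :: Bool -> a -> a -> a
    myIf True  t _ = t
    myIf False _ e = e

    main :: IO ()
    main = putStrLn (myIf (1 < 2) "then branch" (error "never evaluated"))

    In a strict language, both branch expressions would be evaluated before myIf is even entered, and the error would be raised.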

    Additionally, it seems to me (at least from the cases I have seen) that lazy evaluation goes hand-in-hand with recursion and with the notions of 'head' and 'tail' (among others). Surely there are cases where recursion is useful, but is lazy evaluation something more than that...? More than a recursive approach to solving a problem..? Streamjs is a JavaScript library that uses recursion along with some other simple operations (head, tail, etc.) to perform lazy evaluation.

    It seems I can't get my head around it...

    Thanks in advance for any contribution.

    Solution

    I'll show examples in both Python 2.7 and Haskell.

    Say, for example, you wanted to do a really inefficient sum of all the numbers from 0 to 10,000,000. You could do this with a for loop in Python as

    total = 0
    for i in range(10000000):
        total += i
    print total
    

    On my computer, this takes about 1.3s to execute. If I instead change range to xrange (the generator form of range, which lazily produces a sequence of numbers), it takes 1.2s, only slightly faster. However, if I check the memory used (using the memory_profiler package), the version with range uses about 155MB of RAM, while the xrange version uses only 1MB of RAM (neither number includes the ~11MB Python itself uses). This is an incredibly dramatic difference, and we can see where it comes from with this tool as well:

    Mem usage     Increment       Line Contents
    ===========================================
    10.875 MiB    0.004 MiB       total = 0
    165.926 MiB  155.051 MiB       for i in range(10000000):
    165.926 MiB    0.000 MiB           total += i
                                   return total
    

    This says that before we started we were using 10.875MB, total = 0 added 0.004MB, and then for i in range(10000000): added 155.051MB when it generated the entire list of numbers [0..9999999]. If we compare to the xrange version:

    Mem usage     Increment       Line Contents
    ===========================================
    11.000 MiB    0.004 MiB       total = 0
    11.109 MiB    0.109 MiB       for i in xrange(10000000):
    11.109 MiB    0.000 MiB           total += i
                                  return total
    

    So we started with 11MB and for i in xrange(10000000): added only 0.109MB. This is a huge memory savings by only adding a single letter to the code. While this example is fairly contrived, it shows how not computing a whole list until the element is needed can make things a lot more memory efficient.
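    For comparison, here is a rough Haskell rendering of the same streaming idea (a sketch of mine, assuming GHC, and not part of the original measurements). Haskell lists are lazy by default, so [0..9999999] plays the role of xrange: elements are produced on demand and can be garbage-collected as soon as they are consumed, provided the running total is kept strict.

    {-# LANGUAGE BangPatterns #-}

    -- Sum 0..9,999,999 by streaming over a lazily generated list.
    -- The bang on acc forces each partial sum so thunks don't pile up.
    sumTo :: Integer -> Integer
    sumTo n = go 0 [0 .. n]
      where
        go !acc []       = acc
        go !acc (x : xs) = go (acc + x) xs

    main :: IO ()
    main = print (sumTo 9999999)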


    Python has iterators and generators which act as a sort of "lazy" programming for when you need to yield sequences of data (although there's nothing stopping you from using them for single values), but Haskell has laziness built into every value in the language, even user-defined ones. This lets you take advantage of things like data structures that won't fit in memory without having to program complicated ways around that fact. The canonical example would be the fibonacci sequence:

    fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
    

    which very elegantly expresses this famous sequence to define a recursive infinite list generating all fibonacci numbers. It's CPU efficient because all values are cached, so each element only has to be computed once (compared to a naive recursive implementation)1, but if you calculate too many elements your computer will eventually run out of RAM because you're now storing this huge list of numbers. This is an example where lazy programming lets you have CPU efficiency, but not RAM efficiency. There is a way around this, though. If you were to write

    fib :: Int -> Integer
    fib n = let fibs = 1 : 1 : zipWith (+) fibs (tail fibs) in fibs !! n
    

    then this runs in near-constant memory, and does so very quickly, but memoization is lost as subsequent calls to fib have to recompute fibs.

    A more complex example can be found here, where the author shows how to use lazy programming and recursion in Haskell to perform dynamic programming with arrays, a feat that most initially think is very difficult and requires mutation, but Haskell manages to do very easily with "tying the knot" style recursion. It results in both CPU and RAM efficiency, and does so in fewer lines than I'd expect in C/C++.
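    To give a flavor of the technique, here is a minimal "tying the knot" sketch of my own (not the linked author's code): an immutable array whose cells are defined in terms of other cells of the same array. Laziness makes this legal, because each cell is just a thunk until it is demanded, and it turns the naive exponential fibonacci into a linear table lookup with no mutation anywhere.

    import Data.Array

    -- The cells of `table` read other cells of `table` itself; the
    -- definition "ties the knot" and laziness resolves it on demand.
    fibTable :: Int -> Array Int Integer
    fibTable n = table
      where
        table = listArray (0, n) (map go [0 .. n])
        go 0 = 1
        go 1 = 1
        go i = table ! (i - 1) + table ! (i - 2)

    main :: IO ()
    main = print (fibTable 50 ! 50)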


    All this being said, there are plenty of cases where lazy programming is annoying. Often you can build up huge numbers of thunks instead of computing things as you go (I'm looking at you, foldl), and some strictness has to be introduced to attain efficiency. It also bites a lot of people with IO, when you read a file to a string as a thunk, close the file, and then try to operate on that string. It's only after the file is closed that the thunk gets evaluated, causing an IO error to occur and crashes your program. As with anything, lazy programming is not without its flaws, gotchas, and pitfalls. It takes time to learn how to work with it well, and to know what its limitations are.
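    The foldl problem is easy to demonstrate (a sketch assuming GHC's standard Data.List, not code from the original answer): foldl suspends every addition, building the whole chain (((0+1)+2)+3)+... in memory before any of it is evaluated, while its strict cousin foldl' forces the accumulator at each step and runs in constant space.

    import Data.List (foldl')

    -- foldl delays each (+); forcing the final thunk chain can exhaust
    -- stack or RAM for large inputs.
    lazySum :: Integer
    lazySum = foldl (+) 0 [1 .. 10000000]

    -- foldl' evaluates the accumulator as it goes: constant space.
    strictSum :: Integer
    strictSum = foldl' (+) 0 [1 .. 10000000]

    main :: IO ()
    main = print strictSum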


    1) By "naive recursive implementation", I mean implementing the fibonacci sequence as

    fib :: Integer -> Integer
    fib 0 = 1
    fib 1 = 1
    fib n = fib (n-1) + fib (n-2)
    

    With this implementation, you can see the mathematical definition very clearly; it's very much in the style of inductive proofs: you show your base cases and then the general case. However, if I call fib 5, this will "expand" into something like

    fib 5 = fib 4 + fib 3
          = fib 3 + fib 2 + fib 2 + fib 1
          = fib 2 + fib 1 + fib 1 + fib 0 + fib 1 + fib 0 + fib 1
          = fib 1 + fib 0 + fib 1 + fib 1 + fib 0 + fib 1 + fib 0 + fib 1
          = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
          = 8
    

    Instead, we'd like to share some of those computations, so that fib 3 only gets computed once, fib 2 only gets computed once, and so on.

    By using a recursively defined list in Haskell, we can avoid this. Internally, this list is represented something like this:

    fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
         = 1 : 1 : zipWith (+) (f1:f2:fs) (f2:fs)
           ^--------------------^  ^       ^
               ^-------------------|-------|
         = 1 : 1 : 2 : zipWith (+) (f2:f3:fs) (f3:fs)
               ^--------------------^  ^       ^
                   ^-------------------|-------|
         = 1 : 1 : 2 : 3 : zipWith (+) (f3:f4:fs) (f4:fs)
                   ^--------------------^  ^       ^
                       ^-------------------|-------|
    

    So hopefully you can see the pattern forming here: as the list is built, it keeps pointers back to the last two elements generated in order to compute the next element. This means that for the nth element computed, there are n-2 additions performed. Even for the naive fib 5, you can see that there are more additions performed than that, and the number of additions will continue to grow exponentially. This definition is made possible through laziness and recursion, letting us turn an O(2^n) algorithm into an O(n) algorithm, but we have to give up RAM to do so. If this is defined at the top level, then values are cached for the lifetime of the program. It does mean that if you need to refer to the 1000th element repeatedly, you don't have to recompute it, just index it.
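    A small sketch of that caching behaviour (my example; it assumes fibs is defined at the top level as above):

    fibs :: [Integer]
    fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

    main :: IO ()
    main = do
        print (fibs !! 1000)  -- first use: forces and caches elements 0..1000
        print (fibs !! 1000)  -- second use: only walks already-evaluated cells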

    On the other hand, the definition

    fib :: Int -> Integer
    fib n =
        let fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
        in fibs !! n
    

    uses a local copy of fibs every time fib is called. We don't get caching between calls to fib, but we do get local caching, leaving our complexity O(n). Additionally, GHC is smart enough to know that we don't have to keep the beginning of the list around after we've used it to calculate the next element, so as we traverse fibs looking for the nth element, it only needs to hold on to 2-3 elements and a thunk pointing at the next element. This saves us RAM while computing it, and since it isn't defined at a global level it doesn't eat up RAM over the lifetime of the program. It's a tradeoff between when we want to spend RAM and CPU cycles, and different approaches are better for different situations. These techniques are applicable to much of Haskell programming in general, not just for this sequence!
