如何判断Haskell是否会缓存结果或重新计算结果? [英] How to tell whether Haskell will cache a result or recompute it?

查看:91
本文介绍了如何判断Haskell是否会缓存结果或重新计算结果?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我注意到有时候Haskell的纯函数会以某种方式被缓存:如果我用相同的参数两次调用函数,那么第二次就会立刻计算出结果


  1. 为什么会发生这种情况?它是GHCI功能还是什么?

  2. 我可以依靠这个(即:我可以确定性地知道函数值是否被缓存)吗?

  3. 我可以为某些函数调用强制或禁用此功能吗?






评论,这里是我在网上找到的一个例子:

  isPrime a = isPrimeHelper a primes 
isPrimeHelper a(p :ps)
| p * p> a =真
|一个`mod` p == 0 = False
|否则= isPrimeHelper a ps
素数= 2:过滤isPrime [3,5 ..]

在运行它之前,我期待的速度很慢,因为它在不显式缓存它们的情况下访问 primes 元素(因此,除非这些值被缓存在某处,否则它们会需要多次重新计算)。但我错了。

如果我在GHCI中设置 + s (在每次评估后打印时间/内存统计数据)并评估表达式 primes !! 10000 两次,这就是我得到的结果:

  *主> :set + s 
* Main> primes !! 10000
104743
(2.10秒,169800904字节)
*主要>素数!! 10000
104743
(0.00秒,0字节)

意味着至少素数! 10000 (或者更好:整个素数列表,因为素数!! 9999 会请勿缓存)。

解决方案

素数,在您的代码不是一个函数,而是一个常量,在haskellspeak中称为 CAF 。如果它带了一个参数(比如()),如果调用它两次,你会得到同一个列表的两个不同版本,但是因为它是一个CAF,两次完全相同的列表;



作为ghci顶层定义, primes 永远不会变得无法访问,因此它指向的列表的头部(因此它的尾部/计算的其余部分)永远不会被垃圾收集。添加一个参数可以防止保留那个引用,然后这个列表将被垃圾回收,因为(!!)迭代它以找到正确的元素,并且第二次调用<$ c

请注意,在编译时,$ c>(!!)会强制重复整个计算,而不是仅仅遍历已经计算的列表。程序中,没有像ghci那样的顶级作用域,当最后一个引用不存在时,很有可能在整个程序退出之前收集垃圾,这意味着第一次调用需要很长时间,第二次调用不是的,之后,程序的未来不再引用CAF了,CAF所占用的内存将被回收。

素数包提供了一个函数,该函数接受一个参数因为(主要是我声称)这个原因,因为它携带了大约半太字节的数据素数可能不是我们想要做的。



如果您想真正了解这一点,我建议您阅读 STG论文。它不包括GHC的新发展,但是在解释Haskell如何映射到装配上,以及总体上严格地被吞噬的方面做得很好。

I noticed that sometimes Haskell pure functions are somehow cached: if I call the function twice with the same parameters, the second time the result is computed in no time.

  1. Why does this happen? Is it a GHCI feature or what?
  2. Can I rely on this (ie: can I deterministically know if a function value will be cached)?
  3. Can I force or disable this feature for some function calls?


As required by comments, here is an example I found on the web:

isPrime a = isPrimeHelper a primes
isPrimeHelper a (p:ps)
    | p*p > a = True
    | a `mod` p == 0 = False
    | otherwise = isPrimeHelper a ps
primes = 2 : filter isPrime [3,5..]

I was expecting, before running it, to be quite slow, since it keeps accessing elements of primes without explicitly caching them (thus, unless these values are cached somewhere, they would need to be recomputed plenty times). But I was wrong.

If I set +s in GHCI (to print timing/memory stats after each evaluation) and evaluate the expression primes!!10000 twice, this is what I get:

*Main> :set +s
*Main> primes!!10000
104743
(2.10 secs, 169800904 bytes)
*Main> primes!!10000
104743
(0.00 secs, 0 bytes)

This means that at least primes !! 10000 (or better: the whole primes list, since also primes!!9999 will take no time) must be cached.

解决方案

primes, in your code, is not a function, but a constant, in haskellspeak known as a CAF. If it took a parameter (say, ()), you would get two different versions of the same list back if calling it twice, but as it is a CAF, you get the exact same list back both times;

As a ghci top-level definition, primes never becomes unreachable, thus the head of the list it points to (and thus its tail/the rest of the computation) is never garbage collected. Adding a parameter prevents retaining that reference, the list would then be garbage collected as (!!) iterates down it to find the right element, and your second call to (!!) would force repetition of the whole computation instead of just traversing the already-computed list.

Note that in compiled programs, there is no top-level scope like in ghci and things get garbage collected when the last reference to them is gone, quite likely before the whole program exits, CAF or not, meaning that your first call would take long, the second one not, and after that, "the future of your program" not referencing the CAF anymore, the memory the CAF takes up is recycled.

The primes package provides a function that takes an argument for (primarily, I'd claim) this very reason, as carrying around half a terabyte of prime numbers might not be what one wants to do.

If you want to really get down to the bottom of this, I recommend reading the STG paper. It doesn't include newer developments in GHC, but does a great job of explaining how Haskell maps onto assembly, and thus how thunks get eaten by strictness, in general.

这篇关于如何判断Haskell是否会缓存结果或重新计算结果?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆