Haskell缓存函数的结果 [英] Haskell caching results of a function

查看:114
本文介绍了Haskell缓存函数的结果的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个函数需要一个参数并产生一个结果。不幸的是,该功能花费相当长的时间来产生结果。这个函数经常会在相同的输入中被调用,这就是为什么我可以缓存结果会很方便。例如

  let cachedFunction = createCache slowFunction $ b $ in(cachedFunction 3.1)+(cachedFunction 4.2)+(cachedFunction 3.1) 

我正在研究Data.Array,虽然数组很懒,但我需要用配对列表(使用listArray) - 这是不切实际的。如果'钥匙'是例如'Double'类型,我根本无法初始化它,即使我理论上可以给每个可能的输入分配一个Integer,我也有几万个可能的输入,而我实际上只使用了一些。我需要使用函数而不是列表来初始化数组(或者,最好是散列表,因为只有少数数据将被使用)。



更新:我正在阅读备忘录文章,据我了解,备忘录可以按我想要的方式工作。也许。有人可以尝试制作'cachedFunction'吗?适合于需要2个Double参数的慢函数?或者,也就是说,在一个〜[0..10亿]的域中需要一个Int参数,不会吃掉所有的内存? 解析方案

好的,有 Data.HashTable 。但是,哈希表并不倾向于很好地处理不可变数据和引用透明性,所以我认为它看起来不是很有用。



对于少数的值,将它们存储在搜索树中(例如 Data.Map )可能会足够快。如果你可以忍受你的 Double s的一些修改,一个更强大的解决方案就是使用类似于trie的结构,比如 Data .IntMap ;这些查找时间主要与密钥长度成比例,并且在集合大小上大致不变。如果 Int 太有限制,那么您可以在Hackage上进行挖掘,找到在所用密钥类型中更灵活的库。



至于如何缓存结果,我认为你想要的通常被称为memoization。如果你想按需计算和记忆结果,技巧的要点是定义一个包含所有可能结果的索引数据结构,这样当你询问特定的结果时,它只会强制需要得到你想要的答案的计算。常见的例子通常涉及索引到列表中,但同样的原则应适用于任何非严格的数据结构。作为一个经验法则,运行时通常会缓存非函数值(包括无限递归数据结构),但不是函数结果,因此诀窍是将所有计算包含在不包含顶层定义的顶层定义中取决于任何参数。



编辑: MemoTrie示例ahoy!
$ b

这是概念的快速和肮脏的证明;

  { - #LANGUAGE TypeFamilies# - } 
{ - #LANGUAGE TypeOperators# - }
导入Data.MemoTrie
导入Data.Binary
导入Data.ByteString.Lazy隐藏(映射)

mangle :: Double - > [Int]
mangle =来自整合的地图。解压。编码

unmangle :: [Int] - > Double
unmangle =解码。包。 map fromIntegral

实例HasTrie Double其中
数据Double: - >:a = DoubleTrie([Int]: - >:a)
trie f = DoubleTrie $ trie $ F 。 unmangle
untrie(DoubleTrie t)= untrie t。 mangle

slow x
| x < 1 = 1
|否则=慢(x / 2)+慢(x / 3)

memoSlow :: Double - >整数
memoSlow =备忘慢

请注意MemoTrie包使用的GHC扩展;希望这不是问题。在GHCi中加载它,然后尝试使用类似(10 ^ 6)或(10 ^)的方式调用 slow memoSlow 7)看到它在行动。



将这个推广到具有多个参数或者不应该相当简单的函数。有关使用MemoTrie的更多详细信息,您可能会发现此博客帖子它的作者很有帮助。


I have a function that takes a parameter and produces a result. Unfortunately, it takes quite long for the function to produce the result. The function is being called quite often with the same input, that's why it would be convenient if I could cache the results. Something like

let cachedFunction = createCache slowFunction
in (cachedFunction 3.1) + (cachedFunction 4.2) + (cachedFunction 3.1)

I was looking into Data.Array and although the array is lazy, I need to initialize it with a list of pairs (using listArray) - which is impractical . If the 'key' is e.g. the 'Double' type, I cannot initialize it at all, and even if I can theoretically assign an Integer to every possible input, I have several tens of thousands possible inputs and I only actually use a handful. I would need to initialize the array (or, preferably a hash table, as only a handful of resutls will be used) using a function instead of a list.

Update: I am reading the memoization articles and as far as I understand it the MemoTrie could work the way I want. Maybe. Could somebody try to produce the 'cachedFunction'? Prefereably for a slow function that takes 2 Double arguments? Or, alternatively, that takes one Int argument in a domain of ~ [0..1 billion] that wouldn't eat all memory?

解决方案

Well, there's Data.HashTable. Hash tables don't tend to play nicely with immutable data and referential transparency, though, so I don't think it sees a lot of use.

For a small number of values, stashing them in a search tree (such as Data.Map) would probably be fast enough. If you can put up with doing some mangling of your Doubles, a more robust solution would be to use a trie-like structure, such as Data.IntMap; these have lookup times proportional primarily to key length, and roughly constant in collection size. If Int is too limiting, you can dig around on Hackage to find trie libraries that are more flexible in the type of key used.

As for how to cache the results, I think what you want is usually called "memoization". If you want to compute and memoize results on demand, the gist of the technique is to define an indexed data structure containing all possible results, in such a way that when you ask for a specific result it forces only the computations needed to get the answer you want. Common examples usually involve indexing into a list, but the same principle should apply for any non-strict data structure. As a rule of thumb, non-function values (including infinite recursive data structures) will often be cached by the runtime, but not function results, so the trick is to wrap all of your computations inside a top-level definition that doesn't depend on any arguments.

Edit: MemoTrie example ahoy!

This is a quick and dirty proof of concept; better approaches may exist.

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
import Data.MemoTrie
import Data.Binary
import Data.ByteString.Lazy hiding (map)

mangle :: Double -> [Int]
mangle = map fromIntegral . unpack . encode

unmangle :: [Int] -> Double
unmangle = decode . pack . map fromIntegral

instance HasTrie Double where
    data Double :->: a = DoubleTrie ([Int] :->: a)
    trie f = DoubleTrie $ trie $ f . unmangle
    untrie (DoubleTrie t) = untrie t . mangle

slow x 
    | x < 1 = 1
    | otherwise = slow (x / 2) + slow (x / 3)

memoSlow :: Double -> Integer
memoSlow = memo slow

Do note the GHC extensions used by the MemoTrie package; hopefully that isn't a problem. Load it up in GHCi and try calling slow vs. memoSlow with something like (10^6) or (10^7) to see it in action.

Generalizing this to functions taking multiple arguments or whatnot should be fairly straightforward. For further details on using MemoTrie, you might find this blog post by its author helpful.

这篇关于Haskell缓存函数的结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆