某个地方有基于对象身份的线程安全的备注库吗? [英] Is there an object-identity-based, thread-safe memoization library somewhere?

查看:57
本文介绍了某个地方有基于对象身份的线程安全的备注库吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道回忆似乎是堆栈溢出时haskell标记上的一个常年话题,但我认为这个问题以前从未问过.

我知道Haskell的几个不同的现成"记忆库:

  • 备忘组合器和备忘包,它们利用涉及惰性无限数据结构的漂亮技巧,以纯功能性方式实现备忘.(据我了解,前者稍微灵活一些,而后者在简单情况下更易于使用:请参见更好的性能.(现成的实现使用基于比较的搜索数据结构,而不是使用效率稍高的哈希函数;但是我认为,如果您找到并替换 Cmp Hashable Data.Map 用于 Data.HashMap 并添加适当的 import ,您将获得基于哈希的版本.)

但是,我不知道有哪个库基于对象标识而不是对象值查找答案.这很重要,因为有时用作备忘录表键的对象(也就是作为要记忆的函数的输入)的对象可能很大,以至于无法完全检查对象以确定您是否需要以前看过它本身就是一个缓慢的操作.如果您要对存储在给定内存中的位置"的对象一次又一次地应用记忆功能,则速度很慢,这也是不必要的.

  recursiveTransform节点originalChildren注释=myTransform节点递归地转换儿童注释在哪里recursivelyTransformedChildren =地图recursiveTransform originalChildren 

只是它以明显的方式使用结构共享,以便它不返回指数数据结构,而是返回与其输入大小相同的顺序.

我很高兴地说,如果说在获得节点编号之前先对编号节点编号,或者否则我可以更改节点编号的定义,这将更加容易.我不能(轻松地)做任何一件事情.


我也对存在一个实现我提到的功能的库的一般问题感兴趣,该库与我现在面临的特定具体问题完全独立:我觉得我必须在一个计算机上解决此类问题.几次场合,一劳永逸地杀死巨龙会很好.SPJ等人认为值得在GHC中添加一个功能而不是三个功能来支持这种形式的库的存在这一事实表明,该功能确实有用,不能在 all 案件.(但是,我仍然非常想知道在特定情况下也有帮助的解决方法:长期问题并不像我现在面临的问题那样紧急:-))

1 从技术上讲,我不是很故意内存,因为垃圾收集器有时会移动对象一点点-我真正的意思是对象标识".但是我们可以认为这与我们对内存中位置的直观想法大致相同.

解决方案

如果仅 想要基于对象身份而不是相等性进行记忆,则可以使用内置于该对象中的现有懒惰机制.语言.

例如,如果您具有这样的数据结构

  data Foo = Foo {...}昂贵:: Foo->酒吧 

然后,您只需将要存储的值添加为一个额外的字段,然后让懒惰为您处理其余的事情即可.

 数据Foo = Foo {...,备忘录:: Bar} 

要使其易于使用,请添加一个聪明的构造函数以结结.

  makeFoo ... =让foo = Foo {...,memo =昂贵的foo}在foo中 

尽管这比使用库要差一些,并且需要修改数据类型才能真正有用,但这是一种非常简单的技术,并且已经为您解决了所有线程安全性问题.

I know that memoization seems to be a perennial topic here on the haskell tag on stack overflow, but I think this question has not been asked before.

I'm aware of several different 'off the shelf' memoization libraries for Haskell:

  • The memo-combinators and memotrie packages, which make use of a beautiful trick involving lazy infinite data structures to achieve memoization in a purely functional way. (As I understand it, the former is slightly more flexible, while the latter is easier to use in simple cases: see this SO answer for discussion.)
  • The uglymemo package, which uses unsafePerformIO internally but still presents a referentially transparent interface. The use of unsafePerformIO internally results in better performance than the previous two packages. (Off the shelf, its implementation uses comparison-based search data structures, rather than perhaps-slightly-more-efficient hash functions; but I think that if you find and replace Cmp for Hashable and Data.Map for Data.HashMap and add the appropraite imports, you get a hash based version.)

However, I'm not aware of any library that looks answers up based on object identity rather than object value. This can be important, because sometimes the kinds of object which are being used as keys to your memo table (that is, as input to the function being memoized) can be large---so large that fully examining the object to determine whether you've seen it before is itself a slow operation. Slow, and also unnecessary, if you will be applying the memoized function again and again to an object which is stored at a given 'location in memory' 1. (This might happen, for example, if we're memoizing a function which is being called recursively over some large data structure with a lot of structural sharing.) If we've already computed our memoized function on that exact object before, we can already know the answer, even without looking at the object itself!

Implementing such a memoization library involves several subtle issues and doing it properly requires several special pieces of support from the language. Luckily, GHC provides all the special features that we need, and there is a paper by Peyton-Jones, Marlow and Elliott which basically worries about most of these issues for you, explaining how to build a solid implementation. They don't provide all details, but they get close.

The one detail which I can see which one probably ought to worry about, but which they don't worry about, is thread safety---their code is apparently not threadsafe at all.

My question is: does anyone know of a packaged library which does the kind of memoization discussed in the Peyton-Jones, Marlow and Elliott paper, filling in all the details (and preferably filling in proper thread-safety as well)?

Failing that, I guess I will have to code it up myself: does anyone have any ideas of other subtleties (beyond thread safety and the ones discussed in the paper) which the implementer of such a library would do well to bear in mind?


UPDATE

Following @luqui's suggestion below, here's a little more data on the exact problem I face. Let's suppose there's a type:

data Node = Node [Node] [Annotation]

This type can be used to represent a simple kind of rooted DAG in memory, where Nodes are DAG Nodes, the root is just a distinguished Node, and each node is annotated with some Annotations whose internal structure, I think, need not concern us (but if it matters, just ask and I'll be more specific.) If used in this way, note that there may well be significant structural sharing between Nodes in memory---there may be exponentially more paths which lead from the root to a node than there are nodes themselves. I am given a data structure of this form, from an external library with which I must interface; I cannot change the data type.

I have a function

myTransform : Node -> Node

the details of which need not concern us (or at least I think so; but again I can be more specific if needed). It maps nodes to nodes, examining the annotations of the node it is given, and the annotations its immediate children, to come up with a new Node with the same children but possibly different annotations. I wish to write a function

recursiveTransform : Node -> Node

whose output 'looks the same' as the data structure as you would get by doing:

recursiveTransform Node originalChildren annotations = 
   myTransform Node recursivelyTransformedChildren annotations
   where
     recursivelyTransformedChildren = map recursiveTransform originalChildren    

except that it uses structural sharing in the obvious way so that it doesn't return an exponential data structure, but rather one on the order of the same size as its input.

I appreciate that this would all be easier if say, the Nodes were numbered before I got them, or I could otherwise change the definition of a Node. I can't (easily) do either of these things.


I am also interested in the general question of the existence of a library implementing the functionality I mention quite independently of the particular concrete problem I face right now: I feel like I've had to work around this kind of issue on a few occasions, and it would be nice to slay the dragon once and for all. The fact that SPJ et al felt that it was worth adding not one but three features to GHC to support the existence of libraries of this form suggests that the feature is genuinely useful and can't be worked around in all cases. (BUT I'd still also be very interested in hearing about workarounds which will help in this particular case too: the long term problem is not as urgent as the problem I face right now :-) )

1 Technically, I don't quite mean location in memory, since the garbage collector sometimes moves objects around a bit---what I really mean is 'object identity'. But we can think of this as being roughly the same as our intuitive idea of location in memory.

解决方案

If you only want to memoize based on object identity, and not equality, you can just use the existing laziness mechanisms built into the language.

For example, if you have a data structure like this

data Foo = Foo { ... }
expensive :: Foo -> Bar

then you can just add the value to be memoized as an extra field and let the laziness take care of the rest for you.

data Foo = Foo { ..., memo :: Bar }

To make it easier to use, add a smart constructor to tie the knot.

makeFoo ... = let foo = Foo { ..., memo = expensive foo } in foo

Though this is somewhat less elegant than using a library, and requires modification of the data type to really be useful, it's a very simple technique and all thread-safety issues are already taken care of for you.

这篇关于某个地方有基于对象身份的线程安全的备注库吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆