程序解释期间的高效增量哈希计算 [英] Efficient incremental hash computation during program interpretation

查看:92
本文介绍了程序解释期间的高效增量哈希计算的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想写一个递归式Scheme解释器.在评估过程中的任何时候,解释器都应该能够检测到它何时收到了以前见过的一对表达式和环境作为参数.

I'd like to write a recursively memoizing Scheme interpreter. At any point during evaluation, the interpreter should be able to detect when it receives as arguments a pair of expression and environment that it has previously seen.

evalapply的简单记忆是无效的.每次调用eval/apply时,都需要在哈希表中查找参数,这将需要遍历哈希表匹配项中的整个参数(可能很大).

Plain memoization of eval and apply is inefficient. It would require looking up the arguments in a hash table on every call of eval/apply, which would require walking the entire (possibly big) arguments on hash table matches.

例如,假设解释器评估程序

For example, assume that the interpreter evaluates the program

(car (list A))

其中,A表示一个大对象.解释器评估应用程序(list A)时,首先会分别评估listA.在将list应用于A之前,它将在其哈希表中查找是否曾经看过此应用程序,然后遍历整个A对象以计算哈希.稍后,当备忘录解释器将car应用于包含A的列表时,它将计算该列表的哈希值,这又涉及遍历整个A对象.

where A evaluates to a big object. When the interpreter evaluates the application (list A), it first evaluates list and A individually. Before it applies list to A, it looks up in its hash table whether it has seen this application before, walking the entire A object to compute a hash. Later on, when the memoizing interpreter applies car to the list containing A, it computes a hash for this list which again involves walking the entire A object.

相反,我想构建一个解释器,该解释器逐步构建大约唯一的散列,从而避免可能的重新计算,并保证不会发生冲突.

Instead, I want to build an interpreter that incrementally builds up approximately unique hashes, avoiding recomputation where possible and providing a guarantee that collisions are unlikely.

例如,可以递归扩展解释器使用其值的MD5对其进行操作的每个对象,或者,如果是复合对象,则使用其组成哈希值的MD5来递归扩展.环境可能会为其每个变量/值条目存储散列,并且可能会根据各个散列来计算环境的散列.然后,如果环境中的条目发生更改,则无需重新遍历整个环境以计算该环境的新哈希.取而代之的是,仅需要重新计算已更改的变量/值对的哈希值,并且需要更新条目哈希值集的全局哈希值.

For example, one could recursively extend each object that the interpreter operates on with the MD5 of its value, or, if it is a compound object, with the MD5 of its component hashes. An environment might store the hash for each of its variable/value entries, and the hash of the environment might be computed as a function of the individual hashes. Then, if an entry in the environment changes, it is not necessary to rewalk the entire environment to compute the new hash of the environment. Instead, only the hash of the changed variable/value pair needs to be recomputed and the global hash of the set of entry hashes needs to be updated.

是否存在有关逐步建立近似唯一哈希的相关工作,尤其是在递归记忆和/或程序评估的情况下?

推荐答案

请注意,如果表达式是不可变的(不允许使用自修改代码),则可以对它们使用EQ相等.如果环境是不可变的,则可以同样对待它们. EQ相等很快,因为您只是将机器指针中的位当作哈希即可.

Note that if expressions are immutable (no self-modifying code allowed) then you can use EQ equality on them. If environments are immutable, you can treat them likewise. EQ equality is fast since you're just taking the bits from the machine pointer to be a hash.

然后的问题是赋值使环境发生变异,从而导致表达式值发生更改.如果允许,该如何处理.

The problem then are assignments which mutate environments, causing expressions values to change. If they are allowed, how do deal with this.

一种方法是记下在其词法范围内包含破坏性代码的环境,并以某种方式对其进行注释,以便评估者可以识别此类污染环境",而不对其进行缓存.

One way would be to make a note of environments that contain destructive code in their lexical scopes and somehow annotate them so that the evaluator can recognize such "polluted environments" and not do the caching for them.

顺便说一句,您显然希望为此使用哈希表的语义较弱,以使任何成为垃圾的对象都不会堆积在内存中.

By the way, you obviously want hash tables with weak semantics for this so that any objects that become garbage do not pile up in memory.

这篇关于程序解释期间的高效增量哈希计算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆