Clojure head retention

Problem description

I'm reading the Clojure Programming book from O'Reilly.

I came across an example of head retention. The first example retains a reference to d (I presume), so it doesn't get garbage collected:

(let [[t d] (split-with #(< % 12) (range 1e8))]
    [(count d) (count t)])
;= #<OutOfMemoryError java.lang.OutOfMemoryError: Java heap space>

The second example doesn't retain it, so it runs with no problem:

(let [[t d] (split-with #(< % 12) (range 1e8))]
    [(count t) (count d)])
;= [12 99999988]

What I don't get here is what exactly is retained in which case and why. If I try to return just [(count d)], like this:

(let [[t d] (split-with #(< % 12) (range 1e8))]
    [(count d)])

it seems to create the same memory problem.

Further, I recall reading that count in every case realizes/evaluates a sequence. So, I need that clarified.

If I try to return (count t) first, how is that faster/more memory efficient than if I don't return it at all? And what gets retained in which case, and why?

Solution

In both the first and the final examples the original sequence passed to split-with is retained while being realized in full in memory; hence the OOME. The way this happens is indirect; what is retained directly is t, while the original sequence is being held onto by t, a lazy seq, in its unrealized state.

The way t causes the original sequence to be held is as follows. Prior to being realized, t is a LazySeq object storing a thunk which may be called at some point to realize t. Until t is realized, this thunk has to keep a pointer to the original sequence argument of split-with so that it can pass it on to take-while -- see the implementation of split-with. Once t is realized, the thunk becomes eligible for GC (the field which holds it in the LazySeq object is set to null) and t no longer holds the head of the huge input seq.
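To make the retention path a bit more concrete, here is a minimal sketch. The name my-split-with is hypothetical; its body mirrors how clojure.core/split-with is defined, namely a vector of take-while and drop-while over the same coll. A small range is used so that it runs comfortably in any REPL.

```clojure
;; Hypothetical stand-in that mirrors clojure.core/split-with:
;; both results are built lazily from the same input coll.
(defn my-split-with [pred coll]
  [(take-while pred coll) (drop-while pred coll)])

;; Small input so the sketch is cheap to run.
(def parts (my-split-with #(< % 12) (range 100)))
(def t (first parts))

(def before (realized? t))   ; t is still an unrealized LazySeq here
(def t-count (count t))      ; counting forces t
(def after (realized? t))    ; t has now been realized

(println (class t) before t-count after)
```

While before is false, t still carries the thunk that closes over the input sequence; that is the window in which the head is kept alive.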

The input seq itself is being realized in full by (count d), which needs to realize d, and thus the original input seq.

Moving on to why t is being retained:

In the first case, this is because (count d) gets evaluated before (count t). Since Clojure evaluates these expressions left to right, the local t needs to hang around for the second call to count, and since it happens to hold on to a huge seq (as explained above), that leads to the OOME.
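The left-to-right evaluation order can be observed directly. The sketch below records the order in which the elements of a vector literal are evaluated; traced and calls are hypothetical names introduced only for this illustration.

```clojure
;; Records the tags in the order the expressions are evaluated.
(def calls (atom []))

(defn traced
  "Notes that the expression tagged `tag` ran, then returns `value`."
  [tag value]
  (swap! calls conj tag)
  value)

;; Elements of a vector literal are evaluated left to right.
(def result [(traced :left 1) (traced :right 2)])

(println @calls)
```

In [(count d) (count t)], the same ordering means t is still live while d is being walked.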

The final example where only (count d) is returned should ideally not hold on to t; the reason that is not the case is somewhat subtle and best explained by referring to the second example.

The second example happens to work fine, because after (count t) is evaluated, t is no longer needed. The Clojure compiler notices this and uses a clever trick to reset the local to nil at the same time as the count call is made. The crucial piece of Java code does something like f(t, t=null): the current value of t is passed to the appropriate function, but the local is cleared before control is handed over to f, since the clearing is a side effect of the expression t=null, which is itself an argument to f. Java's left-to-right evaluation of arguments is clearly key to this working.

Back to the final example: the trick doesn't help there, because t is not actually used anywhere and unused locals are not handled by the locals clearing process. (The clearing happens at the point of last use; in the absence of such a point in the program, there is no clearing.)
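One way to sidestep the problem when only the tail count is wanted, offered as a sketch and not as the book's recommendation, is to avoid binding t at all and call drop-while directly. The name count-tail is hypothetical, and the input is kept small here; with a very large range this shape should not leave an unused local pointing at the head, though actual behaviour depends on the Clojure version and heap settings.

```clojure
;; Hypothetical helper: counts the elements left after the prefix
;; matching pred is dropped, without ever creating the take-while half.
(defn count-tail [pred coll]
  (count (drop-while pred coll)))

;; (range 1000) is 0..999; the first 12 elements satisfy (< % 12).
(def tail-count (count-tail #(< % 12) (range 1000)))

(println tail-count)
```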

As for count realizing lazy sequences: it must do that, as there is no general way of predicting the length of a lazy seq without realizing it.
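A small sketch of this, using a side-effecting mapping function to show that count forces every element of a lazy seq. The names forced and s are hypothetical and exist only for this illustration.

```clojure
;; Counts how many elements have actually been computed.
(def forced (atom 0))

;; A lazy seq whose elements bump the counter as they are realized.
(def s (map (fn [x] (swap! forced inc) x) (range 10)))

(def constant-time? (counted? s))   ; a lazy seq cannot report its size cheaply
(def forced-before @forced)         ; nothing has been realized yet
(def n (count s))                   ; walks, and thereby realizes, the whole seq
(def forced-after @forced)

(println constant-time? forced-before n forced-after)
```

By contrast, (counted? [1 2 3]) is true: a vector knows its size without being traversed.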
