How to remove / dispose of a broadcast variable from the heap in Spark?
Question
To broadcast a variable so that it occurs exactly once in memory per node on a cluster, one can do:
val myVarBroadcasted = sc.broadcast(myVar)
then retrieve it in RDD transformations like so:
myRdd.map(blar => {
val myVarRetrieved = myVarBroadcasted.value
// some code that uses it
})
.someAction
But suppose I now wish to perform some more actions with a new broadcast variable. What if I don't have enough heap space because of the old broadcast variables? I want a function like
myVarBroadcasted.remove()
Now I can't seem to find a way of doing this.
Also, a very related question: where do the broadcast variables go? Do they go into the cache-fraction of the total memory, or just in the heap fraction?
Recommended answer
If you want to remove the broadcast variable from both the executors and the driver, you have to use destroy; using unpersist only removes it from the executors:
myVarBroadcasted.destroy()
This method is blocking, at least in the Spark version this answer was written for; check the behaviour in your release.
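To make the difference between the two calls concrete, here is a minimal sketch of the full lifecycle: broadcast, use, unpersist, destroy. The object name BroadcastCleanup, the run helper, the lookup map and the local master setting are all assumptions made for illustration; only sc.broadcast, value, unpersist and destroy come from the question and answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastCleanup {
  // Hypothetical helper: broadcasts a small lookup table, uses it in a
  // transformation, then releases it.
  def run(sc: SparkContext): Array[String] = {
    val myVar = Map(1 -> "one", 2 -> "two") // illustrative data only
    val myVarBroadcasted = sc.broadcast(myVar)

    val myRdd = sc.parallelize(Seq(1, 2, 3))
    val result = myRdd
      .map(k => myVarBroadcasted.value.getOrElse(k, "unknown"))
      .collect()

    // Drops the cached copies on the executors. The driver keeps the value,
    // so the variable can still be used and would be re-sent on next use.
    myVarBroadcasted.unpersist()

    // Removes all data and metadata for this broadcast, driver included.
    // The variable must not be used again after this call.
    myVarBroadcasted.destroy()

    result
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("broadcast-cleanup").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try run(sc).foreach(println)
    finally sc.stop()
  }
}
```

Calling unpersist before destroy is not required; it is shown here only to contrast the two. If the variable will never be needed again, destroy alone should be enough.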