How to remove / dispose a broadcast variable from heap in Spark?

Question

To broadcast a variable so that exactly one copy of it is held in memory per node of a cluster, one can do val myVarBroadcasted = sc.broadcast(myVar) and then retrieve it in RDD transformations like so:

myRdd.map(blar => {
  // read the broadcast value on the executor
  val myVarRetrieved = myVarBroadcasted.value
  // some code that uses it
})
.someAction

But suppose I now want to perform some more actions with a new broadcast variable. What if there is not enough heap space left because of the old broadcast variables? I want a function like

myVarBroadcasted.remove()

Now I can't seem to find a way of doing this.

Also, a very related question: where do the broadcast variables go? Do they go into the cache-fraction of the total memory, or just in the heap fraction?

Recommended answer

If you want to remove the broadcast variable from both the executors and the driver, you have to use destroy. Using unpersist only removes it from the executors:

myVarBroadcasted.destroy()

This method is blocking.
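To make the difference between the two calls concrete, here is a minimal sketch of a broadcast variable's lifecycle. It assumes spark-core is on the classpath and a local master. The names BroadcastLifecycle, labelKeys, and the sample data are hypothetical and only serve the illustration. Only sc.broadcast, value, unpersist, and destroy come from the discussion above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, for illustration only.
object BroadcastLifecycle {

  // Labels each key using a broadcast lookup map, then releases the broadcast.
  def labelKeys(sc: SparkContext, keys: Seq[Int], lookup: Map[Int, String]): Array[String] = {
    val lookupBroadcast = sc.broadcast(lookup)
    try {
      val labelled = sc.parallelize(keys)
        .map(k => lookupBroadcast.value.getOrElse(k, "?"))
        .collect()

      // unpersist drops the cached copies on the executors only. The driver
      // keeps the value, so the broadcast is re-sent if it is used again.
      lookupBroadcast.unpersist()

      labelled
    } finally {
      // destroy releases the data on executors and driver. Any later use of
      // lookupBroadcast fails, so call it only when the variable is done with.
      lookupBroadcast.destroy()
    }
  }

  def main(args: Array[String]): Unit = {
    // The local[2] master is an assumption to keep the sketch runnable on one machine.
    val conf = new SparkConf().setAppName("broadcast-lifecycle").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val labels = labelKeys(sc, Seq(1, 2, 4), Map(1 -> "a", 2 -> "b"))
      println(labels.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

Putting destroy in a finally block is one possible design choice: it frees the memory even when the job fails, at the cost of making the broadcast unusable outside the helper.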