How does Spark evict cached partitions?


Problem Description

I'm running Spark 2.0 in standalone mode, and I'm the only one submitting jobs in my cluster.

Suppose I have an RDD with 100 partitions and only 10 partitions in total would fit in memory at a time.

Let's also assume that the allotted execution memory is enough and will not interfere with storage memory.
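
For context, this assumption maps onto Spark 2.x's unified memory manager, in which execution and storage share a single region and storage can be protected up to a configurable floor. A minimal sketch of the relevant configuration (the two config keys are real; the app name and the values, which are just the Spark 2.0 defaults, are shown for illustration only):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("cache-eviction-demo")          // illustrative name, not from the question
  .set("spark.memory.fraction", "0.6")        // share of heap usable for execution + storage
  .set("spark.memory.storageFraction", "0.5") // portion of that region shielded from execution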

Suppose I iterate over the data in that RDD:

rdd.persist()  // MEMORY_ONLY is the default storage level

for (_ <- 0 until 10) {
  rdd.map(...).reduce(...)
}

rdd.unpersist()

For each iteration, will the first 10 partitions that were persisted always stay in memory until rdd.unpersist()?

Recommended Answer

I think I found the answer, so I'm going to answer my own question.

The eviction policy seems to be implemented in the MemoryStore class in the Spark source code.
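
For reference, below is a minimal, self-contained sketch modeled on MemoryStore.evictBlocksToFreeSpace from Spark 2.x. The types and names here are simplified stand-ins, not Spark's actual API; note the same-RDD check in the loop, which is what the conclusion below turns on:

import scala.collection.mutable

object EvictionSketch {
  // Spark names cached partitions rdd_<rddId>_<partition>; modeled as a case class here.
  case class RDDBlockId(rddId: Int, partition: Int)

  // Iteration order over this map is what gives the LRU eviction order.
  val entries = mutable.LinkedHashMap[RDDBlockId, Long]() // block id -> size in bytes

  // Try to free `space` bytes so a new block can be stored.
  def evictBlocksToFreeSpace(incoming: Option[RDDBlockId], space: Long): Long = {
    val rddToAdd = incoming.map(_.rddId)
    val selected = mutable.ArrayBuffer[RDDBlockId]()
    var freed = 0L
    val it = entries.iterator
    while (freed < space && it.hasNext) {
      val (blockId, size) = it.next()
      // The key rule: never evict a block to make room for a block of the same RDD.
      if (rddToAdd.isEmpty || rddToAdd.get != blockId.rddId) {
        selected += blockId
        freed += size
      }
    }
    if (freed >= space) {
      selected.foreach(entries.remove) // evict only if enough space can actually be freed
      freed
    } else {
      0L // nothing is evicted; with MEMORY_ONLY the new block is simply not cached
    }
  }
}

In the real Spark source, entries is a java.util.LinkedHashMap created with accessOrder = true, so iteration visits blocks least-recently-used first, and the evictability check reads rddToAdd.isEmpty || rddToAdd != getRddId(blockId).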

It seems that entries are not evicted to make room for entries belonging to the same RDD. So caching a new partition of the RDD will never evict one of its already-cached partitions: the first 10 partitions that fit stay in memory until rdd.unpersist(), and the partitions that don't fit are simply not cached, which means they are recomputed on each iteration.
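
One way to check this empirically is with SparkContext.getRDDStorageInfo (a developer API); this sketch assumes the sc and rdd from the question:

// Inspect how many partitions of the persisted RDD actually stayed cached.
sc.getRDDStorageInfo
  .find(_.id == rdd.id)
  .foreach { info =>
    println(s"Cached ${info.numCachedPartitions} of ${info.numPartitions} partitions " +
      s"(${info.memSize} bytes in memory)")
  }

If the same-RDD rule holds, this should keep reporting about 10 cached partitions for the whole loop.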
