How does Spark recover the data from a failed node?


Question

Suppose we have an RDD that is used multiple times. To avoid recomputing it again and again, we persist this RDD using the rdd.persist() method.
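For concreteness, a minimal sketch of that scenario in Scala, assuming a live SparkContext named sc; the variable names and input path are purely illustrative:

// Build an RDD, persist it, and reuse it across several actions (names and path are illustrative).
val words  = sc.textFile("hdfs:///data/input.txt").flatMap(_.split("\\s+"))
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)

counts.persist()                                    // cache the pair RDD (MEMORY_ONLY by default)

val distinctWords = counts.count()                  // first action: computes and caches the partitions
val frequentWords = counts.filter { case (_, n) => n > 1 }.count()  // served from the cached partitions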

So when we persist this RDD, the nodes that compute it will store its partitions.

Now suppose the node holding a persisted partition of this RDD fails. What happens then? How will Spark recover the lost data? Is there a replication mechanism, or some other mechanism?

Answer

When you call rdd.persist, the RDD does not materialize its content. It only does so when you perform an action on the RDD; it follows the same lazy evaluation principle.
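For example, a minimal sketch assuming a live SparkContext named sc (the data is illustrative):

val nums = sc.parallelize(1 to 1000000).map(_ * 2)

nums.persist()            // only records the desired storage level; no job runs and nothing is cached yet
val total = nums.sum()    // first action: a job runs, and the computed partitions get cached
val size  = nums.count()  // second action: answered from the cached partitions, no recomputation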

Now an RDD knows the partitions on which it should operate and the DAG associated with it. With that DAG it is perfectly capable of recreating the materialized partitions.
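You can inspect the lineage Spark would replay by printing it; toDebugString dumps the chain of parent RDDs and transformations (continuing the hypothetical nums RDD from the sketch above):

// Print the RDD's lineage: the dependency chain Spark replays
// to recompute a lost (or never materialized) partition.
println(nums.toDebugString)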

So when a node fails, the driver spawns another executor on some other node and provides it, in a closure, with the data partition it was supposed to work on and the DAG associated with it. With this information the executor can recompute the data and materialize it.

In the meantime, the cached RDD will no longer have all of its data in memory; the data for the lost node has to be fetched from disk again, so it will take a little more time.

As for replication: yes, Spark supports in-memory replication. You need to set StorageLevel.MEMORY_AND_DISK_2 when you persist.

import org.apache.spark.storage.StorageLevel
rdd.persist(StorageLevel.MEMORY_AND_DISK_2)

This ensures the data is replicated twice, i.e. each cached partition is stored on two nodes.
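Putting it together, a minimal sketch (the input path is hypothetical; MEMORY_ONLY_2 is the purely in-memory two-replica variant if you prefer no disk fallback):

import org.apache.spark.storage.StorageLevel

val rdd = sc.textFile("hdfs:///data/input.txt")   // hypothetical path
rdd.persist(StorageLevel.MEMORY_AND_DISK_2)       // keep two replicas, spilling to disk when memory is short
rdd.count()                                       // an action materializes (and replicates) the cached partitions

println(rdd.getStorageLevel.replication)          // prints 2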
