What happens when Spark master fails?

Question

Does the driver need constant access to the master node, or is that access only required to get the initial resource allocation? What happens if the master becomes unavailable after the Spark context has been created? Does it mean the application will fail?

Recommended answer

The first, and for the time being probably the most serious, consequence of a master failure or a network partition is that your cluster won't be able to accept new applications. This is why the Master is considered a single point of failure when the cluster runs with the default configuration.
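For contrast with the default configuration, Spark's standalone mode documents a recovery mode selected through `spark.deploy.recoveryMode` (`ZOOKEEPER` or `FILESYSTEM`), passed to the Master daemons via `SPARK_DAEMON_JAVA_OPTS`, and it accepts a comma-separated list of Masters in the `spark://` URL. The sketch below only assembles those strings. The helper names `master_url` and `recovery_opts` and the host names are hypothetical, chosen for illustration, and the code does not talk to a cluster.

```python
def master_url(hosts, port=7077):
    """Build a standalone master URL that lists every Master host.

    Host names are placeholders; 7077 is the usual standalone Master port.
    """
    if not hosts:
        raise ValueError("at least one master host is required")
    return "spark://" + ",".join(f"{host}:{port}" for host in hosts)


def recovery_opts(mode, zk_url=None, recovery_dir=None):
    """Return -D options suitable for SPARK_DAEMON_JAVA_OPTS.

    mode is "ZOOKEEPER" (standby Masters with leader election) or
    "FILESYSTEM" (single Master that recovers state after a restart).
    """
    opts = {"spark.deploy.recoveryMode": mode}
    if mode == "ZOOKEEPER":
        if not zk_url:
            raise ValueError("ZOOKEEPER mode needs zk_url")
        opts["spark.deploy.zookeeper.url"] = zk_url
    elif mode == "FILESYSTEM":
        if not recovery_dir:
            raise ValueError("FILESYSTEM mode needs recovery_dir")
        opts["spark.deploy.recoveryDirectory"] = recovery_dir
    else:
        raise ValueError(f"unsupported recovery mode: {mode}")
    return " ".join(f"-D{key}={value}" for key, value in opts.items())


if __name__ == "__main__":
    # Hypothetical hosts, for illustration only.
    print(master_url(["master1", "master2"]))
    print(recovery_opts("ZOOKEEPER", zk_url="zk1:2181,zk2:2181"))
```

With a Master list like this, an application or worker can register with whichever Master is currently the leader, which is what removes the single point of failure.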

Running applications will notice the loss of the Master, but otherwise they should continue to work more or less as if nothing had happened, with two important exceptions:

  • The application won't be able to finish gracefully.
  • If the master is down, or a network partition affects the worker nodes as well, the workers (slaves) will try to reregisterWithMaster. If this fails multiple times, the workers will simply give up. At that point long-running applications (such as streaming apps) won't be able to continue processing, but this still shouldn't result in immediate failure. Instead, the application will wait for the master to come back online (file system recovery) or for contact from a new leader (ZooKeeper mode), and if that happens it will continue processing.
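The worker-side behaviour described in the second point can be pictured with a toy model. This is a sketch only and not Spark's implementation: the `Worker` class, the state names, and the `max_attempts` retry budget are all hypothetical and do not reflect Spark's real retry counts or timers.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Toy worker that retries registration until it succeeds or gives up."""

    max_attempts: int  # hypothetical retry budget, not Spark's actual value
    attempts: int = 0
    state: str = "DISCONNECTED"

    def try_reregister(self, master_is_up: bool) -> str:
        # Once registered or given up, further attempts change nothing.
        if self.state in ("REGISTERED", "GAVE_UP"):
            return self.state
        self.attempts += 1
        if master_is_up:
            self.state = "REGISTERED"
        elif self.attempts >= self.max_attempts:
            self.state = "GAVE_UP"
        return self.state


def simulate(master_timeline, max_attempts):
    """Run one registration attempt per entry in master_timeline.

    Each entry says whether the master was reachable at that attempt.
    """
    worker = Worker(max_attempts=max_attempts)
    for master_is_up in master_timeline:
        worker.try_reregister(master_is_up)
    return worker.state


if __name__ == "__main__":
    # Master returns on the third attempt, within the retry budget.
    print(simulate([False, False, True], max_attempts=5))
    # Master returns only after the budget is spent.
    print(simulate([False, False, False, True], max_attempts=3))
```

The second run shows why a short outage and a long one differ: once the worker has given up, a master that returns later does not bring it back in this model.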
