GridGain failover of master (sender) node


Question

I am working on a batch processing problem. The solution needs to handle failing hardware.

There is a master node, which initiates task executions, and worker nodes, which execute the jobs. I know how failover of worker nodes works, but I could not find any information about failover of master nodes. Whenever the master node that started a task fails, the whole task is canceled.

Is there any way to finish processing the task in that case?

Could you suggest the best way of implementing failover of the master node?

Kind regards, Kuba

Recommended answer

Whenever your master node dies, there is basically no one left to perform the "reduce" step of your MapReduce task.

There are several ways you can try to mitigate this problem:

  1. Save intermediate checkpoints using GridCheckpointSpi (the GridTaskSession.saveCheckpoint(..) API). When your task restarts after a node crash, check whether a checkpoint was saved and resume from it.

  2. Do the same as in (1), but use the data grid (the GridCache API) instead.

  3. If you don't care about the "reduce" step, have your jobs ignore the "cancel" call and save their results in the data grid when they are done.
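
The checkpoint-and-resume idea behind these options can be sketched in plain Java. This is only an illustration under stated assumptions and does not use the GridGain API: the in-memory map stands in for a checkpoint store or data grid that survives the master's crash, and the names `CheckpointSketch`, `store`, `job`, `runTask`, and the `failAfter` parameter are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CheckpointSketch {
    // Stand-in for a checkpoint store or data grid cache (assumption: it outlives the master).
    static final Map<String, Integer> store = new ConcurrentHashMap<>();

    // Counts how many jobs were actually computed rather than read back from the store.
    static int jobsExecuted = 0;

    // Hypothetical job: the "work" is just computing the length of a string.
    static int job(String input) {
        return input.length();
    }

    // Runs every job of a task, saving each result under a per-job key.
    // failAfter simulates the master crashing after that many newly executed jobs.
    static int runTask(String taskId, List<String> inputs, int failAfter) {
        int sum = 0;       // the "reduce" step: sum of all job results
        int executed = 0;  // jobs computed during this attempt
        for (int i = 0; i < inputs.size(); i++) {
            String key = taskId + ":" + i;
            Integer saved = store.get(key);
            if (saved == null) {
                if (executed == failAfter) {
                    throw new IllegalStateException("simulated master crash");
                }
                saved = job(inputs.get(i));
                store.put(key, saved);  // checkpoint the result before moving on
                executed++;
                jobsExecuted++;
            }
            sum += saved;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("alpha", "beta", "gamma", "delta");
        try {
            runTask("task-1", inputs, 2);  // first attempt crashes after two jobs
        } catch (IllegalStateException e) {
            System.out.println("First attempt failed: " + e.getMessage());
        }
        // The restarted task finds the saved results and computes only the missing ones.
        int total = runTask("task-1", inputs, Integer.MAX_VALUE);
        System.out.println("Total = " + total + ", jobs executed = " + jobsExecuted);
    }
}
```

In a real deployment the map would be replaced by whichever store you choose (checkpoints or a cache), and the restart would be triggered by another node that notices the task did not complete. Both of those parts are outside this sketch.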

Best
