What algorithms are there for failover in a distributed system?

Question

I'm planning on making a distributed database system using a shared-nothing architecture and multiversion concurrency control. Redundancy will be achieved through asynchronous replication (it's allowed to lose some recent changes in case of a failure, as long as the data in the system remains consistent). For each database entry, one node has the master copy (only that node has write access to it). In addition, one or more nodes have secondary copies of the entry for scalability and redundancy purposes (the secondary copies are read-only). When the master copy of an entry is updated, it is timestamped and sent asynchronously to the nodes with secondary copies, so that they eventually get the latest version of the entry. The node that has the master copy can change at any time - if another node needs to write that entry, it will request the current owner of the master copy to give it ownership of that entry's master copy, and after receiving ownership that node can write the entry (all transactions and writes are local).
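
A minimal in-memory sketch of the scheme described above may help make the moving parts concrete. It is only an illustration under assumptions. The names `Node`, `Version`, `replicate`, `flush` and `transfer_ownership` are hypothetical. Messages are queued in a list instead of being sent over a network. A per-entry sequence number stands in for the timestamp, so no shared clock is needed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    value: str
    seq: int  # hypothetical per-entry sequence number, standing in for the "timestamp"


@dataclass
class Node:
    name: str
    store: dict = field(default_factory=dict)   # entry key -> newest Version held here
    owned: set = field(default_factory=set)     # keys whose master copy lives on this node
    outbox: list = field(default_factory=list)  # queued (target, key, version) messages

    def write(self, key, value):
        # Only the master copy is writable; secondary copies are read-only.
        if key not in self.owned:
            raise PermissionError(f"{self.name} is not the master of {key!r}")
        prev = self.store.get(key)
        self.store[key] = Version(value, prev.seq + 1 if prev else 1)
        return self.store[key]

    def replicate(self, key, secondaries):
        # Queue the new version for each secondary; delivery happens later.
        for node in secondaries:
            self.outbox.append((node, key, self.store[key]))

    def flush(self):
        # Deliver queued versions; a secondary copy only ever moves forward.
        while self.outbox:
            node, key, version = self.outbox.pop(0)
            current = node.store.get(key)
            if current is None or version.seq > current.seq:
                node.store[key] = version


def transfer_ownership(key, owner, requester):
    # The current owner gives up the master copy and hands over its newest
    # version, so the new owner continues the same sequence.
    owner.owned.discard(key)
    requester.owned.add(key)
    if key in owner.store:
        requester.store[key] = owner.store[key]


a, b, c = Node("a"), Node("b"), Node("c")
a.owned.add("x")
a.write("x", "v1")
a.replicate("x", [b, c])
a.flush()
print(c.store["x"])  # Version(value='v1', seq=1)
transfer_ownership("x", a, b)
print(b.write("x", "v2"))  # Version(value='v2', seq=2)
```

After the transfer, `a.write("x", ...)` raises `PermissionError`, because only the current owner may write the entry.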

Lately I've been thinking about what to do when a node in the cluster goes down, i.e. what strategy to use for failover. Here are some questions. I hope you know of alternatives for at least some of them.

  • What algorithms are there for doing failover in a distributed system?
  • What algorithms are there for consensus in a distributed system?
  • How should the nodes in the cluster determine that a node is down?
  • How should the nodes determine which database entries had their master copy on the failed node at the time of failure, so that other nodes may recover those entries?
  • How do I decide which node(s) have the latest secondary copy of some entry?
  • How do I decide which node's secondary copy should be promoted to be the new master copy?
  • How do I handle it if the node that was thought to be down suddenly comes back as if nothing happened?
  • How do I avoid split-brain scenarios, where the network is temporarily split in two and both sides think the other side has died?

Solution

* What algorithms are there for doing failover in a distributed system?

Possibly not algorithms, so much as systems. You need to design your architecture around the questions you've asked.

* What algorithms are there for consensus in a distributed system?

You probably want to implement Paxos. Simple Paxos is not too hard to get right. If you're trying to make it bulletproof, read Google's 'Paxos Made Live' paper. If you're hoping to make it high-performance, look at Multi-Paxos.
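
To give a feel for what "simple Paxos" involves, here is a minimal single-decree Paxos sketch with in-memory acceptors. It has no networking, persistence or retries. It assumes the caller picks ballot numbers that are unique across proposers and increasing, for example `round * num_proposers + proposer_id`. The names `Acceptor` and `propose` are hypothetical. The value agreed on could be, say, which node holds the master copy of an entry.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Acceptor:
    promised: int = -1         # highest ballot this acceptor has promised
    accepted_ballot: int = -1  # ballot of the accepted value (-1 = none yet)
    accepted_value: Any = None

    def prepare(self, ballot):
        # Phase 1b: promise to ignore lower ballots and report any accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, None, None

    def accept(self, ballot, value):
        # Phase 2b: accept unless a higher ballot has been promised meanwhile.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors, ballot, value, reachable=None):
    """One round of single-decree Paxos; returns the chosen value or None."""
    reachable = acceptors if reachable is None else reachable
    majority = len(acceptors) // 2 + 1
    # Phase 1a: ask the reachable acceptors for promises.
    promises = [(b, v) for ok, b, v in (acc.prepare(ballot) for acc in reachable) if ok]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, we must propose the one
    # with the highest ballot instead of our own.
    highest_ballot, highest_value = max(promises, key=lambda p: p[0])
    if highest_ballot >= 0:
        value = highest_value
    # Phase 2a: ask for acceptance; a majority of acks means the value is chosen.
    acks = sum(acc.accept(ballot, value) for acc in reachable)
    return value if acks >= majority else None


nodes = [Acceptor() for _ in range(5)]
print(propose(nodes, 1, "n2 is master of x"))  # n2 is master of x
print(propose(nodes, 2, "n3 is master of x"))  # n2 is master of x
```

The second call shows the core safety property. Once a majority has accepted a value, a later proposer with a higher ballot learns that value in phase 1 and re-proposes it instead of its own.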

* How should the nodes in the cluster determine that a node is down?

Depends. Heartbeats are actually a pretty good way to do this. The problem is that you get false positives, but that's more or less unavoidable, and in a cluster on the same LAN with manageable load they're accurate. The good thing about Paxos is that false positives are dealt with automatically. However, if you actually need failure information for some other purpose, then you need to make sure it's OK to detect a node as failed when it is actually just under load and slow to respond to a heartbeat.
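
A heartbeat failure detector can be as simple as a timeout per peer. In this sketch the class name and the `timeout` parameter are illustrative assumptions. Times can be passed in explicitly for testing, and otherwise default to `time.monotonic()`. All it can report is "suspected": a suspected node may just be overloaded, which is the false-positive problem described above.

```python
import time


class HeartbeatDetector:
    """Suspects a peer if no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # peer id -> time of its most recent heartbeat

    def heartbeat(self, peer, now=None):
        # Record that `peer` is alive; uses a monotonic clock unless a time is given.
        self.last_seen[peer] = time.monotonic() if now is None else now

    def suspected(self, now=None):
        # Peers that have been silent too long. They may be dead, or merely slow.
        now = time.monotonic() if now is None else now
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


d = HeartbeatDetector(timeout=3.0)
d.heartbeat("n1", now=0.0)
d.heartbeat("n2", now=0.0)
d.heartbeat("n1", now=2.5)
print(d.suspected(now=4.0))  # {'n2'}
```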

* How should the nodes determine which database entries had their master copy on the failed node at the time of failure, so that other nodes may recover those entries?
* How do I decide which node(s) have the latest secondary copy of some entry?
* How do I decide which node's secondary copy should be promoted to be the new master copy?

I think you might really benefit from reading the Google File System paper. In GFS there's a dedicated master node which keeps track of which nodes have which blocks. This scheme might work for you, but the key is to keep accesses to this master minimal.

If you don't store this information on a dedicated node, you're going to have to store it everywhere. Try tagging the data with the master holder's id.
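
Here is a rough sketch of the dedicated-directory approach. A metadata directory records which node holds each entry's master copy and which version each replica has reported. On a failure, it lists the entries the failed node mastered and promotes the live replica with the highest version. `Directory` and its fields are hypothetical. In practice the directory itself would have to be replicated, for instance with Paxos. With asynchronous replication, the promoted copy may also lack the failed master's last few writes, which the question says is acceptable.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical metadata service in the spirit of the GFS master."""
    master_of: dict = field(default_factory=dict)  # key -> node id holding the master copy
    replicas: dict = field(default_factory=dict)   # key -> {node id: reported version}

    def entries_mastered_by(self, node):
        return [k for k, m in self.master_of.items() if m == node]

    def promote_after_failure(self, failed, live):
        """Move every entry mastered by `failed` to the live replica with the newest version."""
        promoted = {}
        for key in self.entries_mastered_by(failed):
            candidates = {n: v for n, v in self.replicas.get(key, {}).items() if n in live}
            if not candidates:
                continue  # no surviving copy: leave the entry assigned to the failed node
            # Highest version wins; ties are broken by node id so every
            # caller computes the same answer.
            best = max(candidates, key=lambda n: (candidates[n], n))
            self.master_of[key] = best
            promoted[key] = best
        return promoted


directory = Directory(
    master_of={"x": "n1", "y": "n2"},
    replicas={"x": {"n1": 7, "n2": 5, "n3": 6}, "y": {"n2": 3, "n3": 3}},
)
print(directory.promote_after_failure("n1", live={"n2", "n3"}))  # {'x': 'n3'}
```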

* How do I handle it if the node that was thought to be down suddenly comes back as if nothing happened?

See above, but the basic point is that you have to be careful because a node that is no longer the master might think that it is. One thing that I don't think you've solved: how does an update get to the master - i.e. how does a client know which node to send the update to?
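
One way to stop a returning node from acting as master is fencing with epoch numbers. This technique is swapped in here and is not spelled out above. Every time ownership of an entry is reassigned, the epoch is incremented, and replicas refuse writes carrying an older epoch than one they have already seen. The sketch assumes epochs are handed out by whatever consensus-backed service assigns masters. `Replica`, `Master` and `StaleMaster` are hypothetical names.

```python
class StaleMaster(Exception):
    """Raised when a write comes from a master whose ownership epoch is outdated."""


class Replica:
    def __init__(self):
        self.epoch = 0      # highest ownership epoch this replica has seen
        self.value = None

    def apply(self, epoch, value):
        # Fencing: refuse writes from a master older than one we already heard from.
        if epoch < self.epoch:
            raise StaleMaster(f"write from epoch {epoch}, current epoch is {self.epoch}")
        self.epoch = epoch
        self.value = value


class Master:
    def __init__(self, name, epoch):
        self.name = name
        self.epoch = epoch  # assumed to be assigned by the consensus-backed directory

    def write(self, replicas, value):
        for r in replicas:
            r.apply(self.epoch, value)


replicas = [Replica(), Replica()]
old = Master("n1", epoch=1)
old.write(replicas, "a")
new = Master("n2", epoch=2)  # promoted while n1 was unreachable
new.write(replicas, "b")
try:
    old.write(replicas, "c")  # n1 comes back, still believing it is master
except StaleMaster as e:
    print("rejected:", e)  # rejected: write from epoch 1, current epoch is 2
```

The rejection tells the old node that it has lost ownership. It can then ask the directory who the current master is, which is also a possible answer to how clients find the right node to send updates to.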

* How to avoid split-brain scenarios, where the network is temporarily split into two, and both sides think that the other side has died?

Paxos works here by preventing progress in the case of a perfect split. Otherwise, as before, you have to be very careful.
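
Paxos blocks rather than splits because every decision needs a strict majority, and two disjoint sides of a partition cannot both have one. A tiny sketch of that rule (the helper name `has_quorum` is hypothetical):

```python
def has_quorum(reachable, cluster_size):
    """True if the nodes in `reachable` form a strict majority of the cluster."""
    return len(reachable) > cluster_size // 2


# Five nodes split 3/2: only the larger side may keep electing masters.
print(has_quorum({"a", "b", "c"}, 5), has_quorum({"d", "e"}, 5))  # True False
# Four nodes split 2/2: neither side has a majority, so both stall.
print(has_quorum({"a", "b"}, 4), has_quorum({"c", "d"}, 4))  # False False
```

With an odd cluster size, any two-way split leaves exactly one side with a majority. With an even size, a perfect split stalls both sides until the partition heals.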

In general, solve the question of knowing which node gets which data item as the master, and you'll be a long way towards fixing your architecture. Note that you can't just have the node receiving the update be the master - what if two updates happen concurrently? Don't rely on a synchronised global clock either - that way madness lies. You probably want to avoid running consensus on every write if you can help it, so instead perhaps have a slow master-failover protocol and a fast write path.

Feel free to email me offline if you want to know more details. My blog http://the-paper-trail.org deals with a lot of this stuff.

cheers,

Henry
