What algorithms are there for failover in a distributed system?

Question

I'm planning on making a distributed database system using a shared-nothing architecture and multiversion concurrency control. Redundancy will be achieved through asynchronous replication (it's acceptable to lose some recent changes in case of a failure, as long as the data in the system remains consistent). For each database entry, one node has the master copy (only that node has write access to it), and one or more other nodes have secondary copies of the entry for scalability and redundancy purposes (the secondary copies are read-only). When the master copy of an entry is updated, it is timestamped and sent asynchronously to the nodes with secondary copies, so that they eventually get the latest version of the entry. The node that has the master copy can change at any time - if another node needs to write that entry, it will request the current owner of the master copy to hand over ownership of that entry's master copy, and after receiving ownership that node can write the entry (all transactions and writes are local).

Lately I've been thinking about what to do when a node in the cluster goes down, and what strategy to use for failover. Here are some questions. I hope you know of available alternatives for at least some of them.

  • What algorithms are there for doing failover in a distributed system?
  • What algorithms are there for consensus in a distributed system?
  • How should the nodes in the cluster determine that a node is down?
  • How should the nodes determine which database entries had their master copy on the failed node at the time of failure, so that other nodes may recover those entries?
  • How to decide which node(s) have the latest secondary copy of some entry?
  • How to decide which node's secondary copy should be promoted to be the new master copy?
  • How to handle it if the node which was thought to be down suddenly comes back as if nothing happened?
  • How to avoid split-brain scenarios, where the network is temporarily split into two, and both sides think that the other side has died?

Answer

* What algorithms are there for doing failover in a distributed system?

Possibly not algorithms, so much as systems. You need to design your architecture around the questions you've asked.

* What algorithms are there for consensus in a distributed system?

You probably want to implement Paxos. Simple Paxos is not too hard to get right. If you're trying to make it bulletproof, read Google's 'Paxos Made Live' paper. If you're hoping to make it high-performance, look at Multi-Paxos.
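To give a feel for how small the core of simple (single-decree) Paxos is, here is a minimal in-process sketch. The names `Acceptor` and `propose` are made up for illustration, the "network" is just direct method calls, and there is no persistence, retry, or learner role, so treat it as a toy under those assumptions rather than a usable implementation.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    """Acceptor role of single-decree Paxos (illustrative, in-memory only)."""
    promised: int = -1                   # highest ballot promised so far
    accepted_ballot: int = -1            # ballot of the accepted value, -1 if none
    accepted_value: Optional[Any] = None

    def on_prepare(self, ballot: int) -> Tuple[bool, int, Optional[Any]]:
        """Phase 1: promise to ignore lower ballots; report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, self.accepted_ballot, self.accepted_value

    def on_accept(self, ballot: int, value: Any) -> bool:
        """Phase 2: accept unless a higher ballot has already been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run one proposer round; return the chosen value, or None if it fails."""
    majority = len(acceptors) // 2 + 1
    replies = [a.on_prepare(ballot) for a in acceptors]
    promises = [(b, v) for ok, b, v in replies if ok]
    if len(promises) < majority:
        return None
    # If some acceptor already accepted a value, adopt the highest-ballot one.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    if prior_ballot >= 0:
        value = prior_value
    acks = sum(a.on_accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None
```

The important property is in `propose`: a proposer with a higher ballot that arrives after a value has been chosen ends up re-proposing that same value, which is what keeps two would-be masters from both winning.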

* How should the nodes in the cluster determine that a node is down?

Depends. Heartbeats are actually a pretty good way to do this. The problem is that you get false positives, but that's more or less unavoidable, and in a cluster on the same LAN with manageable load they're accurate. The good thing about Paxos is that false positives are dealt with automatically. However, if you need failure information for some other purpose, make sure it's acceptable to mark a node as failed when it is actually just under load and slow to respond to a heartbeat.
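A timeout-based heartbeat monitor can be sketched roughly as follows. `HeartbeatMonitor` and its methods are hypothetical names, and the optional `now` argument exists only so the behaviour can be exercised without sleeping. Note that it can only ever say "suspected", never "definitely dead", which is exactly the false-positive issue above.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Suspects a node once no heartbeat has arrived for more than `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}   # node id -> time of last heartbeat

    def record_heartbeat(self, node_id: str, now: Optional[float] = None) -> None:
        """Remember when `node_id` was last heard from."""
        self.last_seen[node_id] = time.monotonic() if now is None else now

    def suspected(self, now: Optional[float] = None) -> List[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record_heartbeat("a", now=0.0)
monitor.record_heartbeat("b", now=2.0)
print(monitor.suspected(now=4.0))   # "a" is overdue, "b" is not
```

A slow node that sends its heartbeat late simply drops off the suspect list again, so whatever consumes this list has to tolerate a node being suspected and then reappearing.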

* How should the nodes determine which database entries had their master copy on the failed node at the time of failure, so that other nodes may recover those entries?
* How to decide which node(s) have the latest secondary copy of some entry?
* How to decide which node's secondary copy should be promoted to be the new master copy?

I think you might really benefit from reading the Google File System (GFS) paper. In GFS there's a dedicated master node which keeps track of which nodes have which blocks. This scheme might work for you, but the key is to keep accesses to this master minimal.

If you don't store this information on a dedicated node, you're going to have to store it everywhere. Try tagging the data with the master holder's id.
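As a rough illustration of the tagging idea, suppose every copy carries the id of the node that was master when that version was written, plus a per-entry version number. Both fields, and the names `Copy`, `orphaned_keys` and `freshest_holder`, are assumptions for this sketch; in particular `version` stands in for the timestamp from the question and is meant as a counter the master increments, not a wall-clock reading.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Copy:
    key: str
    value: str
    version: int     # per-entry counter set by the master (assumed, not a clock)
    master_id: str   # node that held the master copy when this version was written


def orphaned_keys(local_copies: List[Copy], failed_node: str) -> List[str]:
    """Keys whose master copy lived on the failed node, judged from local tags."""
    return sorted({c.key for c in local_copies if c.master_id == failed_node})


def freshest_holder(copies_by_node: Dict[str, Copy]) -> str:
    """Node holding the highest version of an entry (ties go to the lowest node id)."""
    return min(copies_by_node, key=lambda n: (-copies_by_node[n].version, n))


holders = {
    "n2": Copy("x", "new", 10, "n1"),
    "n3": Copy("x", "old", 8, "n1"),
}
print(freshest_holder(holders))   # candidate for promotion to new master of "x"
```

Because replication is asynchronous, a local tag can be out of date if ownership moved shortly before the failure, so this only narrows down the candidates. The nodes still need to agree on the outcome, which is where the consensus step comes in.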

* How to handle it if the node which was thought to be down suddenly comes back as if nothing happened?

See above, but the basic point is that you have to be careful because a node that is no longer the master might think that it is. One thing that I don't think you've solved: how does an update get to the master - i.e. how does a client know which node to send the update to?
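One common way to guard against a stale master, which is my suggestion here rather than something your design already has, is to number each change of ownership with an increasing epoch and have replicas refuse updates stamped with an older epoch. The `Replica` class and its method names below are hypothetical.

```python
from typing import Optional


class Replica:
    """Secondary copy that rejects updates from a master with an outdated epoch."""

    def __init__(self) -> None:
        self.epoch = 0                      # highest ownership epoch seen so far
        self.value: Optional[str] = None

    def on_new_master(self, epoch: int) -> None:
        """Learn that ownership moved; epochs are assumed to only ever increase."""
        self.epoch = max(self.epoch, epoch)

    def on_update(self, epoch: int, value: str) -> bool:
        """Apply an update unless its sender was master in an older epoch."""
        if epoch < self.epoch:
            return False
        self.epoch = epoch
        self.value = value
        return True


replica = Replica()
replica.on_update(1, "written by old master")
replica.on_new_master(2)                    # failover happened
print(replica.on_update(1, "late write"))   # the returning old master is refused
```

The refused node learns from the rejection that it has been superseded and can then resynchronise as a secondary.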

* How to avoid split-brain scenarios, where the network is temporarily split into two, and both sides think that the other side has died?

Paxos works here by preventing progress in the case of a perfect split. Otherwise, as before, you have to be very careful.
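The reason a perfect split stalls is the strict-majority rule, which can be shown with a one-line check. `has_quorum` is an illustrative helper, and it assumes the total cluster size is fixed and known to every node.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if this side of a partition (counting itself) holds a strict majority."""
    return reachable_nodes > cluster_size // 2


# A 4-node cluster split 2/2: neither side may elect a master.
print(has_quorum(2, 4), has_quorum(2, 4))
# A 5-node cluster split 3/2: only the larger side may proceed.
print(has_quorum(3, 5), has_quorum(2, 5))
```

At most one side can ever hold a majority, so at most one side can promote new masters. With an odd number of nodes, an even split is impossible in the first place.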

In general, solve the question of knowing which node gets which data item as the master, and you'll be a long way towards fixing your architecture. Note that you can't just have the node receiving the update be the master - what if two updates happen concurrently? Don't rely on a synchronised global clock either - that way madness lies. You probably want to avoid running consensus on every write if you can help it, so instead perhaps have a slow master-failover protocol and a fast write path.

Feel free to shoot me a mail off line if you want to know more details. My blog http://the-paper-trail.org deals with a lot of this stuff.

cheers,

Henry
