使用Paxos跨节点同步大文件 [英] Using Paxos to synchronize a large file across nodes

查看:92
本文介绍了使用Paxos跨节点同步大文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用Paxos来维护文件大小约为50MB的节点之间的共识,并在各个节点上不断对其进行修改。我遇到了实用性问题。
要求:


  1. 跨数百个节点同步50MB以上的文件

  2. 已将更改更改为该文件最多可以在几秒钟内通过网络传播,该文件可以从任何节点创建,并且彼此之间不可能直接竞争。

  3. 加入网络的新节点可以在几分钟之内(不到1小时)通过跟随Paxos消息来构建整个文件

我遇到的问题我面临的问题是似乎没有办法同时实现目标2和目标3。



以下是我到目前为止考虑过的选项:




  • 每轮同步整个文件-完全不切实际,Paxos轮次需要几分钟

  • 仅将更改同步到文件-对于目标1和2是合理的,但违反了目标3,因为新的节点只有在每个状态单元都更改后才能够同步整个文件

  • 同步更改和每轮随机生成一个文件-我不确定Paxos是否允许这样做。节点将能够根据自己的变更来验证更改(允许进行新的更改),并且能够根据文件的所述版本来验证文件的随机部分,但这可行吗?



我认为第三个选择是最好的,但是我不确定Paxos是否允许这样做。这样做的想法是将每轮交换的数据限制为1500个字节,并主要用对该文件的更改来填充1500个字节。在大多数回合中,文件将保持不变,并且其中发生更改的回合很可能少于更改状态的100字节,因此其他1400字节可以用文件的一部分填充,这将允许构建新节点随着时间的推移整理整个文件。这可行吗?这个问题已经解决了吗?

解决方案

正如评论中提到的彼得所说,最终一致可能更合适。这是一种基于 gossip协议的算法。



每个节点都有该文件的某些版本。每个节点每N秒连接到另一个节点,它们交换版本号。如果一个节点位于另一个节点之后,它将从对等节点下载文件。



这个收敛速度非常快,我认为在1000节点的10到20次八卦回合中。 >




更新



(将木筏或消息传递队列引入





从您的评论看来,您有一把钥匙物超所值。您可以将其视为分布式状态机,在其中将对每个密钥的更新视为其自己的命令。这对于像paxos或raft这样的共识协议非常有用(我现在偏爱 raft 开源实现的数量)。更重要的是,这些功能通常也被实施为类似于原子广播系统。简而言之,几个节点充当核心决策者,其余节点则听取结果。



(当然,我不知道您的文件如何



主要关注的问题是扇出到数千个节点。为此,您可能需要分层的扇出。有各种各样的方案可以解决这个问题。这里有一些想法。 A)让每个节点连接到两个随机对等体;并从路径最短的对等方流到主节点(这称为两种选择的力量);或B)以 p 的概率选择了路径最短的对等体。 C)让每个节点连接到一个随机对等点,并以一定概率 p ,从该节点流出,再连接到其上游节点。这些概率意在构成一元树,这是连接到主节点的每个节点与链表中的每个节点之间的良好平衡。



消息传递队列



现在,paxos和木筏提供了一些相当有力的保证。专门针对这种情况,将对所有键按顺序处理每个更新。如果不需要保证,则可以构建一个更简单的系统。



每个密钥更新都可以广播到分布式消息传递队列(例如SQS,RabbitMQ等) 。)对每个密钥更新进行版本控制,并且仅在大于您所拥有的版本时才应用更新。



如果系统允许,我会在上面使用筏/ paxos进行这种处理。


I'm trying to use Paxos to maintain consensus between nodes on a file that is around 50MB in size, and constantly being modified at individual nodes. I'm running into issues of practicality. Requirements:

  1. Sync a 50MB+ file across hundreds of nodes
  2. Have changes to this file, which can be made from any node, and aren't likely to directly compete with each other, propagated across the network in a few seconds at most
  3. New nodes that join the network can within a few minutes (<1 hour) build up the entire file by following along with the Paxos messages

The problem I'm facing is that there doesn't seem to be a way to accomplish both goals 2 and 3.

Here are the options I've considered so far:

  • Sync the entire file each round — Completely impractical, Paxos rounds would take minutes
  • Sync only changes to the file — Reasonable for goals 1 and 2, but breaks goal 3, as new nodes would only be able to sync the entire file once every unit of state has been changed
  • Sync changes & a random piece of the file each round — I'm not sure if Paxos allows for this. Nodes would be able to verify the changes against their own (allowing for new changes), and would be able to verify the random piece of the file against said piece of their version, but is this practical?

I'm thinking the third option is best, but I'm not sure if Paxos allows this. The idea would be to limit the data exchanged each round to maybe 1500 bytes, and fill that 1500 bytes with changes to the file primarily. Most rounds, the file would be unchanged, and the rounds in which something changed would most likely be less than 100 bytes of altered state, so the other 1400 bytes could be filled with some piece of the file, which would allow new nodes to build up the entire file over time. Is this practical? Has this problem already been solved?

解决方案

As peter mentioned in the comments, an eventually-consistent is probably a better fit. Here's one such algorithm, based on the gossip protocol.

Every node has some version of the file. Every N seconds each node connects to another node and they swap version numbers. If one node is behind the other it downloads the file from the peer.

This converges remarkably quickly, I think within 10-20 gossip rounds for 1000 nodes.


Update

(Introducing raft or a messaging queue into the mix.)

Raft

From your comments it appears you've got a key-value store on your hands. You can think of it as a distributed state machine, in which you treat an update to each key as its own command. This is great for a consensus protocol like paxos or raft (I'm favoring raft now-a-days for the number of open source implementations). What's more, these are often implemented to also act like an atomic broadcast system. In short, a few nodes act as the core decision makers and the rest of the nodes listen to the results.

(Of course, I don't know how your file is being update; i.e. if it is only updated on a master node and the remainders are slaves.)

One major concern will be the fan-out to 1000s of nodes. For this, you'll probably want a hierarchal fan-out. There are various schemes to help out with this; here are a few ideas. A) Have each node connect to two random peers; and stream from the peer with the shortest path to the master node (this is called the power of two choices); or B) chose the peer with shortest route with some probability p. C) Have each node connect to one random peer and with some probability p, stream from that node else connect to its upstream node instead. These probabilities are meant to make an n-ary tree, which is a good balance between every node connecting to the master node, and every node in a linked list.

Messaging Queue

Now, paxos and raft provide some pretty strong guarantees. Specifically for this case, every update will be processed in order--across all keys. If you don't need that guarantee then you can architect a much simpler system.

Each key update could be broadcast to a distributed messaging queue (like SQS, RabbitMQ, etc.) Version each key update and only apply an update if its greater than the version you have. This presents you with a beautiful and fast eventually consistent system.

I'd go with this approach above using raft/paxos if the system allows for it.

这篇关于使用Paxos跨节点同步大文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆