How to implement real-time replication of MongoDB (or CouchDB) to many remote clients
Problem description
I'm considering how to design a mechanism for replicating a (potentially large) MongoDB or other NoSQL (CouchDB, etc.) database to dozens of clients at once. The clients would function like a replica set, but the replication would be one-way and the remote clients would belong to other parties. Specifically, I am looking for the following features:
- real-time: changes to the master database should be pushed out to the clients as quickly as possible
- replication to new clients: a new client must be able to connect, automatically sync the majority of existing data, then receive real-time updates.
- efficient: both the initial synchronization/transfer of data and tracking of real-time updates ("diffs", if you will) are computationally efficient, with multiple clients connected.
- secure: the master database presents an interface to which remote clients (who do not belong to the same owner or system) can connect: i.e., we cannot just add all the clients to the master's replica set.
- robust: a temporary connection failure between a client and the master database should be easily and efficiently recoverable.
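As a rough illustration of how these requirements fit together, here is a minimal in-memory sketch. It assumes a single master that numbers every write with a monotonically increasing sequence number; all class and method names (`Publisher`, `Subscriber`, `changes_since`) are hypothetical. The design choice it shows: all clients share one bounded change log, so the server keeps only an integer cursor per client. A new client starts from a snapshot plus the sequence number that snapshot reflects, and a reconnecting client resumes from its cursor, falling back to a full resync only if the log has been truncated past that point. Transport, push notification and authentication (the "secure" requirement) are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    seq: int                  # position in the shared change log
    doc_id: str
    doc: Optional[dict]       # None means the document was deleted


class Publisher:
    """Hypothetical master side: one bounded change log shared by all clients."""

    def __init__(self, max_log: int = 1000) -> None:
        self.docs: dict[str, dict] = {}
        self.log: list[Change] = []
        self.next_seq = 1
        self.max_log = max_log

    def write(self, doc_id: str, doc: Optional[dict]) -> None:
        if doc is None:
            self.docs.pop(doc_id, None)
        else:
            self.docs[doc_id] = dict(doc)
        self.log.append(Change(self.next_seq, doc_id, None if doc is None else dict(doc)))
        self.next_seq += 1
        if len(self.log) > self.max_log:
            self.log.pop(0)  # drop the oldest entry to bound memory

    def snapshot(self) -> tuple[dict[str, dict], int]:
        # Full copy plus the last sequence number it reflects.
        return {k: dict(v) for k, v in self.docs.items()}, self.next_seq - 1

    def changes_since(self, cursor: int) -> Optional[list[Change]]:
        """Changes after `cursor`, or None if the log no longer reaches back that far."""
        oldest = self.log[0].seq if self.log else self.next_seq
        if cursor + 1 < oldest:
            return None
        return [c for c in self.log if c.seq > cursor]


class Subscriber:
    """Hypothetical remote client: a local copy plus one integer cursor."""

    def __init__(self, pub: Publisher) -> None:
        self.pub = pub
        self.docs, self.cursor = pub.snapshot()  # new client: initial sync

    def poll(self) -> None:
        changes = self.pub.changes_since(self.cursor)
        if changes is None:
            # Disconnected for too long: the log was truncated, so resync fully.
            self.docs, self.cursor = self.pub.snapshot()
            return
        for c in changes:
            if c.doc is None:
                self.docs.pop(c.doc_id, None)
            else:
                self.docs[c.doc_id] = dict(c.doc)
            self.cursor = c.seq


pub = Publisher(max_log=3)
pub.write("a", {"x": 1})
sub = Subscriber(pub)            # joins after the first write
pub.write("b", {"x": 2})
pub.write("a", None)
sub.poll()                       # catches up from its cursor

lagging = Subscriber(pub)
for i in range(4):               # more writes than the log keeps
    pub.write(f"k{i}", {"x": i})
lagging.poll()                   # cursor fell off the log, so it resyncs
```

Clients here pull with `poll()`; a real system would push over a network connection, but the cursor and truncation logic would be the same.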
In some sense, the server is publishing a collection of data and the clients are subscribing to it. I realize that this is a hard software engineering problem, and to my knowledge no piece of software has implemented this exactly yet. However, some approaches have come to mind as close, which I'll list below.
Meteor's DDP protocol: It's designed to do this with Mongo-like collections and exactly implements the model of publishing and subscribing to a set of data (rather than a stream of messages). It manages the initial sync and sends along live changes. However, it's still in development and far from being an industrial-strength solution; current drawbacks are that the server keeps a copy of every client's state in a possibly inefficient way and that it has only been tested on collections that fit in the memory of a web app. Also, it appears that DDP cannot efficiently sync an out-of-date database without fetching everything from scratch. If anyone can point to some examples of how large a collection can be synced over DDP, that would be great. (See also: Documentation or code details on Meteor's DDP pub/sub protocol?)
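To make the per-client-state drawback concrete, the sketch below computes DDP-style `added`/`changed`/`removed` messages by diffing what the server last sent a client against the current data. The message shapes follow the DDP data messages (`collection`, `id`, `fields`, `cleared`), but `Session` and `diff_messages` are hypothetical names and this is a simplification, not Meteor's actual implementation. Because each session holds a full copy of its client's view, server memory grows roughly with the number of clients times the number of published documents; and in this sketch a fresh session starts empty, so a reconnecting client is sent every document again.

```python
from __future__ import annotations

_MISSING = object()  # sentinel: field absent in the client's copy


def diff_messages(collection: str, sent: dict[str, dict], current: dict[str, dict]) -> list[dict]:
    """DDP-style data messages that turn the client's view `sent` into `current`."""
    msgs: list[dict] = []
    for doc_id, fields in current.items():
        old = sent.get(doc_id)
        if old is None:
            msgs.append({"msg": "added", "collection": collection, "id": doc_id, "fields": dict(fields)})
            continue
        changed = {k: v for k, v in fields.items() if old.get(k, _MISSING) != v}
        cleared = [k for k in old if k not in fields]
        if changed or cleared:
            msg: dict = {"msg": "changed", "collection": collection, "id": doc_id}
            if changed:
                msg["fields"] = changed
            if cleared:
                msg["cleared"] = cleared
            msgs.append(msg)
    for doc_id in sent:
        if doc_id not in current:
            msgs.append({"msg": "removed", "collection": collection, "id": doc_id})
    return msgs


class Session:
    """Hypothetical per-client server state: a full copy of what was sent."""

    def __init__(self, collection: str) -> None:
        self.collection = collection
        self.sent: dict[str, dict] = {}

    def sync(self, current: dict[str, dict]) -> list[dict]:
        msgs = diff_messages(self.collection, self.sent, current)
        # One copy per connected client: memory ~ clients x published documents.
        self.sent = {k: dict(v) for k, v in current.items()}
        return msgs


db = {"a": {"x": 1, "y": 2}}
session = Session("items")
first = session.sync(db)
db = {"a": {"x": 5}, "b": {"z": 0}}
second = session.sync(db)
reconnected = Session("items").sync(db)   # fresh session: no memory of the old view
```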
Broadcasting the Mongo oplog: Using a high-throughput message bus like Apache Kafka, one may be able to efficiently send the oplog to many clients at once. This tackles some of the system implementation challenges. However, this requires that the clients start with an initial sync that gets them close enough to the current master state somehow and then start replaying the oplog from the appropriate point.
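The "initial sync, then replay from the appropriate point" step can be sketched as follows, assuming every write carries a monotonically increasing position (here `ts`) and that operations are idempotent. The entry format is loosely modeled on oplog fields (`op` = `i`/`u`/`d`, `o`, `o2`) but simplified: updates are shown as a plain `$set`, whereas real oplog update entries use a version-dependent format. `Master`, `apply`, `catch_up` and `initial_sync` are hypothetical names. The idea: record the log position before copying, copy documents while writes continue (so the copy is not a point-in-time snapshot), then replay every entry after the recorded position; idempotent replay makes re-applying writes the copy already saw harmless. With Kafka, the recorded position would have to map to a topic offset, which is an assumption about how the oplog is published.

```python
from __future__ import annotations


def apply(db: dict[str, dict], entry: dict) -> None:
    """Apply one simplified oplog-like entry; safe to apply more than once."""
    op = entry["op"]
    if op == "i":                                  # insert, treated as an upsert
        doc = entry["o"]
        db[doc["_id"]] = dict(doc)
    elif op == "u":                                # update: plain $set of fields
        doc_id = entry["o2"]["_id"]
        if doc_id in db:
            db[doc_id].update(entry["o"]["$set"])
    elif op == "d":                                # delete; a missing doc is fine
        db.pop(entry["o"]["_id"], None)


class Master:
    """Hypothetical master: every write is appended to a log with a position `ts`."""

    def __init__(self) -> None:
        self.db: dict[str, dict] = {}
        self.oplog: list[dict] = []

    def write(self, entry: dict) -> None:
        entry = dict(entry, ts=len(self.oplog) + 1)
        self.oplog.append(entry)
        apply(self.db, entry)

    def position(self) -> int:
        return self.oplog[-1]["ts"] if self.oplog else 0


def catch_up(local: dict[str, dict], master: Master, pos: int) -> int:
    """Replay every log entry after `pos`; return the new position."""
    for entry in master.oplog:
        if entry["ts"] > pos:
            apply(local, entry)
            pos = entry["ts"]
    return pos


def initial_sync(master: Master, concurrent_writes: list[dict]) -> tuple[dict[str, dict], int]:
    t0 = master.position()                      # 1. remember where the log is
    local: dict[str, dict] = {}
    pending = iter(concurrent_writes)
    for doc_id in sorted(master.db):            # 2. copy document by document...
        doc = master.db.get(doc_id)
        if doc is not None:
            local[doc_id] = dict(doc)
        w = next(pending, None)                 # ...while the master keeps writing
        if w is not None:
            master.write(w)
    return local, catch_up(local, master, t0)   # 3. replay from t0, not from "now"


master = Master()
for doc_id in ("a", "b", "c"):
    master.write({"op": "i", "o": {"_id": doc_id, "n": 1}})
concurrent = [
    {"op": "u", "o2": {"_id": "c"}, "o": {"$set": {"n": 2}}},  # c not copied yet
    {"op": "d", "o": {"_id": "a"}},                             # a already copied
    {"op": "i", "o": {"_id": "d", "n": 1}},                     # d did not exist
]
replica, pos = initial_sync(master, concurrent)
master.write({"op": "u", "o2": {"_id": "b"}, "o": {"$set": {"n": 3}}})
pos = catch_up(replica, master, pos)            # live tailing afterwards
```

Replaying from the position recorded after the copy instead of t0 would leave the replica holding the deleted document "a" and missing "d".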
Continuous replication a la CouchDB: I'm not sure how this is implemented and how robust it is, given the sparsity of the documentation. However, it does seem to work over remote database connections. How efficient is this, though, when multiple clients are trying to replicate at the same time? (A similar hack to this would be to make the clients MongoDB Priority 0 replica set members; however, that seems to be far from its intended use. See also: http://guide.couchdb.org/draft/replication.html)
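For CouchDB, the documented replication protocol roughly comes down to: read the source's `_changes` feed from the last checkpointed sequence (`since=`), ask the target which revisions it is missing (`_revs_diff`), copy only those (`_bulk_docs` with `new_edits: false`), and record progress in a `_local` checkpoint document. The in-memory sketch below mimics that loop under simplifying assumptions: integer revisions instead of revision trees, no conflicts, and a checkpoint written after every change rather than periodically and on both sides. `Source`, `Target` and `replicate` are hypothetical names. It shows two properties relevant to the many-clients question: each client resumes from its own checkpoint, and because the changes feed lists each document once at its latest sequence, a client that was offline receives only current versions, not every intermediate edit.

```python
from __future__ import annotations

from typing import Optional


class Source:
    """Hypothetical stand-in for a CouchDB database and its by-sequence index."""

    def __init__(self) -> None:
        self.seq = 0
        self.docs: dict[str, tuple[int, Optional[dict]]] = {}  # id -> (rev, body); None body = deleted
        self.latest_seq: dict[str, int] = {}                   # id -> seq of its latest change

    def put(self, doc_id: str, body: Optional[dict]) -> None:
        rev = self.docs.get(doc_id, (0, None))[0] + 1
        self.seq += 1
        self.docs[doc_id] = (rev, body)
        self.latest_seq[doc_id] = self.seq   # older entries for this doc leave the feed

    def changes(self, since: int) -> list[tuple[int, str]]:
        """Roughly GET /db/_changes?since=...: each doc once, at its latest seq."""
        return sorted((s, d) for d, s in self.latest_seq.items() if s > since)


class Target:
    """Hypothetical remote client database with its own checkpoint store."""

    def __init__(self) -> None:
        self.docs: dict[str, tuple[int, Optional[dict]]] = {}
        self.checkpoints: dict[str, int] = {}  # plays the role of _local/<replication id>


def replicate(source: Source, target: Target, rep_id: str) -> int:
    """One pass of a one-way replication; returns how many revisions were copied."""
    since = target.checkpoints.get(rep_id, 0)
    copied = 0
    for seq, doc_id in source.changes(since):
        rev, body = source.docs[doc_id]
        have = target.docs.get(doc_id, (0, None))[0]
        if have < rev:                          # like _revs_diff: only missing revisions
            target.docs[doc_id] = (rev, body)
            copied += 1
        target.checkpoints[rep_id] = seq        # resume point after a dropped connection
    return copied


src = Source()
src.put("a", {"v": 1})
src.put("a", {"v": 2})                                # two edits to "a"...
src.put("b", {"v": 1})
client1, client2 = Target(), Target()
first = replicate(src, client1, "src->client1")       # ...but only its latest revision is sent
src.put("b", {"v": 2})
second = replicate(src, client1, "src->client1")      # resumes from the checkpoint
late = replicate(src, client2, "src->client2")        # a new client starts from seq 0
```

The source's work per pass is proportional to the changes since that client's checkpoint, although every client still gets its own feed; this sketch does not say anything about how CouchDB itself behaves under many concurrent replications.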
Please give pointers to software or pieces of software that already implement parts of this, or suggestions on the algorithms/data structures needed to do this efficiently.
Recommended answer
If you are looking specifically for real-time replication, I'd recommend you look into SaaS offerings built for this purpose, such as https://www.firebase.com/