How efficient can Meteor be while sharing a huge collection among many clients?


Question

Imagine the following case:

  • 1,000 clients are connected to a Meteor page displaying the content of the "Somestuff" collection.

  • "Somestuff" is a collection holding 1,000 items.

  • Someone inserts a new item into the "Somestuff" collection.

What happens:

  • All Meteor.Collections on clients will be updated, i.e. the insertion forwarded to all of them (which means one insertion message sent to 1,000 clients)

What is the cost in terms of CPU for the server to determine which client needs to be updated?

Is it accurate that only the inserted value will be forwarded to the clients, and not the whole list?

How does this work in real life? Are there any benchmarks or experiments of such scale available?

Answer

The short answer is that only new data gets sent down the wire. Here's how it works.

There are three important parts of the Meteor server that manage subscriptions: the publish function, which defines the logic for what data the subscription provides; the Mongo driver, which watches the database for changes; and the merge box, which combines all of a client's active subscriptions and sends them out over the network to the client.

Each time a Meteor client subscribes to a collection, the server runs a publish function. The publish function's job is to figure out the set of documents that its client should have and send each document property into the merge box. It runs once for each new subscribing client. You can put any JavaScript you want in the publish function, such as arbitrarily complex access control using this.userId. The publish function sends data into the merge box by calling this.added, this.changed and this.removed. See the full publish documentation for more details.
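To make that data flow concrete, here is a minimal sketch of a publish function driving the low-level added/removed/ready calls. The collection name "somestuff" and the document contents are illustrative, and the merge box is replaced by a stub that just records what it receives, so the sketch runs standalone outside Meteor:

```javascript
// Sketch of a publish function. In Meteor it would be registered with
// Meteor.publish("somestuff", fn); here we invoke it against a stub
// "merge box" so the calls it makes are visible.
function publishSomestuff() {
  // Arbitrary JavaScript is allowed here, e.g. access control:
  // if (!this.userId) { this.ready(); return; }
  const docs = [
    { _id: "a1", text: "first item" },
    { _id: "a2", text: "second item" },
  ];
  // Send each document's properties into the merge box.
  for (const doc of docs) {
    const { _id, ...fields } = doc;
    this.added("somestuff", _id, fields);
  }
  this.ready();
}

// Stub merge box that records the calls it receives.
const mergeBox = {
  calls: [],
  added(collection, id, fields) { this.calls.push(["added", collection, id, fields]); },
  changed(collection, id, fields) { this.calls.push(["changed", collection, id, fields]); },
  removed(collection, id) { this.calls.push(["removed", collection, id]); },
  ready() { this.calls.push(["ready"]); },
};

publishSomestuff.call(mergeBox);
// mergeBox.calls now holds two "added" events followed by "ready"
```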

Most publish functions don't have to muck around with the low-level added, changed and removed API, though. If a publish function returns a Mongo cursor, the Meteor server automatically connects the output of the Mongo driver (insert, update, and removed callbacks) to the input of the merge box (this.added, this.changed and this.removed). It's pretty neat that you can do all the permission checks up front in a publish function and then directly connect the database driver to the merge box without any user code in the way. And when autopublish is turned on, even this little bit is hidden: the server automatically sets up a query for all documents in each collection and pushes them into the merge box.
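The cursor-returning form looks something like the sketch below. Meteor.publish and the Somestuff collection are stubbed so the sketch runs standalone; in a real app they come from Meteor and your collection definitions:

```javascript
// The common case: the publish function returns a cursor and the server
// wires the cursor's observer callbacks into the merge box itself.
// Both Meteor and Somestuff are stand-ins here, not the real APIs.
const Meteor = {
  publications: {},
  publish(name, fn) { this.publications[name] = fn; },
};
const Somestuff = {
  // A real cursor would observe the database; this stub just records the selector.
  find(selector = {}) { return { kind: "cursor", selector }; },
};

Meteor.publish("somestuff", function () {
  // Permission checks can run up front (e.g. inspect this.userId); after
  // that the cursor goes straight to the merge box with no user code in
  // between.
  if (!this.userId) return [];
  return Somestuff.find({});
});

const result = Meteor.publications["somestuff"].call({ userId: "u1" });
```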

On the other hand, you aren't limited to publishing database queries. For example, you can write a publish function that reads a GPS position from a device inside a Meteor.setInterval, or polls a legacy REST API from another web service. In those cases, you'd emit changes to the merge box by calling the low-level added, changed and removed DDP API.
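A sketch of that pattern, assuming a hypothetical GPS reader: each tick reads a fresh value and emits it with the low-level API — added on the first reading, changed afterwards. Real code would drive the ticks from Meteor.setInterval; here they are invoked by hand against a stub session so the sketch runs standalone:

```javascript
// Non-database publishing: each tick reads a value (a fake GPS position
// here; in practice a device or legacy REST API) and pushes it to the
// merge box via the low-level added/changed calls.
function makeGpsPublisher(readPosition) {
  let first = true;
  return function tick(session) {
    const pos = readPosition();
    if (first) {
      session.added("positions", "me", pos);
      first = false;
    } else {
      session.changed("positions", "me", pos);
    }
  };
}

// Stub session standing in for the merge box.
const messages = [];
const session = {
  added(collection, id, fields) { messages.push(["added", id, fields]); },
  changed(collection, id, fields) { messages.push(["changed", id, fields]); },
};

let lat = 48.85;
const tick = makeGpsPublisher(() => ({ lat: (lat += 0.01), lng: 2.35 }));
tick(session); // first reading  -> "added"
tick(session); // later readings -> "changed"
```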

The Mongo driver's job is to watch the Mongo database for changes to live queries. These queries run continuously and return updates as the results change by calling added, removed, and changed callbacks.

Mongo is not a real time database. So the driver polls. It keeps an in-memory copy of the last query result for each active live query. On each polling cycle, it compares the new result with the previous saved result, computing the minimum set of added, removed, and changed events that describe the difference. If multiple callers register callbacks for the same live query, the driver only watches one copy of the query, calling each registered callback with the same result.

Each time the server updates a collection, the driver recalculates each live query on that collection. (Future versions of Meteor will expose a scaling API for limiting which live queries recalculate on update.) The driver also polls each live query on a 10 second timer to catch out-of-band database updates that bypassed the Meteor server.

The job of the merge box is to combine the results (added, changed and removed calls) of all of a client's active publish functions into a single data stream. There is one merge box for each connected client. It holds a complete copy of the client's minimongo cache.

In your example with just a single subscription, the merge box is essentially a pass-through. But a more complex app can have multiple subscriptions which might overlap. If two subscriptions both set the same attribute on the same document, the merge box decides which value takes priority and only sends that to the client. We haven't exposed the API for setting subscription priority yet. For now, priority is determined by the order the client subscribes to data sets. The first subscription a client makes has the highest priority, the second subscription is next highest, and so on.

Because the merge box holds the client's state, it can send the minimum amount of data to keep each client up to date, no matter what a publish function feeds it.

So now we've set the stage for your scenario.

We have 1,000 connected clients. Each is subscribed to the same live Mongo query (Somestuff.find({})). Since the query is the same for each client, the driver is only running one live query. There are 1,000 active merge boxes. And each client's publish function registered an added, changed, and removed on that live query that feeds into one of the merge boxes. Nothing else is connected to the merge boxes.

First the Mongo driver. When one of the clients inserts a new document into Somestuff, it triggers a recomputation. The Mongo driver reruns the query for all documents in Somestuff, compares the result to the previous result in memory, finds that there is one new document, and calls each of the 1,000 registered insert callbacks.

Next, the publish functions. There's very little happening here: each of the 1,000 insert callbacks pushes data into the merge box by calling added.

Finally, each merge box checks these new attributes against its in-memory copy of its client's cache. In each case, it finds that the values aren't yet on the client and don't shadow an existing value. So the merge box emits a DDP DATA message on the SockJS connection to its client and updates its server-side in-memory copy.

Total CPU cost is the cost to diff one Mongo query, plus the cost of 1,000 merge boxes checking their clients' state and constructing a new DDP message payload. The only data that flows over the wire is a single JSON object sent to each of the 1,000 clients, corresponding to the new document in the database, plus one RPC message to the server from the client that made the original insert.

Here's what we definitely have planned:

  • More efficient Mongo driver. We optimized the driver in 0.5.1 to only run a single observer per distinct query.

  • Not every DB change should trigger a recomputation of a query. We can make some automated improvements, but the best approach is an API that lets the developer specify which queries need to rerun. For example, it's obvious to a developer that inserting a message into one chatroom should not invalidate a live query for the messages in a second room.

  • The Mongo driver, publish function, and merge box don't need to run in the same process, or even on the same machine. Some applications run complex live queries and need more CPU to watch the database. Others have only a few distinct queries (imagine a blog engine), but possibly many connected clients -- these need more CPU for merge boxes. Separating these components will let us scale each piece independently.

  • Many databases support triggers that fire when a row is updated and provide the old and new rows. With that feature, a database driver could register a trigger instead of polling for changes.

