How to set up clusters and sharding in ArangoDB?


Question

I want to use sharding in ArangoDB. I have set up coordinators and DBServers as described in the 2.8.5 documentation. Could someone explain the setup in more detail, and also how I can check the performance of my query before and after sharding?

Recommended answer

Testing your application can be done with a local cluster, where all instances run on one machine, which is what you already did, if I understand correctly?

An ArangoDB cluster consists of coordinator and dbserver nodes. Coordinators don't have their own user-specific local collections on disk. Their role is to handle I/O with the clients, to parse and optimize queries, and to distribute the queries and the user data to the dbserver nodes. Foxx services also run on the coordinators. DBServers are the storage nodes in this setup; they hold the user data.

To compare the performance of clustered and non-clustered mode, import the same dataset into a clustered instance and a non-clustered one and compare the query times. Since a cluster setup can involve more network communication than the single-server case (e.g. if you do a join), the performance can differ. On a physically distributed cluster you may achieve higher throughput, since the cluster nodes are then separate machines with their own I/O paths that end on separate physical hard disks.
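As a rough sketch of such a comparison, the helper below runs a query function several times and summarizes the timings. The name `timeQuery` and its result fields are made up for illustration and are not part of ArangoDB. In arangosh you could pass something like `() => db._query("FOR x IN sharded RETURN x").toArray()` once against the single server and once against a coordinator; here a stand-in workload is used so the snippet runs without a database.

```javascript
// Hypothetical helper: calls `runQuery` `runs` times and reports timings in ms.
function timeQuery(runQuery, runs = 5) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    runQuery();
    timings.push(Date.now() - start);
  }
  timings.sort((a, b) => a - b);
  const sum = timings.reduce((acc, t) => acc + t, 0);
  return {
    runs,
    min: timings[0],
    // Middle element (the upper median when `runs` is even).
    median: timings[Math.floor(timings.length / 2)],
    max: timings[timings.length - 1],
    mean: sum / runs,
  };
}

// Stand-in workload instead of a real query, so this runs anywhere.
const stats = timeQuery(() => {
  let total = 0;
  for (let i = 0; i < 1e5; i++) total += i;
  return total;
}, 5);
console.log(stats);
```

Running the query several times and looking at the median rather than a single measurement helps smooth out cache warm-up and network jitter.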

In the cluster case you create collections specifying the number of shards with the numberOfShards parameter; the shardKeys parameter controls how your documents are distributed across the shards. You should choose that key so that documents distribute well across the shards (i.e. are not imbalanced towards just one shard). numberOfShards can be an arbitrary value and doesn't have to correspond to the number of dbserver nodes. It can even be larger, which makes it easier to move a shard from one dbserver to a new dbserver when you later scale the cluster up to more nodes to handle higher load.
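To illustrate why the choice of shard key matters, here is a small simulation in plain JavaScript. It is only a sketch: `toyHash` is a made-up string hash and not the function ArangoDB uses, and the documents and the `id` and `country` attributes are hypothetical. The point is just that a key with many distinct values spreads documents over the shards, while a key with a single value puts everything on one shard.

```javascript
// Illustrative only: a simple string hash, NOT the hash ArangoDB uses.
function toyHash(value) {
  let h = 0;
  for (const ch of String(value)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep h in unsigned 32-bit range
  }
  return h;
}

// Count how many documents land on each shard for a given shard key.
function shardCounts(docs, shardKey, numberOfShards) {
  const counts = new Array(numberOfShards).fill(0);
  for (const doc of docs) {
    counts[toyHash(doc[shardKey]) % numberOfShards]++;
  }
  return counts;
}

// Hypothetical data: unique ids, but every document has the same country.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ id: "user" + i, country: "DE" });
}

// In arangosh the corresponding collection might be created with, e.g.:
//   db._create("sharded", {numberOfShards: 4, shardKeys: ["country"]})
const byId = shardCounts(docs, "id", 4);
const byCountry = shardCounts(docs, "country", 4);
console.log("sharded by id:     ", byId);
console.log("sharded by country:", byCountry);
```

With the `country` key all 1000 documents end up in a single shard, which is the imbalance to avoid.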

When you're developing AQL queries with cluster use in mind, it's essential to use the explain command to inspect how the query is distributed across the cluster, and where filters can be deployed:

db._create("sharded", {numberOfShards: 2})
db._explain("FOR x IN sharded RETURN x")
Query string:
 FOR x IN sharded RETURN x

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode      1     - FOR x IN sharded /* full collection scan */
  6   RemoteNode                   1       - REMOTE
  7   GatherNode                   1       - GATHER
  3   ReturnNode                   1       - RETURN x

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   scatter-in-cluster
  2   remove-unnecessary-remote-scatter

In this simple query the RETURN and GATHER nodes run on the coordinator; the nodes above them, up to and including the REMOTE node, are deployed to the DB servers.

In general, fewer REMOTE / SCATTER -> GATHER pairs mean less cluster communication. The closer FILTER nodes can be deployed to the *CollectionNodes, the fewer documents have to be sent through the REMOTE nodes, and the better the performance.
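One way to keep an eye on this while tuning a query is to count the node types in its plan. The sketch below assumes a plan object with a `nodes` array whose entries carry a `type` field, which is how the plan returned in arangosh by `db._createStatement({query: "..."}).explain().plan` is expected to look; treat that shape as an assumption and check it against your version. The helper name `countNodeTypes` is made up, and the sample plan is hand-written to mirror the explain output above.

```javascript
// Hypothetical helper: tally how often each node type occurs in a plan.
function countNodeTypes(plan) {
  const counts = {};
  for (const node of plan.nodes) {
    counts[node.type] = (counts[node.type] || 0) + 1;
  }
  return counts;
}

// Hand-written sample mirroring the explain output shown earlier
// (assumed shape; a real plan carries many more fields per node).
const samplePlan = {
  nodes: [
    { id: 1, type: "SingletonNode" },
    { id: 2, type: "EnumerateCollectionNode" },
    { id: 6, type: "RemoteNode" },
    { id: 7, type: "GatherNode" },
    { id: 3, type: "ReturnNode" },
  ],
};

const counts = countNodeTypes(samplePlan);
console.log(counts);
console.log("REMOTE nodes:", counts.RemoteNode || 0);
console.log("GATHER nodes:", counts.GatherNode || 0);
```

Comparing these counts before and after rewriting a query gives a quick hint whether the rewrite reduced the number of hops between coordinator and DB servers.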
