How to set up clusters and sharding in ArangoDB?
Question
I want to use sharding in ArangoDB. I have set up coordinators and DBServers as described in the 2.8.5 documentation. Could someone explain the setup in more detail, and also how I can check the performance of my query before and after sharding?
Recommended answer
Testing your application can be done with a local cluster, where all instances run on one machine, which is what you already did, if I understand correctly?
An ArangoDB cluster consists of coordinator and DBServer nodes. Coordinators don't have their own user-specific local collections on disk. Their role is to handle the I/O with the clients, to parse and optimize the queries, and to distribute the queries and the user data to the DBServer nodes. Foxx services also run on the coordinators. DBServers are the storage nodes in this setup; they keep the user data.
To compare the performance of clustered and non-clustered mode, import the same dataset into a clustered instance and a non-clustered one, and compare the query execution times. Since a cluster setup can involve more network communication than the single-server case (e.g. if you do a join), the performance can differ. On a physically distributed cluster you may achieve higher throughput, since the cluster nodes are then separate machines with their own I/O paths that end on separate physical hard disks.
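As a rough sketch of such a comparison, the helper below times a query several times and reports the minimum, median and maximum wall-clock time. The name timeQuery and the runQuery callback are hypothetical and not part of ArangoDB; in arangosh the callback could wrap a call such as db._query("FOR x IN sharded RETURN x").toArray(). Run it once against the single server and once against a coordinator.

```javascript
// Time a query-running callback several times and report simple statistics.
// `runQuery` is a hypothetical callback supplied by the caller; in arangosh
// it could be: () => db._query("FOR x IN sharded RETURN x").toArray()
function timeQuery(runQuery, repetitions = 5) {
  const timings = [];
  for (let i = 0; i < repetitions; i++) {
    const start = Date.now();
    runQuery();
    timings.push(Date.now() - start); // elapsed milliseconds for this run
  }
  // Sort a copy so the original run order stays available in `timings`.
  const sorted = [...timings].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
  return { timings, min: sorted[0], max: sorted[sorted.length - 1], median };
}
```

The median is usually more telling than a single run, because the first execution often includes cache warm-up.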
In the cluster case you create collections specifying the number of shards using the numberOfShards parameter; the shardKeys parameter controls the distribution of your documents across the shards. You should choose that key so documents distribute well across the shards (i.e. are not imbalanced towards just one shard). The numberOfShards can be an arbitrary value and doesn't have to correspond to the number of DBServer nodes. It can even be bigger, so that you can more easily move a shard from one DBServer to a new one when scaling up your cluster to more nodes in the future to adapt to higher loads.
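To see why the choice of shard key matters, here is a small simulation. It is only a sketch: the FNV-1a hash is a stand-in and is not the hash function ArangoDB uses, and the helper names (fnv1a, shardFor, countPerShard) and the id and country attributes are made up for illustration. It counts how many documents would land in each shard for a given key. In a real cluster the equivalent setup would be something like db._create("users", {numberOfShards: 4, shardKeys: ["country"]}), where "users" is again just an example name.

```javascript
// 32-bit FNV-1a string hash. This is a stand-in for illustration only,
// NOT the hash function ArangoDB uses internally.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return h;
}

// Pick a shard index from the values of the shard key attributes.
function shardFor(doc, shardKeys, numberOfShards) {
  const keyValue = shardKeys.map((k) => String(doc[k])).join("\u0000");
  return fnv1a(keyValue) % numberOfShards;
}

// Count how many documents end up in each shard.
function countPerShard(docs, shardKeys, numberOfShards) {
  const counts = new Array(numberOfShards).fill(0);
  for (const doc of docs) {
    counts[shardFor(doc, shardKeys, numberOfShards)] += 1;
  }
  return counts;
}

// Example data: unique ids, but every document has the same country.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ id: i, country: "DE" });
}

console.log("sharded by id:     ", countPerShard(docs, ["id"], 4));
console.log("sharded by country:", countPerShard(docs, ["country"], 4));
```

Sharding by the unique id spreads the documents over all four shards, while sharding by the constant country attribute puts every document into a single shard.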
When you're developing AQL queries with cluster use in mind, it's essential to use the explain command to inspect how the query is distributed across the cluster, and where filters can be deployed:
db._create("sharded", {numberOfShards: 2})
db._explain("FOR x IN sharded RETURN x")

Query string:
 FOR x IN sharded RETURN x

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode      1     - FOR x IN sharded   /* full collection scan */
  6   RemoteNode                   1       - REMOTE
  7   GatherNode                   1       - GATHER
  3   ReturnNode                   1       - RETURN x

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   scatter-in-cluster
  2   remove-unnecessary-remote-scatter
In this simple query the RETURN and GATHER nodes are on the coordinator; the nodes above them, up to and including the REMOTE node, are deployed to the DBServer.
In general, fewer REMOTE / SCATTER -> GATHER pairs mean less cluster communication. The closer FILTER nodes can be deployed to the *CollectionNodes, the fewer documents have to be sent via the REMOTE nodes, and the better the performance.
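To compare query variants by this measure, one option is to count the cluster-communication nodes in the execution plan. The helper below is a sketch: countClusterNodes is a hypothetical name, and it assumes a plan object of the shape { nodes: [{ type: "..." }, ...] }, such as the one arangosh returns from db._createStatement({query: "..."}).explain().plan. The example plan is written by hand to mirror the explain output above; it is not real server output.

```javascript
// Node types that indicate communication between coordinator and DBServers.
const CLUSTER_NODE_TYPES = ["RemoteNode", "ScatterNode", "GatherNode"];

// Count cluster-communication nodes in an execution plan object.
// Assumes the plan has the shape { nodes: [{ type: "..." }, ...] }.
function countClusterNodes(plan) {
  const counts = { RemoteNode: 0, ScatterNode: 0, GatherNode: 0 };
  for (const node of plan.nodes) {
    if (CLUSTER_NODE_TYPES.includes(node.type)) {
      counts[node.type] += 1;
    }
  }
  return counts;
}

// Hand-written plan mirroring the explain output above (illustrative only).
const examplePlan = {
  nodes: [
    { id: 1, type: "SingletonNode" },
    { id: 2, type: "EnumerateCollectionNode" },
    { id: 6, type: "RemoteNode" },
    { id: 7, type: "GatherNode" },
    { id: 3, type: "ReturnNode" },
  ],
};

console.log(countClusterNodes(examplePlan));
// prints { RemoteNode: 1, ScatterNode: 0, GatherNode: 1 }
```

Running the same count on two formulations of a query gives a quick first indication of which one needs less communication, before measuring actual execution times.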