How to set up clusters and sharding in ArangoDB?


Question

I want to use sharding in ArangoDB. I have set up coordinators and DBServers as described in the 2.8.5 documentation. Could someone explain the setup in more detail, and also how I can compare the performance of my query before and after sharding?

Recommended answer

You can test your application with a local cluster, where all instances run on one machine, which is what you already did, if I understand correctly?

An ArangoDB cluster consists of coordinator and dbserver nodes. Coordinators don't have their own user-specific local collections on disk. Their role is to handle the I/O with the clients and to parse, optimize and distribute the queries and the user data to the dbserver nodes. Foxx services also run on the coordinators. DBServers are the storage nodes in this setup; they keep the user data.

To compare the performance between clustered and non-clustered mode, import the same dataset into a clustered instance and a non-clustered one and compare the query execution times. Since a cluster setup can involve more network communication than the single-server case (e.g. if you do a join), the performance can differ. On a physically distributed cluster you may achieve higher throughput, since the cluster nodes are then separate machines with their own I/O paths that end on separate physical hard disks.
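The comparison described above could be sketched with a small timing helper like the following. The name `medianTimeMs` is hypothetical and not part of ArangoDB; the helper only measures the wall-clock time of a callback, and the arangosh call in the trailing comment is how one might plug a real query into it.

```javascript
// Hypothetical helper (not an ArangoDB API): runs a callback several times
// and returns the middle wall-clock time in milliseconds
// (the upper middle value when the number of runs is even).
function medianTimeMs(runQuery, repetitions) {
  repetitions = repetitions || 5;
  var timings = [];
  for (var i = 0; i < repetitions; i++) {
    var start = Date.now();
    runQuery();
    timings.push(Date.now() - start);
  }
  timings.sort(function (a, b) { return a - b; });
  return timings[Math.floor(timings.length / 2)];
}

// Assumed usage in arangosh, run once against the single server and once
// against a coordinator of the cluster, with the same dataset imported:
//   medianTimeMs(function () {
//     db._query("FOR x IN sharded RETURN x").toArray();
//   });
```

Taking the median of several runs rather than a single measurement reduces the influence of cold caches and one-off hiccups on the comparison.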

In the cluster case you create collections specifying the number of shards with the numberOfShards parameter; the shardKeys parameter controls the distribution of your documents across the shards. You should choose that key so that documents distribute well across the shards (i.e. are not imbalanced towards just one shard). The numberOfShards can be an arbitrary value and doesn't have to correspond to the number of dbserver nodes. It can even be larger, so that you can more easily move a shard from one dbserver to a new dbserver when you scale your cluster up to more nodes in the future to handle higher loads.
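To illustrate why the choice of shard key matters, here is a minimal simulation. It uses a simple string hash that is only a stand-in and is not ArangoDB's internal hash function; the attribute names `userId` and `country` and the collection name `users` are made up for the example.

```javascript
// Illustrative only: a simple string hash, NOT ArangoDB's internal hashing.
function simpleHash(text) {
  var hash = 0;
  for (var i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Count how many documents would land on each shard for a given shard key.
function shardCounts(docs, shardKey, numberOfShards) {
  var counts = [];
  for (var s = 0; s < numberOfShards; s++) {
    counts.push(0);
  }
  docs.forEach(function (doc) {
    counts[simpleHash(String(doc[shardKey])) % numberOfShards]++;
  });
  return counts;
}

// 1000 made-up documents: unique userId, but the same country everywhere.
var docs = [];
for (var i = 0; i < 1000; i++) {
  docs.push({ userId: "user" + i, country: "DE" });
}

var byCountry = shardCounts(docs, "country", 4); // all 1000 documents land on a single shard
var byUserId = shardCounts(docs, "userId", 4);   // documents are spread over several shards

// The corresponding arangosh call for the well-distributed key would be:
//   db._create("users", { numberOfShards: 4, shardKeys: ["userId"] });
```

A key with few distinct values, like `country` here, cannot spread documents over more shards than it has values, no matter how large numberOfShards is.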

When you're developing AQL queries with cluster use in mind, it's essential to use the explain command to inspect how the query is distributed across the cluster, and where filters can be deployed:

db._create("sharded", {numberOfShards: 2})
db._explain("FOR x IN sharded RETURN x")
Query string:
 FOR x IN sharded RETURN x

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode      1     - FOR x IN sharded /* full collection scan */
  6   RemoteNode                   1       - REMOTE
  7   GatherNode                   1       - GATHER
  3   ReturnNode                   1       - RETURN x

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   scatter-in-cluster
  2   remove-unnecessary-remote-scatter

In this simple query the RETURN and GATHER nodes are on the coordinator; the nodes above them in the plan, up to and including the REMOTE node, are deployed to the DB server.

In general, fewer REMOTE / SCATTER -> GATHER pairs mean less cluster communication. The closer FILTER nodes can be deployed to the *CollectionNodes, the fewer documents have to be sent via the REMOTE nodes, and the better the performance.
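One way to keep an eye on this while tuning a query is to count the node types in its execution plan. The sketch below assumes that the explain result exposes a `plan.nodes` array whose entries carry a `type` attribute; `countNodeTypes` is a hypothetical helper, and `examplePlan` is a hand-written stand-in that mirrors the node types of the explain output above rather than real server output.

```javascript
// Hypothetical helper: count how often each node type occurs in a plan.
function countNodeTypes(plan) {
  var counts = {};
  plan.nodes.forEach(function (node) {
    counts[node.type] = (counts[node.type] || 0) + 1;
  });
  return counts;
}

// Hand-written stand-in mirroring the node types of the plan shown earlier.
var examplePlan = {
  nodes: [
    { type: "SingletonNode" },
    { type: "EnumerateCollectionNode" },
    { type: "RemoteNode" },
    { type: "GatherNode" },
    { type: "ReturnNode" }
  ]
};

var nodeCounts = countNodeTypes(examplePlan);
// nodeCounts.RemoteNode and nodeCounts.GatherNode are both 1 for this plan.

// Assumed usage in arangosh against a coordinator:
//   var plan = db._createStatement({ query: "FOR x IN sharded RETURN x" }).explain().plan;
//   countNodeTypes(plan);
```

Comparing these counts before and after rewriting a query gives a quick hint whether the rewrite reduced the number of coordinator/DB-server round trips.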
