ArangoDB index usage with edge collections


Problem description

Task: find the fastest way to update attributes on many edges. For performance reasons, I am ignoring the graph methods and working with the collection directly for filtering.

ArangoDB 2.8b3


Query (Offer is an edge collection):

FOR O IN Offer
  FILTER O._from == @from AND O._to == @to AND O.expired > DATE_TIMESTAMP(@newoffertime)
  UPDATE O WITH { expired: @newoffertime } IN Offer
  RETURN { _key: OLD._key, prices_hash: OLD.prices_hash }

I have the system index on _to and _from, and a range index on expired.

The query explain output shows:

7   edge   Offer        false    false        49.51 %   [ `_from`, `_to` ]   O.`_to` == "Product/1023058135528"

The system index is used to filter on only one of the attributes (_to), not on both (_from, _to), and the index on expired is not used at all. Please explain the reasons for this behavior. Also, is it possible to specify a hint for which indexes should be used, so that the query takes the shortest path, if I already know this for sure when planning the data model?

Recommended answer

For filter conditions combined with logical ANDs, as in your query, ArangoDB's query optimizer will pick a single index. This is why it has not picked the edge index and the skiplist index at the same time.

It will choose between the skiplist index on expired and the edge index on [ "_from", "_to" ], and will pick the one for which it determines the lower cost, which is measured by index selectivity estimates. As the explain output shows, it seems to have picked the edge index on _to.
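As a rough illustration of that choice, here is a minimal Python sketch. It assumes the selectivity estimate is simply the number of distinct values divided by the number of documents, and that the candidate with the highest estimate wins. The real optimizer's cost model is more involved, and the function names and sample edges below are made up for the example.

```python
def selectivity(values):
    """Assumed estimate: distinct values / total values (1.0 means unique)."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


def pick_index(docs, candidate_attrs):
    """Return the attribute with the highest estimate, plus all estimates."""
    estimates = {a: selectivity(d[a] for d in docs) for a in candidate_attrs}
    best = max(estimates, key=estimates.get)
    return best, estimates


# Hypothetical edges: every offer comes from the same shop, targets vary.
edges = [
    {"_from": "Shop/1", "_to": "Product/1", "expired": 100},
    {"_from": "Shop/1", "_to": "Product/1", "expired": 200},
    {"_from": "Shop/1", "_to": "Product/2", "expired": 300},
    {"_from": "Shop/1", "_to": "Product/3", "expired": 400},
]

best, estimates = pick_index(edges, ["_from", "_to"])
print(best, estimates)
```

With this sample data the lookup on _to looks more selective than the one on _from, so it would be preferred under the stated assumption.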

The edge index internally consists of two separate hash indexes, one on the _from attribute and one on the _to attribute, so it allows quick access via either the _from or the _to attribute. However, it is not a combined index on [ "_from", "_to" ], so it does not support queries that ask for _from and _to at the same time. It has to pick one of the internal hash indexes, and seems to have picked the one on _to in that query. The decision is again based on average index selectivity.
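To make the "two separate hash indexes" point concrete, here is a toy Python model. It is not ArangoDB's implementation: the class ToyEdgeIndex and its find method are hypothetical, and the sketch only mimics the behavior described above, with one lookup on _to followed by a post-filter on _from.

```python
from collections import defaultdict


class ToyEdgeIndex:
    """Two independent hash lookups, one per attribute; no combined key."""

    def __init__(self, edges):
        self.by_from = defaultdict(list)
        self.by_to = defaultdict(list)
        for edge in edges:
            self.by_from[edge["_from"]].append(edge)
            self.by_to[edge["_to"]].append(edge)

    def find(self, from_id, to_id):
        """Look up by _to only, then post-filter the candidates on _from."""
        candidates = self.by_to.get(to_id, [])
        matches = [e for e in candidates if e["_from"] == from_id]
        return matches, len(candidates)


# Hypothetical sample edges.
edges = [
    {"_key": "a", "_from": "Shop/1", "_to": "Product/1"},
    {"_key": "b", "_from": "Shop/2", "_to": "Product/1"},
    {"_key": "c", "_from": "Shop/1", "_to": "Product/2"},
]

index = ToyEdgeIndex(edges)
matches, scanned = index.find("Shop/1", "Product/1")
print([e["_key"] for e in matches], "scanned:", scanned)
```

Every edge that shares the _to value has to be fetched and checked, even though only one of them matches both conditions.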

There is no way to provide index usage hints to the optimizer. Apart from that, it would not be able to use two indexes at the same time for this particular query anyway.

Looking at the selectivity estimate in the explain output, it seems that the edge index is not very selective, meaning there will be lots of edges with the same _to values. As the optimizer should also have taken the index on _from into account, I would assume that index is even less selective, and that each of these indexes will only help to skip at most 50 % of the edges, which is not very much. If that is actually the case, then the query will still retrieve (and post-filter) a lot of documents, which would explain potential slowness.

At the moment, the attributes _from and _to are automatically indexed in an edge collection's always-present edge index, and they cannot be used in additional, user-defined indexes. This is a feature that we would like to add in a future release, because with _from and _to being accessible for user-defined indexes, one could create a combined (sorted) index on [ "_from", "_to", "expired" ], which would potentially be much more selective than any of the three single-attribute indexes in isolation.
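The following Python sketch shows why such a combined sorted index would help. It is purely illustrative and does not reflect any ArangoDB API: the helper names and the sample data are assumptions, and a sorted list searched with bisect stands in for the sorted index.

```python
import bisect


def build_combined_index(edges):
    """Sort edges by (_from, _to, expired), as a combined sorted index would."""
    ordered = sorted(edges, key=lambda e: (e["_from"], e["_to"], e["expired"]))
    keys = [(e["_from"], e["_to"], e["expired"]) for e in ordered]
    return keys, ordered


def find_expiring_after(index, from_id, to_id, threshold):
    """Return edges with matching _from and _to whose expired > threshold."""
    keys, ordered = index
    # Position just past every entry with expired <= threshold for this pair.
    lo = bisect.bisect_right(keys, (from_id, to_id, threshold))
    # Position just past the last entry for this (_from, _to) pair.
    hi = bisect.bisect_right(keys, (from_id, to_id, float("inf")))
    return ordered[lo:hi]


# Hypothetical sample edges with numeric expiry timestamps.
edges = [
    {"_key": "a", "_from": "Shop/1", "_to": "Product/1", "expired": 100},
    {"_key": "b", "_from": "Shop/1", "_to": "Product/1", "expired": 300},
    {"_key": "c", "_from": "Shop/2", "_to": "Product/1", "expired": 300},
    {"_key": "d", "_from": "Shop/1", "_to": "Product/2", "expired": 500},
]

index = build_combined_index(edges)
hits = find_expiring_after(index, "Shop/1", "Product/1", 200)
print([e["_key"] for e in hits])
```

Both equality conditions and the range condition are answered by two binary searches over one contiguous slice, so no post-filtering is needed.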
