关于 ArangoDB 中多索引的使用 [英] On multiple index usage in ArangoDB
问题描述
具有以下结构的文档:
{
path: String,
enabled: Long,
disabled: null || Long,
// other fields...
}
我想通过路径前缀和一些数字和文档时间戳之间的数字关系的组合来查找文档:(伪代码)
I would like to look the documents up by a combination of path's prefix and numerical relation between some number and document's timestamps: (pseudocode)
SELECT e FROM entries
WHERE
e.path STARTS WITH "somePrefix"
AND e.enabled <= timestamp
AND (
e.disabled == null
OR timestamp < e.disabled
)
我将从哪种索引结构中受益最多(如果有的话)?我应该在 enabled
字段上有一个非稀疏跳过列表索引 + 在 disabled
上有一个稀疏索引,在 path
上有一个全文非稀疏索引吗?ArangoDB 是否能够为这些类型的查询使用多个索引?我阅读了关于索引使用的文档页面,但我仍然不清楚.
What index structure will I benefit from the most, if any? Should I have a non-sparse skiplist index on enabled
field + a sparse one on disabled
and a fulltext non-sparse one on the path
? Is ArangoDB capable of utilising multiple indexes for these types of queries? I read the doc page on index usage, but I'm still unclear.
推荐答案
如果过滤条件与logical or结合,并且索引满足,ArangoDB可以在同一个集合上使用多个索引> 或 分支条件.
ArangoDB can use multiple indexes on the same collection if the filter conditions are combined with logical or, and the indexes satisfy the or branches conditions.
就您的查询而言,您有三个条件与逻辑和组合在一起,后者包含一个或.
In the case of your query, you have three conditions combined with logical and, with the latter containing an or.
AQL 中没有 STARTS WITH
谓词,但您可以使用由前缀边界构造的范围查询:e.path >= @lower &&e.path <@upper
.对于 "somePrefix"
的搜索值,边界将转换为 @upper
为 "somePrefix",并且
@lowerbeing
"somePrefiy"`(最后一个字符加一的搜索值).
There is no STARTS WITH
predicate in AQL, but instead you could use a range query constructed with the prefix bounds: e.path >= @lower && e.path < @upper
. For a search value of "somePrefix"
, the bounds would translate to @upper
being "somePrefix", and
@lowerbeing
"somePrefiy"` (search value with last character increased by one).
在 path
上创建跳过列表索引将使查询使用该索引.
Creating a skiplist index on path
will make the query use that index.
包括enabled
的搜索条件,到目前为止的组合条件是e.path >= @lower &&e.path <@upper &&e.enabled <= @timestamp
.虽然可以在多个属性上创建跳过列表索引,但它不会在 path
和 enabled
上使用,而只能在 path
上使用.反转索引属性的顺序(即先enabled
,然后是path
)也无济于事,因为这样索引将用于enabled
仅,但不在 path
上.
Including the search condition for enabled
, the combined condition up to now is e.path >= @lower && e.path < @upper && e.enabled <= @timestamp
. Though a skiplist index can be created on multiple attributes, it won't be used here on both path
and enabled
, but only on path
. Reversing the order of index attributes (i.e. enabled
first, then path
) won't help either, because then the index would be used on enabled
only, but not on path
.
通常,skiplist 索引将用于条件中可以产生连续范围的部分.如果在等式比较中使用最左边的索引属性就是这种情况(例如 e.path == @path && e.enabled <= @timestamp
可以工作),但是如果它最左边的索引属性是非等式比较(例如 e.path >= @lower && e.path <= @upper
或 @e.enabled <;= @timestamp
),则它不会查看其进一步的索引属性,因为无论如何它都会产生不连续的范围.
In general, the skiplist index will be used for the parts of the condition that can produce a contiguous range. That is the case if the left-most index attributes are used in equality comparisons (e.g. e.path == @path && e.enabled <= @timestamp
would work), but if its left-most index attributes are non-equality comparisons (e.g. e.path >= @lower && e.path <= @upper
or @e.enabled <= @timestamp
), then it won't look at its further index attributes, because it would produce a non-contiguous range anyway.
还可以选择在 disabled
上创建跳过列表索引.这允许优化器在 e.enabled
e.enabled <= @timestamp && 部分使用该索引.(e.disabled == null || @timestamp
e.disabled == null ||@timestamp <e.disabled
,但看起来不是很有选择性.
There's also the option of creating a skiplist index on disabled
. That allows the optimizer to use that index on the part e.enabled <= @timestamp && (e.disabled == null || @timestamp < e.disabled)
. It can transform this to e.disabled == null || @timestamp < e.disabled
, which however does not look very selective.
总而言之:对于该特定查询似乎没有好的索引选择.如果您可以以某种方式将 STARTS WITH
更改为相等比较,那么您可以在 path
和 enabled
上创建组合跳过列表索引,它会可能是相当有选择性的.如果您的 STARTS WITH
前缀总是相同的大小,那么将前缀保存在一个额外的属性中可能是值得的,该属性可以被索引而不是原始值并使用相等比较进行查询:e.pathPrefix == @prefix &&e.enabled <= @timestamp
.这需要为每个文档保存和维护一个额外的前缀属性,但在启用更具选择性的索引时可能是值得的.
In summary: there does not seem to be a good index choice for that particular query. If you could somehow change the STARTS WITH
to an equality comparison, then you could create a combined skiplist index on path
and enabled
, and it would potentially be rather selective.
If your STARTS WITH
prefixes are always the same size, it might be worth to save the prefix in an extra attribute, that can be indexed instead of the original value and be queried using an equality comparison: e.pathPrefix == @prefix && e.enabled <= @timestamp
.
That requires saving and maintaining an extra prefix attribute per document, but maybe its worth it when enabling the use of a much more selective index.
这篇关于关于 ArangoDB 中多索引的使用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!