ArangoDB: performance index in array element
Question
I have a collection in ArangoDB populated with elements like this:
{
  "id": "XXXXXXXX",
  "relation": [
    {
      "AAAAA": "AAAAA"
    },
    {
      "BBBB": "BBBBBB",
      "field": {
        "v1": 0,
        "v2": 0,
        "v3": 0
      }
    },
    {
      "CCCC": "CCCC",
      "field": {
        "v1": 0,
        "v2": 1,
        "v3": 2
      }
    }
  ]
}
I want to return only the elements that have field.v1 > 0 (or a combination of v values).
I've tried to write an AQL query like the one below, but it doesn't use any index and it is very slow with 200,000+ elements.
FOR a IN X
  FILTER LENGTH(a.relation) > 0
  LET relation = a.relation
  FOR r IN relation
    FILTER r.field > null
    FILTER r.field.v1 > 0
    RETURN a
I tried creating these indexes:
fulltext on relation[*].field
skiplist on relation[*].field
hash on relation[*].field
but to no avail.
What can I do? Can you suggest any changes to the query?
Thanks.
Best regards,
Daniel
Answer
I suggest the following changes, but they won't speed up the query noticeably:
The filters FILTER r.field > null and FILTER r.field.v1 > 0 are redundant. You can just use the latter, FILTER r.field.v1 > 0, and omit the other filter condition.
The auxiliary variable LET relation = a.relation is defined after a.relation is used in the LENGTH(a.relation) calculation. If the auxiliary variable were defined before the LENGTH() calculation, it could be used inside it like this: LET relation = a.relation FILTER LENGTH(relation) > 0. This will save a bit of processing time.
The original query checks each v1 value and may return each document multiple times if multiple v1 values in a document satisfy the filter condition. That means the original query may return more documents than are actually present in the collection. If that's not desired, I suggest using a subquery (see below).
When applying the above modifications to the original query, this is what I came up with:
FOR a IN X
  LET relation = a.relation
  FILTER LENGTH(relation) > 0
  LET s = (
    FOR r IN relation
      FILTER r.field.v1 > 0
      LIMIT 1
      RETURN 1
  )
  FILTER LENGTH(s) > 0
  RETURN a
As I said, this probably won't improve performance greatly. However, you may get a different (potentially the desired) result from the query, i.e. fewer documents if multiple v1 values in a document satisfy the filter condition.
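To make the duplicate-results point concrete, here is a small Python model of what the two queries return. It is a sketch, not ArangoDB code: the documents are hypothetical (the v1 values are changed so that two elements of one document match), and the helper names v1_of, nested_for and with_subquery are made up for illustration.

```python
# Hypothetical Python model of the two AQL queries' semantics (not ArangoDB code).


def v1_of(element):
    """Mimic AQL's r.field.v1: a missing attribute evaluates to None (AQL null)."""
    field = element.get("field")
    if not isinstance(field, dict):
        return None
    return field.get("v1")


def greater_than_zero(value):
    """Numeric-only stand-in for `value > 0`; null and booleans never match."""
    if value is None or isinstance(value, bool):
        return False
    return value > 0


def nested_for(docs):
    """FOR a IN X FOR r IN a.relation FILTER r.field.v1 > 0 RETURN a:
    one result per matching element, so a document can appear several times."""
    return [a for a in docs for r in a.get("relation", []) if greater_than_zero(v1_of(r))]


def with_subquery(docs):
    """Subquery with LIMIT 1: at most one result per document."""
    return [a for a in docs if any(greater_than_zero(v1_of(r)) for r in a.get("relation", []))]


docs = [
    {"id": "doc1", "relation": [
        {"AAAAA": "AAAAA"},
        {"BBBB": "BBBBBB", "field": {"v1": 1, "v2": 0, "v3": 0}},
        {"CCCC": "CCCC", "field": {"v1": 2, "v2": 1, "v3": 2}},
    ]},
    {"id": "doc2", "relation": [
        {"CCCC": "CCCC", "field": {"v1": 0, "v2": 1, "v3": 2}},
    ]},
]

print([d["id"] for d in nested_for(docs)])     # ['doc1', 'doc1']
print([d["id"] for d in with_subquery(docs)])  # ['doc1']
```

The nested loop emits doc1 once per matching element, while the subquery variant only asks whether at least one element matches.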
Regarding indexes: fulltext and hash indexes will not help here. A hash index supports only equality lookups and a fulltext index only word searches, but the query's filter condition is a greater-than comparison. In general, the only index type that could be beneficial here is the skiplist index. However, indexing array values is not supported at all in 2.7, so indexing relation[*].field won't help, and still no index will be used, as you reported.
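Why a sorted structure can serve a greater-than filter while a hash cannot can be seen in a toy Python model. This is only an analogy under stated assumptions: the entries are hypothetical, and ArangoDB's real hash and skiplist indexes are considerably more involved than a dict and a sorted list.

```python
import bisect

# Hypothetical (v1 value, document id) pairs: a toy stand-in for index entries.
entries = [(0, "doc2"), (2, "doc1"), (1, "doc3"), (0, "doc4")]

# Hash-style index: exact key -> ids. It answers v1 == 0 directly, but v1 > 0
# would require visiting every key.
hash_index = {}
for key, doc_id in entries:
    hash_index.setdefault(key, []).append(doc_id)

# Skiplist-style (sorted) index: keys kept in order, so v1 > 0 is a binary
# search for the first larger key followed by a scan to the end.
sorted_entries = sorted(entries)
sorted_keys = [key for key, _ in sorted_entries]


def lookup_equal(key):
    """Equality lookup, which both index styles can serve."""
    return hash_index.get(key, [])


def lookup_greater(key):
    """Range lookup (key > value), which needs the ordered structure."""
    start = bisect.bisect_right(sorted_keys, key)
    return [doc_id for _, doc_id in sorted_entries[start:]]


print(lookup_equal(0))    # ['doc2', 'doc4']
print(lookup_greater(0))  # ['doc3', 'doc1']
```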
ArangoDB 2.8 will be the first version that supports indexing individual array values, and there you could create an index on relation[*].field.v1.
Still, the query in 2.8 won't use that index, because array indexes are only used for the IN comparison operator. They cannot be used with a > as in the query. Additionally, writing the filter condition as FILTER a.relation[*].field.v1 > 0 would evaluate to FILTER [null, 0, 0] > 0 for the example document above, which will not produce the desired results: in AQL an array always compares greater than a number, so the condition is true regardless of the individual v1 values.
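The type-ordering point can be checked with a small Python model. It assumes AQL's ordering across types (null < bool < number < string < array < object) and implements only enough of the > operator for this example; TYPE_RANK and aql_greater are hypothetical names, not part of any ArangoDB driver.

```python
# Hypothetical model of AQL's cross-type ordering:
# null < bool < number < string < array < object.
TYPE_RANK = {type(None): 0, bool: 1, int: 2, float: 2, str: 3, list: 4, dict: 5}


def aql_greater(left, right):
    """Sketch of AQL's `left > right` for the cases discussed here."""
    left_rank, right_rank = TYPE_RANK[type(left)], TYPE_RANK[type(right)]
    if left_rank != right_rank:
        # Values of different types: the type rank alone decides.
        return left_rank > right_rank
    if left is None:
        return False  # null > null is false
    if isinstance(left, (bool, int, float)):
        return left > right
    # Same-type strings, arrays and objects are not modelled in this sketch.
    raise NotImplementedError("same-type string/array/object comparison")


example_v1 = [None, 0, 0]  # a.relation[*].field.v1 for the example document
print(aql_greater(example_v1, 0))               # True: an array ranks above a number
print([aql_greater(v, 0) for v in example_v1])  # [False, False, False]
```

Comparing the whole array matches every document, whereas comparing each member separately gives the intended per-element answer.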
What could help here is a comparison operator modifier (working title) that tells the operators <, <=, >, >=, == and != to run the comparison on all members of their left operand. There could be ALL and ANY modifiers, so that the filter condition could simply be written as FILTER a.relation[*].field.v1 ANY > 0. But please note that this is not an existing feature yet, only my quick draft of how this could be fixed in the future.
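The ANY/ALL semantics proposed above can be sketched in Python. The function names any_greater and all_greater are hypothetical, and the sketch covers numeric comparisons only; treating an empty array as satisfying ALL is a choice made here (Python's all() returns True for an empty input), not something the draft specifies. Later ArangoDB releases did add array comparison operators with ALL/ANY/NONE prefixes, so check your version's AQL documentation rather than relying on this model.

```python
def greater(value, threshold):
    """Numeric-only stand-in for AQL's > (null and booleans never exceed a number)."""
    if value is None or isinstance(value, bool):
        return False
    return value > threshold


def any_greater(values, threshold):
    """Proposed `values ANY > threshold`: true if at least one member is greater."""
    return any(greater(v, threshold) for v in values)


def all_greater(values, threshold):
    """Proposed `values ALL > threshold`: true only if every member is greater."""
    return all(greater(v, threshold) for v in values)


v1_values = [None, 0, 2]  # hypothetical a.relation[*].field.v1 for one document
print(any_greater(v1_values, 0))  # True: the last member is 2
print(all_greater(v1_values, 0))  # False: null and 0 are not greater than 0
```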