ArangoDB:数组元素中的性能索引 [英] ArangoDB: performance index in array element

查看:1078
本文介绍了ArangoDB:数组元素中的性能索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在ArangoDB中有一个使用以下元素填充的集合:

  {

id :XXXXXXXX,
关系:[
{
AAAAA:AAAAA,
},
{
BBBB :BBBBBB,
字段:{
v1:0,
v2:0,
v3:0
}
},
{
CCCC:CCCC,
字段:{
v1:0,
v2:1,
v3:2
}
},
]
}

我想只返回 field.v1>元素。 0 (或v值的组合)。
我曾尝试编写像这样的AQL查询,但它不使用索引,而且它的速度很慢,只有200000多个元素。

  FOR X 
FILTER LENGTH(a.relation)> 0
LET relation = a.relation
FOR r in
FILTER r.field> null
FILTER r.field.v1> 0
返回

我试图创建这些索引:




  • 关系[*]字段的全文

  • 关于[*]字段的跳过列表

  • 关于[*]字段的哈希
    但没有结果。



我该怎么办?您能否建议我对查询进行任何更改?



谢谢。



祝你好运,



Daniele

解决方案

我建议进行以下更改,但它们不会加速显着查询:




  • 过滤器 FILTER r.field> null FILTER r.field.v1> 0 是多余的。您可以使用后者 FILTER r.field.v1> 0 并省略其他过滤条件


  • 辅助变量 LET relation = a.relation a.relation 用于 LENGTH(a.relation)计算后定义。如果辅助变量将在 LENGTH()计算之前定义,则可以在其中使用它,如下所示: LET relation = a.relation FILTER LENGTH (关系)> 0 。这将节省一些处理时间


  • 原始查询检查每个 v1 值并可能返回每个如果文档中的多个 v1 值满足过滤条件,则多次记录。这意味着原始查询可能会返回比集合中实际存在的文档更多的文档。如果不需要,我建议使用子查询(见下文)




将上述修改应用于原始查询时,这就是我提出的:

  FOR IN X 
LET关系= a.relation
FILTER LENGTH(关系)> 0
LET s =(
FOR r IN关系
FILTER r.field.v1> 0
LIMIT 1
返回1

滤波器长度> 0
返回

正如我所说,这可能不会大大提高性能,但是,您可以从查询中获得不同的(可能是期望的)结果,即如果文档中的多个 v1 满足过滤条件,则会减少文档。



关于索引:全文和哈希索引在这里没有帮助,因为它们仅支持相等比较,但查询的过滤条件是大于。这里唯一可能有益的索引类型是skiplist索引。但是,2.7中根本不支持索引数组值,因此索引 relation [*] .field 将无济于事,并且仍然不会像您报告的那样使用索引。 / p>

ArangoDB 2.8将是第一个支持索引单个数组值的版本,你可以在 relation [*]上创建一个索引.field.v1



2.8中的查询仍然不会使用该索引,因为数组索引仅用于 IN 比较运营商。它们不能与查询中的> 一起使用。此外,将过滤条件写为 FILTER r [*]。field.v1> 0 ,这将评估为 FILTER [null,0,0]> 0 对于上面的示例文档,它不会产生所需的结果。



这里有什么可以帮助的是比较运算符修饰符(工作标题),可以告诉运算符< < = > > = == != 在其左操作数的所有成员上运行比较。可能有 ALL ANY 修改,因此过滤条件可以简写为 FILTER a.relation [*] .field.v1 ANY> 0 。但是请注意,这不是现有功能,但只是我的快速草案,了解将来如何解决这个问题。


I have a Collection in ArangoDB populated with element like this:

{

  "id": "XXXXXXXX",
  "relation": [
    {
      "AAAAA": "AAAAA",
    },
    {
      "BBBB": "BBBBBB",
      "field": {
        "v1": 0,
        "v2": 0,
        "v3": 0
      }
    },
    {
      "CCCC": "CCCC",
      "field": {
        "v1": 0,
        "v2": 1,
        "v3": 2
      }
    },
  ]
}

I want to return only elements that have field.v1 > 0 (or a combination of v values). I've tried to write an AQL query like this one, but it doesn't use indexes and it is so slow with 200000+ elements.

FOR a in X
    FILTER LENGTH(a.relation) > 0
    LET relation =  a.relation
    FOR r in relation
        FILTER r.field > null 
        FILTER r.field.v1 > 0
return a

I've tried to create these indexes:

  • full text on relation[*]field
  • skip list on relation[*]field
  • hash on relation[*]field but with no result.

What can I do? Can you suggest me any changes to the query?

Thanks.

Best regards,

Daniele

解决方案

I suggest the following changes, but they won't speed up the query noticeably:

  • the filters FILTER r.field > null and FILTER r.field.v1 > 0 are redundant. You can just use the latter FILTER r.field.v1 > 0 and omit the other filter condition

  • the auxiliary variable LET relation = a.relation is defined after a.relation is used in the LENGTH(a.relation) calculation. If the auxiliary variable would be defined before the LENGTH() calculation, it could be used inside it like this: LET relation = a.relation FILTER LENGTH(relation) > 0. This will save a bit of processing time

  • the original query checks each v1 value and may return each document multiple times if multiple v1 values in a document satisfy the filter condition. That means the original query may return more documents than there are actually present in the collection. If that's not desired, I suggest using a subquery (see below)

When applying the above modifications to the original query, this is what I came up with:

FOR a IN X 
  LET relation = a.relation
  FILTER LENGTH(relation) > 0 
  LET s = (
    FOR r IN relation
      FILTER r.field.v1 > 0 
      LIMIT 1 
      RETURN 1
  )
  FILTER LENGTH(s) > 0 
  RETURN a

As I said this probably won't improve performance greatly, however, you may get a different (potentially the desired) result from the query, i.e. less documents if multiple v1 in a document satisfy the filter condition.

Regarding indexes: fulltext and hash indexes will not help here as they support only equality comparisons, but the query's filter conditions is a greater than. The only index type that could be beneficial here in general would be the skiplist index. However, indexing array values is not supported in 2.7 at all, so indexing relation[*].field won't help and still no index will be used as you reported.

ArangoDB 2.8 will be the first version that supports indexing individual array values, and there you could create an index on relation[*].field.v1.

Still the query in 2.8 won't use that index because the array indexes are only used for the IN comparison operator. They cannot be used with a > as in the query. Additionally, when writing the filter condition as FILTER r[*].field.v1 > 0, this would evaluate to FILTER [null, 0, 0] > 0 for the example document above, which will not produce the desired results.

What could help here is a comparison operator modificator (working title) that could tell the operators <, <=, >, >=, ==, != to run the comparison on all members of its left operand. There could be ALL and ANY modifications, so that the filter condition could be written as simply FILTER a.relation[*].field.v1 ANY > 0. But please note that this is not an existing feature yet, but only my quick draft for how this could be fixed in the future.

这篇关于ArangoDB:数组元素中的性能索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆