ArangoDB Faceted Search Performance

Views: 604 Published: 2020/6/2 20:53:39 Tags: aggregation arangodb facet faceted-search


Problem description

We are evaluating ArangoDB's performance for facet calculations. A number of other products can do the same, either via a special API or a query language:

MarkLogic Facets

ElasticSearch Aggregations

Solr Faceting, etc.

We understand that there is no special API in ArangoDB for calculating facets explicitly. In practice one is not needed: thanks to the comprehensive AQL, facets can be computed with a simple query such as:

 FOR a in Asset 
  COLLECT attr = a.attribute1 INTO g
 RETURN { value: attr, count: length(g) }

This query calculates a facet on attribute1 and yields frequencies in the form:

[
  {
    "value": "test-attr1-1",
    "count": 2000000
  },
  {
    "value": "test-attr1-2",
    "count": 2000000
  },
  {
    "value": "test-attr1-3",
    "count": 3000000
  }
]

In other words, across the entire collection attribute1 takes three values (test-attr1-1, test-attr1-2 and test-attr1-3), each returned with its count. Effectively, we ran a DISTINCT query and aggregated the counts.

Looks simple and clean.

The query above runs for 31 seconds (!) on a test collection of only 8M documents. We have experimented with different index types and storage engines (with and without RocksDB) and studied the explain plans, to no avail. The test documents are very concise, with only three short attributes.
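For comparison, the same facet can also be expressed with AQL's `COLLECT ... WITH COUNT INTO` form, which returns only the number of members per group instead of collecting every document into `g`. The following is a minimal sketch, assuming the same `Asset` collection and `attribute1` field; the variable name `n` is illustrative, and the query has not been timed against the 8M-document test collection:

```aql
// Count group members directly instead of materializing each group in g
FOR a IN Asset
  COLLECT attr = a.attribute1 WITH COUNT INTO n
  RETURN { value: attr, count: n }
```

The result shape is the same list of `{ value, count }` objects shown earlier.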

We would appreciate any input at this point. Either we are doing something wrong, or ArangoDB simply is not designed to perform in this particular area.

By the way, the ultimate goal is to run something like the following in under a second:

LET docs = (
  FOR a IN Asset
    FILTER a.name LIKE 'test-asset-%'
    SORT a.name
    RETURN a
)
LET attribute1 = (
  FOR a IN docs
    COLLECT attr = a.attribute1 INTO g
    RETURN { value: attr, count: LENGTH(g[*]) }
)
LET attribute2 = (
  FOR a IN docs
    COLLECT attr = a.attribute2 INTO g
    RETURN { value: attr, count: LENGTH(g[*]) }
)
LET attribute3 = (
  FOR a IN docs
    COLLECT attr = a.attribute3 INTO g
    RETURN { value: attr, count: LENGTH(g[*]) }
)
LET attribute4 = (
  FOR a IN docs
    COLLECT attr = a.attribute4 INTO g
    RETURN { value: attr, count: LENGTH(g[*]) }
)
RETURN {
  counts: (RETURN {
    total: LENGTH(docs),
    offset: 2,
    to: 4,
    facets: {
      attribute1: {
        from: 0,
        to: 5,
        total: LENGTH(attribute1)
      },
      attribute2: {
        from: 5,
        to: 10,
        total: LENGTH(attribute2)
      },
      attribute3: {
        from: 0,
        to: 1000,
        total: LENGTH(attribute3)
      },
      attribute4: {
        from: 0,
        to: 1000,
        total: LENGTH(attribute4)
      }
    }
  }),
  items: (FOR a IN docs LIMIT 2, 4 RETURN { id: a._id, name: a.name }),
  facets: {
    attribute1: (FOR a IN attribute1 SORT a.count LIMIT 0, 5 RETURN a),
    attribute2: (FOR a IN attribute2 SORT a.value LIMIT 5, 10 RETURN a),
    attribute3: (FOR a IN attribute3 LIMIT 0, 1000 RETURN a),
    attribute4: (FOR a IN attribute4 SORT a.count, a.value LIMIT 0, 1000 RETURN a)
  }
}

Thanks!
