Limit ElasticSearch aggregation to top n query results

Question

I have a set of 2.8 million docs with sets of tags that I'm querying with ElasticSearch, but many of these docs can be grouped together by one ID. I want to query my data using the tags, and then aggregate them by the ID that repeats. Often my search results have tens of thousands of documents, but I only want to aggregate the top 100 results of the search. How can I constrain an aggregation to only the top 100 results from a query?

Recommended answer

Sampler aggregation:

A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.

"aggs": {
  "bestDocs": {
    "sampler": {
      "shard_size": 100
    },
    "aggs": {
      "bestBuckets": {
        "terms": {
          "field": "id"
        }
      }
    }
  }
}

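This query limits the sub-aggregation to the 100 top-scoring documents on each shard and then buckets them by ID. Note that shard_size applies per shard, so an index with several shards can contribute up to 100 documents per shard to the sample; the sample is exactly the overall top 100 only when the query runs against a single shard.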
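As a rough illustration of the per-shard behaviour, here is a minimal Python sketch. It builds a complete search body around the sampler aggregation and then emulates, on toy in-memory data, what the sampler does before the terms aggregation runs. The `tags` field, the `match` query, the helper names and the toy documents are assumptions made for the example; they do not come from the question. The emulation is a simplification, not Elasticsearch's actual implementation.

```python
import json
from collections import Counter


def build_search_body(tag, shard_size=100, id_field="id"):
    """Build a search body whose terms aggregation only sees the sampled hits."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "query": {"match": {"tags": tag}},  # "tags" is a hypothetical field name
        "aggs": {
            "bestDocs": {
                "sampler": {"shard_size": shard_size},
                "aggs": {"bestBuckets": {"terms": {"field": id_field}}},
            }
        },
    }


def emulate_sampler(shards, shard_size):
    """Keep the top-scoring docs on each shard, then count documents per id."""
    counts = Counter()
    for docs in shards:
        top = sorted(docs, key=lambda d: d["score"], reverse=True)[:shard_size]
        counts.update(d["id"] for d in top)
    return counts


body = build_search_body("python")
print(json.dumps(body, indent=2))

# Two toy shards with three matching documents each.
shards = [
    [{"id": "a", "score": 3.0}, {"id": "a", "score": 2.0}, {"id": "b", "score": 1.0}],
    [{"id": "b", "score": 2.5}, {"id": "c", "score": 0.5}, {"id": "c", "score": 0.4}],
]
sample_counts = emulate_sampler(shards, shard_size=2)
print(sample_counts)
```

With `shard_size=2` and two shards, four documents reach the terms aggregation rather than two, which is why the overall sample size depends on the number of shards.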

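Optionally, you can use the field or script setting together with max_docs_per_value to cap how many documents sharing a common value are collected on any one shard. In newer Elasticsearch releases these diversity settings belong to the separate diversified_sampler aggregation rather than to sampler itself.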
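The effect of max_docs_per_value can be sketched in Python as follows. This is a simplified emulation for a single shard, assuming documents are visited in descending score order and that each value may appear at most `max_docs_per_value` times in the sample. The function name, the toy documents and the choice of `id` as the diversity field are illustrative assumptions. The request body at the end uses `diversified_sampler`, which assumes an Elasticsearch version where the diversity settings live in that aggregation.

```python
from collections import defaultdict


def diversified_sample(docs, field, shard_size, max_docs_per_value=1):
    """Pick up to shard_size top-scoring docs, capping docs per field value."""
    kept = []
    seen = defaultdict(int)  # how many kept docs share each field value
    for doc in sorted(docs, key=lambda d: d["score"], reverse=True):
        if len(kept) >= shard_size:
            break
        if seen[doc[field]] < max_docs_per_value:
            kept.append(doc)
            seen[doc[field]] += 1
    return kept


# Toy documents on one shard; id "a" dominates the top of the ranking.
docs = [
    {"id": "a", "score": 3.0},
    {"id": "a", "score": 2.0},
    {"id": "b", "score": 1.5},
    {"id": "a", "score": 1.0},
    {"id": "c", "score": 0.5},
]

one_per_value = [d["id"] for d in diversified_sample(docs, "id", 3, 1)]
two_per_value = [d["id"] for d in diversified_sample(docs, "id", 3, 2)]
print(one_per_value, two_per_value)

# Assumed request shape for versions that provide diversified_sampler.
diversified_aggs = {
    "bestDocs": {
        "diversified_sampler": {
            "shard_size": 100,
            "field": "id",
            "max_docs_per_value": 1,
        },
        "aggs": {"bestBuckets": {"terms": {"field": "id"}}},
    }
}
```

Raising the cap lets a dominant value take more of the sample, so lower-scoring values such as `c` can drop out once `shard_size` is reached.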
