Limit ElasticSearch aggregation to top n query results
Problem description
I have a set of 2.8 million docs with sets of tags that I'm querying with ElasticSearch, but many of these docs can be grouped together by one ID. I want to query my data using the tags, and then aggregate them by the ID that repeats. Often my search results have tens of thousands of documents, but I only want to aggregate the top 100 results of the search. How can I constrain an aggregation to only the top 100 results from a query?
Recommended answer
Use the sampler aggregation: a filtering aggregation that limits the processing of any sub-aggregations to a sample of the top-scoring documents.
"aggs": {
  "bestDocs": {
    "sampler": {
      // "field": "<FIELD>", <-- optional, Controls diversity using a field
      "shard_size": 100
    },
    "aggs": {
      "bestBuckets": {
        "terms": {
          "field": "id"
        }
      }
    }
  }
}
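This limits the sub-aggregation to the 100 top-scoring documents of the result and then buckets them by ID. Note that shard_size applies per shard, so an index with several shards samples up to 100 documents from each shard rather than 100 in total.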
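For context, the fragment above is only the "aggs" part of a search request. The following is a minimal Python sketch of a complete request body, under some assumptions: the field names `tags` and `id`, the choice of a `match` query, and the helper name `build_sampled_search` are all hypothetical and not taken from the question. It only builds the body as a dictionary; sending it to a cluster is left to whichever client is in use.

```python
import json


def build_sampled_search(tags, sample_size=100, id_field="id", tags_field="tags"):
    """Build a search body that buckets only the top-scoring hits by ID.

    Field names and the match query are assumptions for illustration.
    """
    return {
        # Return aggregation results only, no individual hits.
        "size": 0,
        "query": {
            # A scoring query: the sampler picks documents by this score.
            "match": {tags_field: " ".join(tags)}
        },
        "aggs": {
            "bestDocs": {
                # Keep at most sample_size top-scoring docs on each shard.
                "sampler": {"shard_size": sample_size},
                "aggs": {
                    # Bucket the sampled docs by the repeating ID.
                    "bestBuckets": {"terms": {"field": id_field}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_sampled_search(["python", "search"]), indent=2))
```

Because the sampler ranks by relevance score, it only makes sense with a query that actually scores documents; with a pure filter every document has the same score and the sample is effectively arbitrary.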
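Optionally, you can use the field or script and max_docs_per_value settings to cap how many documents sharing a common value are collected on any one shard. In Elasticsearch 5.0 and later these diversity settings belong to the separate diversified_sampler aggregation rather than to sampler itself.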
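As a rough illustration of the diversity settings, here is a Python sketch that builds the same aggregation using `diversified_sampler`, which is where these options live in Elasticsearch 5.0 and later. The diversity field `author` and the helper name `build_diversified_aggs` are hypothetical, chosen only for the example; substitute whatever field should not dominate the sample.

```python
def build_diversified_aggs(sample_size=100, diversity_field="author",
                           max_docs_per_value=3, bucket_field="id"):
    """Build an aggs section that samples top-scoring docs with a per-value cap.

    diversity_field is a hypothetical field name used for illustration.
    """
    return {
        "aggs": {
            "bestDocs": {
                "diversified_sampler": {
                    # Top-scoring docs kept per shard.
                    "shard_size": sample_size,
                    # Field whose values are used to de-duplicate the sample.
                    "field": diversity_field,
                    # At most this many docs per distinct value, per shard.
                    "max_docs_per_value": max_docs_per_value,
                },
                "aggs": {
                    "bestBuckets": {"terms": {"field": bucket_field}}
                },
            }
        }
    }
```

Diversifying on a field other than the one being bucketed is usually the sensible choice here; capping on `id` itself would flatten the very counts the terms aggregation is meant to report.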