Find top 5 documents from each unique bucket
Question
Let us say that I have a number of Elasticsearch documents like the samples given below:
{
"Tagname": [
"Veniam"
],
"Title": [
"Occaecat do. Eu ut."
]
},
...
...
...
{
"Tagname": [
"Anim"
],
"Title": [
"Consectetur dolor consectetur eu."
]
},
...
...
...
{
"Tagname": [
"Aliqua"
],
"Title": [
"Culpa in ut ut. Enim in excepteur eiusmod."
]
}
...
...
...
Here, let's assume that Tagname is the name of the tag under which every Title falls, and that Tagname is mapped as a keyword, so that when I aggregate on Tagname.keyword I get, for example, 3 unique Tagname buckets (Veniam, Anim, Aliqua, etc.). In my case, let's assume that the number of unique Tagname values is not fixed and may change dynamically, so we cannot assume a static list of unique Tagname values in our search query.
What I want to achieve now is to get the top 5 Title values under each of these buckets. (So far, sorting or ordering of any kind to get the top 5 is not essential, and a random 5 would also work. However, an explanation of the sorting would be enlightening.)
Recommended answer
I suggest using the following aggregation. I've used an arbitrary size of 100, but you can replace it with the cardinality of your Tagname field, so as to make sure that you get one bucket per value of Tagname. Then, a nested top_hits aggregation returns 5 documents for each bucket.
{
  "size": 0,
  "aggs": {
    "tags": {
      "terms": {
        "field": "Tagname.keyword",
        "size": 100
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 5
          }
        }
      }
    }
  }
}
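On the sorting part of the question: when top_hits has no sort, it orders hits by the score of the main query. With no query at all, every document scores the same, so the 5 hits per bucket are effectively arbitrary. The sketch below is not part of the original answer. It builds the same request body in Python, optionally adds a sort to top_hits, and flattens the response into a mapping from tag to titles. The helper names are hypothetical, and the example sort field Title.keyword assumes that Title has a keyword sub-field, which the question does not confirm.

```python
from typing import Optional


def build_tag_count_query() -> dict:
    """Request body that estimates how many unique Tagname values exist.

    The count comes back under aggregations.tag_count.value and is
    approximate, so leave some headroom when using it as the terms size.
    """
    return {
        "size": 0,
        "aggs": {"tag_count": {"cardinality": {"field": "Tagname.keyword"}}},
    }


def build_top_titles_query(max_tags: int = 100, per_tag: int = 5,
                           sort: Optional[list] = None) -> dict:
    """Request body with one terms bucket per Tagname and top_hits inside."""
    top_hits = {"size": per_tag}
    if sort is not None:
        # Without "sort", top_hits orders by the main query's score.
        top_hits["sort"] = sort
    return {
        "size": 0,  # suppress the regular hit list, keep only aggregations
        "aggs": {
            "tags": {
                "terms": {"field": "Tagname.keyword", "size": max_tags},
                "aggs": {"latest": {"top_hits": top_hits}},
            }
        },
    }


def titles_by_tag(response: dict) -> dict:
    """Map each bucket key (tag) to the Title values of its top hits."""
    result = {}
    for bucket in response["aggregations"]["tags"]["buckets"]:
        hits = bucket["latest"]["hits"]["hits"]
        # Title is stored as a list in the sample documents, so flatten it.
        result[bucket["key"]] = [
            title for hit in hits for title in hit["_source"]["Title"]
        ]
    return result


# Assumption: Title has a "keyword" sub-field that can be sorted on.
sorted_body = build_top_titles_query(
    sort=[{"Title.keyword": {"order": "asc"}}]
)
```

Either body can be sent to the index's _search endpoint with whatever client is in use, and the parsed JSON response passed to titles_by_tag. Running build_tag_count_query first gives an estimate to use as max_tags in place of the arbitrary 100.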