Paging elasticsearch aggregation results


Problem description

Imagine I have two kinds of records: a bucket and an item, where an item is contained in a bucket, and a bucket may have a relatively small number of items (normally not more than 4, never more than 10). Those records are squashed into one (an item with extra bucket information) and placed inside Elasticsearch. The task I am trying to solve is to find 500 buckets (at most), with all their related items, in one go, using a filtered query that relies on the items' attributes, and I'm stuck on limiting/offsetting aggregations. How do I perform such a task? I see the top_hits aggregation, which lets me control how many related items are returned, but I can't find a clue as to how I can control the number of buckets returned.
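For reference, a minimal sketch of the kind of request being described, grouping the squashed documents by bucket and pulling each bucket's items with top_hits. The index name, the bucket_id keyword field, the item_color filter attribute, and the endpoint are all assumptions for illustration, not taken from the question:

```python
import requests

# Hypothetical index/endpoint; the squashed item-with-bucket-info documents
# are assumed to carry a "bucket_id" keyword field.
ES_URL = "http://localhost:9200/items/_search"

query = {
    "size": 0,  # we only need the aggregation, not the raw hits
    "query": {
        "bool": {
            # hypothetical item attribute used for filtering
            "filter": [{"term": {"item_color": "red"}}]
        }
    },
    "aggs": {
        "by_bucket": {
            "terms": {
                "field": "bucket_id",
                "size": 500  # at most 500 buckets
            },
            "aggs": {
                "items": {
                    "top_hits": {"size": 10}  # never more than 10 items per bucket
                }
            }
        }
    },
}

resp = requests.post(ES_URL, json=query)
for bucket in resp.json()["aggregations"]["by_bucket"]["buckets"]:
    items = [hit["_source"] for hit in bucket["items"]["hits"]["hits"]]
    print(bucket["key"], items)
```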

Update: okay, I'm terribly stupid. The size parameter of the terms aggregation provides me with the limiting part. Is there any way to perform the offset part? I don't need 100% precision and probably won't ever page those results, but anyway I'd like to see this functionality.

Recommended answer

I don't think we'll be seeing this feature any time soon; see the relevant discussion at GitHub:


Paging is tricky to implement because document counts for terms aggregations are not exact when shard_size is less than the field cardinality and sorting on count desc. So weird things may happen, like the first term of the 2nd page having a higher count than the last element of the first page, etc.

An interesting approach is mentioned there: you could request, say, the top 20 results on the 1st page, then on the 2nd page run the same aggregation but exclude the 20 terms you already saw on the previous page, and so forth. But this doesn't give you "random" access to an arbitrary page; you have to go through the pages in order. A sketch of that approach follows.
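A minimal sketch of that exclusion-based paging, reusing the same hypothetical items index and bucket_id field as above (endpoint, index, and field names are assumptions). The terms aggregation accepts a list of exact values in its exclude option, which is what carries the "skip what you've already seen" logic:

```python
import requests

ES_URL = "http://localhost:9200/items/_search"  # hypothetical endpoint
PAGE_SIZE = 20

def fetch_page(seen_terms):
    """Fetch the next 'page' of buckets, excluding terms from earlier pages."""
    terms_agg = {"field": "bucket_id", "size": PAGE_SIZE}  # hypothetical field
    if seen_terms:
        # Exclude the exact values already returned on previous pages.
        terms_agg["exclude"] = sorted(seen_terms)
    body = {"size": 0, "aggs": {"by_bucket": {"terms": terms_agg}}}
    resp = requests.post(ES_URL, json=body)
    return [b["key"] for b in resp.json()["aggregations"]["by_bucket"]["buckets"]]

# Pages must be walked in order; random access to page N is not possible.
seen = set()
page1 = fetch_page(seen)
seen.update(page1)
page2 = fetch_page(seen)  # same aggregation, minus the terms from page 1
```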


...if you only have a limited number of unique values compared to the number of matched documents, doing the paging on the client side would be more efficient. On the other hand, on high-cardinality fields, your first approach based on an exclude would probably be better.
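For the low-cardinality case, client-side paging could look roughly like this: fetch all terms once with a size safely above the field's cardinality, then slice the list locally. Again, the endpoint, index, and field names are hypothetical:

```python
import requests

ES_URL = "http://localhost:9200/items/_search"  # hypothetical endpoint

# Fetch every distinct term in one request; "size" just needs to
# exceed the field's cardinality for the counts to be exact.
body = {
    "size": 0,
    "aggs": {"by_bucket": {"terms": {"field": "bucket_id", "size": 10000}}},
}
all_buckets = requests.post(ES_URL, json=body).json()["aggregations"]["by_bucket"]["buckets"]

def page(n, per_page=20):
    """Plain list slicing stands in for limit/offset."""
    start = n * per_page
    return all_buckets[start:start + per_page]
```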

