ElasticSearch group by document field and count occurrences
Question
My ElasticSearch 6.5.2 index looks like:
{
"_index" : "searches",
"_type" : "searches",
"_id" : "cCYuHW4BvwH6Y3jL87ul",
"_score" : 1.0,
"_source" : {
"querySearched" : "telecom"
}
},
{
"_index" : "searches",
"_type" : "searches",
"_id" : "cSYuHW4BvwH6Y3jL_Lvt",
"_score" : 1.0,
"_source" : {
"querySearched" : "telecom"
}
},
{
"_index" : "searches",
"_type" : "searches",
"_id" : "eCb6O24BvwH6Y3jLP7tM",
"_score" : 1.0,
"_source" : {
"querySearched" : "industry"
}
}
I would like a query that returns this result:
"result":
{
"querySearched" : "telecom",
"number" : 2
},
{
"querySearched" : "industry",
"number" : 1
}
I just want to group by the searched term and get the number of occurrences of each, limited to the ten largest counts. I tried aggregations, but the buckets are empty. Thanks!
Recommended answer
Set up your mapping:
PUT /index
{
"mappings": {
"doc": {
"properties": {
"querySearched": {
"type": "text",
"fielddata": true
}
}
}
}
}
Your query should look like:
GET index/_search
{
"size": 0,
"aggs": {
"result": {
"terms": {
"field": "querySearched",
"size": 10
}
}
}
}
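Once the query runs, the counts come back under aggregations.result.buckets, where each bucket carries a key (the term) and a doc_count, sorted by doc_count in descending order by default. Below is a minimal Python sketch of reshaping that into the format asked for in the question. The sample response is hand-written for illustration from the three documents in the question, not captured from a real cluster, and the helper name buckets_to_result is made up here.

```python
def buckets_to_result(response, agg_name="result"):
    """Turn a terms-aggregation response into [{querySearched, number}, ...]."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [
        {"querySearched": b["key"], "number": b["doc_count"]}
        for b in buckets
    ]


# Hand-written sample shaped like a terms-aggregation response (illustrative only).
sample_response = {
    "hits": {"total": 3, "hits": []},
    "aggregations": {
        "result": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "telecom", "doc_count": 2},
                {"key": "industry", "doc_count": 1},
            ],
        }
    },
}

result = buckets_to_result(sample_response)
print(result)
```

If the buckets list comes back empty, the helper simply returns an empty list, which makes that case easy to spot in calling code.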
You should add "fielddata": true in order to enable aggregations on a text field; see the Elasticsearch fielddata documentation for more on that.
"size": 10 limits the result to the ten largest buckets.
After a short discussion with @Kamal, I feel obligated to let you know that if you choose to enable "fielddata": true, you should be aware that it can consume a lot of heap space.
From the link I shared:
Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default.
Another alternative (a more efficient one):
PUT /index
{
"mappings": {
"doc": {
"properties": {
"querySearched": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
Then run the aggregation query:
GET index/_search
{
"size": 0,
"aggs": {
"result": {
"terms": {
"field": "querySearched.keyword",
"size": 10
}
}
}
}
Both solutions work, but you should take the heap usage of fielddata into consideration.
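One more practical difference between the two mappings: a text field is analyzed, so with fielddata the buckets are individual tokens, while the keyword sub-field buckets the whole original value. The Python sketch below only imitates this with collections.Counter and does not talk to Elasticsearch. The lowercase-and-split tokenizer is a rough stand-in for the standard analyzer, and the multi-word search "Telecom Industry" is a made-up value that is not in the question's data.

```python
import re
from collections import Counter

# The last value is hypothetical, added to show the difference.
searches = ["telecom", "telecom", "industry", "Telecom Industry"]


def rough_tokens(value):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def text_field_counts(values, size=10):
    # A terms aggregation counts documents, so each token counts once per document.
    counts = Counter()
    for value in values:
        counts.update(set(rough_tokens(value)))
    return counts.most_common(size)


def keyword_field_counts(values, size=10):
    # The keyword sub-field keeps each value as a single, unmodified term.
    return Counter(values).most_common(size)


text_counts = text_field_counts(searches)
keyword_counts = keyword_field_counts(searches)
print(text_counts)
print(keyword_counts)
```

For single-word searches like those in the question, both approaches give the same counts; they diverge only once values contain several words or mixed case.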
Hope that helps.