Elastic Search: aggregation, count by field

Problem description

I inserted this data into elastic search:

[
  { "name": "Cassandra Irwin",  "location": "Monzon de Campos" ..     },
  { "name": "Gayle Mooney",     "location": "Villarroya del Campo" .. },
  { "name": "Angelita Charles", "location": "Revenga de Campos" ..    }, 
  { "name": "Sheppard Sweet",   "location": "Santiago del Campo" ..   },
  ..
  ..

Sidenote: to reproduce: 1) download: http://wmo.co/20160928_es_query/bulk.json 2) execute: curl -s -XPOST 'http://localhost:9200/testing/external/_bulk?pretty' --data-binary @bulk.json

Question: obtain a count of how many records there are per "location".

Solution 1: bucket (terms) aggregation .. does not give the desired result

curl -s -XPOST 'localhost:9200/testing/_search?pretty' -d '
{
  "aggs": {  "location_count": { "terms": { "field":"location",   "size":100 }}}
}' | jq  '.aggregations'

Result:

{"location_count":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,
 "buckets":[
    {"key":"campo",     "doc_count":47},
    {"key":"del",       "doc_count":47},
    {"key":"campos",    "doc_count":29},
    {"key":"de",        "doc_count":29},
    {"key":"villarroya","doc_count":15},
    {"key":"torre",     "doc_count":12},
    {"key":"monzon",    "doc_count":11},
    {"key":"santiago",  "doc_count":11},
    {"key":"pina",      "doc_count":9},
    {"key":"revenga",   "doc_count":9},
    {"key":"uleila",    "doc_count":9}
]}}

Problem: it splits the 'location' field into words, and returns a doc count per word.
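
This is the standard analyzer at work: an analyzed string field is tokenized and lowercased at index time, and the terms aggregation buckets those individual tokens. A quick way to see the tokenization (my addition, not part of the original question; assuming an ES 1.x/2.x cluster, which the string/not_analyzed mapping in the answer below implies) is the _analyze API:

curl -s 'localhost:9200/testing/_analyze?field=location&text=Monzon+de+Campos&pretty'

which returns the three tokens monzon, de and campos instead of the single value Monzon de Campos.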

Solution 2: the desired result, but performance is a concern

I can do it using this query, pulling out ALL locations and doing the aggregation in jq (the ever-handy JSON CLI tool), but this can turn into a performance nightmare when applied to huge volumes of data:

curl -s -XPOST 'localhost:9200/testing/_search?pretty' -d '
 {
   "query": { "wildcard": { "location": "*" } }, "size":1000,
   "_source": ["location"]
 }' | jq  '[.hits.hits[] |
           {location:._source.location,"count":1}] |
           group_by(.location) |
           map({ key: .[0].location, value: map(.count)|add })'

Result:

[
  { "key": "Monzon de Campos",      "value": 11 },
  { "key": "Pina de Campos",        "value": 9  },
  { "key": "Revenga de Campos",     "value": 9  },
  { "key": "Santiago del Campo",    "value": 11 },
  { "key": "Torre del Campo",       "value": 12 },
  { "key": "Uleila del Campo",      "value": 9  },
  { "key": "Villarroya del Campo",  "value": 15 }
]

This is the exact result that I want.

QUESTION: how can I obtain the same results via an Elastic Search query? (i.e. with the aggregation handled by Elastic Search, not by jq)

Answer

You need to add a not_analyzed sub-field to your location field.

First modify your mapping like this:

curl -XPOST 'http://localhost:9200/testing/_mapping/external' -d '{
   "properties": {
      "location": {
         "type": "string",
         "fields": {
            "raw": {
               "type": "string",
               "index": "not_analyzed"
            }
         }
      }
   }
}'
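
As a quick sanity check (my addition, not in the original answer), you can read the mapping back and confirm that the raw sub-field was registered:

curl -s 'http://localhost:9200/testing/_mapping/external?pretty'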

Then reindex your data:

curl -s -XPOST 'http://localhost:9200/testing/external/_bulk?pretty' --data-binary @bulk.json
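
Note that re-posting the bulk file only overwrites the existing documents if its index actions carry explicit _id values; if they don't, every run adds a fresh copy of each document and inflates the counts. A defensive alternative (a sketch of my own, not from the original answer) is to recreate the index with the mapping already in place before loading:

curl -XDELETE 'http://localhost:9200/testing'
curl -XPUT 'http://localhost:9200/testing' -d '{
   "mappings": {
      "external": {
         "properties": {
            "location": {
               "type": "string",
               "fields": {
                  "raw": { "type": "string", "index": "not_analyzed" }
               }
            }
         }
      }
   }
}'
curl -s -XPOST 'http://localhost:9200/testing/external/_bulk?pretty' --data-binary @bulk.json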

Finally, you'll be able to run your query like this (on the location.raw field) and get the results you expect:

curl -s -XPOST 'localhost:9200/testing/_search?pretty' -d '
{
  "aggs": {  "location_count": { "terms": { "field":"location.raw",   "size":100 }}}
}' | jq  '.aggregations'
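
Since location.raw holds the unanalyzed value, each bucket key is now the full location string. Going by the counts that solution 2 produced with jq, the output should look roughly like this (terms buckets are ordered by doc_count by default, so the exact ordering of ties may differ):

{"location_count":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,
 "buckets":[
    {"key":"Villarroya del Campo","doc_count":15},
    {"key":"Torre del Campo",     "doc_count":12},
    {"key":"Monzon de Campos",    "doc_count":11},
    {"key":"Santiago del Campo",  "doc_count":11},
    {"key":"Pina de Campos",      "doc_count":9},
    {"key":"Revenga de Campos",   "doc_count":9},
    {"key":"Uleila del Campo",    "doc_count":9}
]}}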
