Elasticsearch terms aggregation by strings in an array


Problem description


How can I write an Elasticsearch terms aggregation that builds its buckets from each whole value rather than from individual tokens? For example, I would like to aggregate by state, but the following returns new, york, jersey and california as individual buckets instead of the expected New York, New Jersey and California:

curl -XPOST "http://localhost:9200/my_index/_search" -d'
{
    "aggs" : {
        "states" : {
            "terms" : { 
                "field" : "states",
                "size": 10
            }
        }
    }
}'

My use case is like the one described here https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html with just one difference: the city field is an array in my case.

Example object:

{
    "states": ["New York", "New Jersey", "California"]
}

It seems that the proposed solution (mapping the field as not_analyzed) does not work for arrays.

My mapping:

{
    "properties": {
        "states": {
            "type":"object",
            "fields": {
                "raw": {
                    "type":"object",
                    "index":"not_analyzed"
                }
            }
        }
    }
}

I have tried replacing "object" with "string", but that does not work either.

Solution

I think all you're missing is "states.raw" in your aggregation (note that, since no analyzer is specified, the "states" field is analyzed with the standard analyzer; the sub-field "raw" is "not_analyzed"). Though your mapping might bear looking at as well. When I tried your mapping against ES 2.0 I got some errors, but this worked:

PUT /test_index
{
   "mappings": {
      "doc": {
         "properties": {
            "states": {
               "type": "string",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            }
         }
      }
   }
}
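A note for readers on newer releases, which is not part of the original answer: from Elasticsearch 5.x onward the "string" type was split into "text" (analyzed) and "keyword" (not analyzed), and recent versions no longer use a mapping type such as "doc". A minimal Python sketch of an equivalent mapping body, assuming those newer conventions; the helper name states_mapping is hypothetical:

```python
import json


def states_mapping(doc_type=None):
    """Build a mapping body with `states` as text plus a keyword `raw` sub-field.

    Assumption: targets Elasticsearch 5.x or later, where `keyword` plays the
    role of a not_analyzed string. Pass doc_type (e.g. "doc") only for
    versions that still expect a mapping type.
    """
    properties = {
        "states": {
            "type": "text",
            "fields": {"raw": {"type": "keyword"}},
        }
    }
    if doc_type is None:
        return {"mappings": {"properties": properties}}
    return {"mappings": {doc_type: {"properties": properties}}}


# Print the typeless variant; this is the JSON body for PUT /test_index.
print(json.dumps(states_mapping(), indent=2))
```

The aggregation would still target "states.raw" in either case; only the field types change.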

Then I added a couple of docs:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"states":["New York","New Jersey","California"]}
{"index":{"_id":2}}
{"states":["New York","North Carolina","North Dakota"]}
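The bulk request above is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. If you are generating it from code, a small sketch like the following may help; the helper name bulk_body is hypothetical, and sending the body over HTTP is left out:

```python
import json


def bulk_body(docs):
    """Serialize (doc_id, source) pairs as newline-delimited JSON for _bulk.

    Each document becomes two lines: an "index" action line and the source.
    The body ends with a newline, which the bulk endpoint expects.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = bulk_body([
    (1, {"states": ["New York", "New Jersey", "California"]}),
    (2, {"states": ["New York", "North Carolina", "North Dakota"]}),
])
print(body, end="")
```

Note that each array stays a single JSON value on its source line; no special handling is needed for multi-valued fields.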

And this query seems to do what you want:

POST /test_index/_search
{
    "size": 0, 
    "aggs" : {
        "states" : {
            "terms" : { 
                "field" : "states.raw",
                "size": 10
            }
        }
    }
}

returning:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "states": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "New York",
               "doc_count": 2
            },
            {
               "key": "California",
               "doc_count": 1
            },
            {
               "key": "New Jersey",
               "doc_count": 1
            },
            {
               "key": "North Carolina",
               "doc_count": 1
            },
            {
               "key": "North Dakota",
               "doc_count": 1
            }
         ]
      }
   }
}
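To see why "states" and "states.raw" bucket differently, here is a rough pure-Python simulation. It is only a sketch: standard_like_tokens is a crude stand-in for the standard analyzer (lowercase, split on non-word characters), not Elasticsearch's actual implementation, and both function names are hypothetical.

```python
import re
from collections import Counter


def standard_like_tokens(value):
    """Crude approximation of the standard analyzer (an assumption)."""
    return [t for t in re.split(r"\W+", value.lower()) if t]


def terms_buckets(docs, field, analyzed):
    """Count, for each distinct term, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        terms = set()  # a document counts at most once per bucket
        for value in doc.get(field, []):
            if analyzed:
                terms.update(standard_like_tokens(value))
            else:
                terms.add(value)  # whole value kept, like not_analyzed
        counts.update(terms)
    # Order by doc_count descending, then by key ascending.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


docs = [
    {"states": ["New York", "New Jersey", "California"]},
    {"states": ["New York", "North Carolina", "North Dakota"]},
]

analyzed_buckets = terms_buckets(docs, "states", analyzed=True)
raw_buckets = terms_buckets(docs, "states", analyzed=False)
print(analyzed_buckets)
print(raw_buckets)
```

With analyzed=True the buckets are single words such as new and york; with analyzed=False each array element stays whole, which mirrors the response above. Each element of an array is indexed as its own value, so arrays need no special treatment here.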

Here's the code I used to test it:

http://sense.qbox.io/gist/31851c3cfee8c1896eb4b53bc1ddd39ae87b173e
