ElasticSearch returning only documents with distinct value

Problem description

Let's say I have the following data:

{
    "name" : "ABC",
    "favorite_cars" : [ "ferrari", "toyota" ]
}, {
    "name" : "ABC",
    "favorite_cars" : [ "ferrari", "toyota" ]
}, {
    "name" : "GEORGE",
    "favorite_cars" : [ "honda", "Hyundae" ]
}

When I search for people whose favorite car is toyota, the query returns this data:

{
    "name" : "ABC",
    "favorite_cars" : [ "ferrari", "toyota" ]
}, {
    "name" : "ABC",
    "favorite_cars" : [ "ferrari", "toyota" ]
}

The result is two records with the name ABC. How do I select only distinct documents? The result I want is just this:

{
    "name" : "ABC",
    "favorite_cars" : [ "ferrari", "toyota" ]
}

Here's my query:

{
    "fuzzy_like_this_field" : {
        "favorite_cars" : {
            "like_text" : "toyota",
            "max_query_terms" : 12
        }
    }
}

I am using ElasticSearch 1.0.0 with the Java API client.

Solution

You can eliminate duplicates using aggregations. With a terms aggregation, the results are grouped by one field, e.g. name. The aggregation also provides a count of the occurrences of each value of that field and sorts the buckets by this count (descending).

{
  "query": {
    "fuzzy_like_this_field": {
      "favorite_cars": {
        "like_text": "toyota",
        "max_query_terms": 12
      }
    }
  },
  "aggs": {
    "grouped_by_name": {
      "terms": {
        "field": "name",
        "size": 0
      }
    }
  }
}
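
If you build the request in code rather than by hand, the same body can be assembled as a plain dictionary. This is only a sketch: the helper name build_grouped_query is hypothetical and not part of any Elasticsearch client, and the code only produces the JSON shown above without sending it anywhere.

```python
import json


def build_grouped_query(search_field, text, group_field):
    """Build the request body from the answer: a fuzzy_like_this_field
    query plus a terms aggregation on group_field.

    Hypothetical helper for illustration; it does not talk to a cluster.
    """
    return {
        "query": {
            "fuzzy_like_this_field": {
                search_field: {
                    "like_text": text,
                    "max_query_terms": 12,
                }
            }
        },
        "aggs": {
            # The aggregation name is arbitrary; it is reused as the key
            # under "aggregations" in the response.
            "grouped_by_" + group_field: {
                "terms": {
                    "field": group_field,
                    "size": 0,  # same value as in the request above
                }
            }
        },
    }


body = build_grouped_query("favorite_cars", "toyota", "name")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the search request body with whichever client you prefer.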

The hits themselves are not deduplicated. In addition to the hits, the result will also contain the buckets, with each unique value in key and its count in doc_count:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.19178301,
    "hits" : [ {
      "_index" : "pru",
      "_type" : "pru",
      "_id" : "vGkoVV5cR8SN3lvbWzLaFQ",
      "_score" : 0.19178301,
      "_source":{"name":"ABC","favorite_cars":["ferrari","toyota"]}
    }, {
      "_index" : "pru",
      "_type" : "pru",
      "_id" : "IdEbAcI6TM6oCVxCI_3fug",
      "_score" : 0.19178301,
      "_source":{"name":"ABC","favorite_cars":["ferrari","toyota"]}
    } ]
  },
  "aggregations" : {
    "grouped_by_name" : {
      "buckets" : [ {
        "key" : "abc",
        "doc_count" : 2
      } ]
    }
  }
}
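
Since the hits still contain both ABC documents, one way to end up with a single document per name is to read the buckets and filter the hits on the client. The sketch below works on a trimmed copy of the response above; the function names distinct_hits and bucket_counts are made up for this example. Note that the bucket key is abc while the stored name is ABC, presumably because name is an analyzed field, so the sketch deduplicates on the _source value rather than on the bucket key.

```python
import json

# Trimmed copy of the response shown above (only the parts used here).
response = json.loads("""
{
  "hits": {
    "total": 2,
    "hits": [
      {"_id": "vGkoVV5cR8SN3lvbWzLaFQ",
       "_source": {"name": "ABC", "favorite_cars": ["ferrari", "toyota"]}},
      {"_id": "IdEbAcI6TM6oCVxCI_3fug",
       "_source": {"name": "ABC", "favorite_cars": ["ferrari", "toyota"]}}
    ]
  },
  "aggregations": {
    "grouped_by_name": {
      "buckets": [{"key": "abc", "doc_count": 2}]
    }
  }
}
""")


def bucket_counts(resp, agg_name="grouped_by_name"):
    """Map each distinct bucket key to its document count."""
    buckets = resp["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


def distinct_hits(resp, field="name"):
    """Keep the first hit for each distinct value of field in _source."""
    seen = set()
    unique = []
    for hit in resp["hits"]["hits"]:
        value = hit["_source"].get(field)
        if value not in seen:
            seen.add(value)
            unique.append(hit["_source"])
    return unique


print(bucket_counts(response))
print(distinct_hits(response))
```

Keep in mind that this only deduplicates the hits returned in one response; duplicates spread across several pages of results would need the seen set to be carried over between requests.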

Note that using aggregations will be costly because of duplicate elimination and result sorting.
