ElasticSearch returning only documents with distinct value
Problem description
Let's say I have this data:
{ "name" : "ABC", "favorite_cars" : [ "ferrari", "toyota" ] },
{ "name" : "ABC", "favorite_cars" : [ "ferrari", "toyota" ] },
{ "name" : "GEORGE", "favorite_cars" : [ "honda", "Hyundae" ] }
When I search for people whose favorite car is toyota, the query returns this data:
{ "name" : "ABC", "favorite_cars" : [ "ferrari", "toyota" ] },
{ "name" : "ABC", "favorite_cars" : [ "ferrari", "toyota" ] }
The result is two records with the name ABC. How do I select only distinct documents? The result I want is just this:
{ "name" : "ABC", "favorite_cars" : [ "ferrari", "toyota" ] }
Here's my query:
{
  "fuzzy_like_this_field" : {
    "favorite_cars" : {
      "like_text" : "toyota",
      "max_query_terms" : 12
    }
  }
}
I am using ElasticSearch 1.0.0 with the Java API client.
Solution
You can eliminate duplicates using aggregations. With a terms aggregation, the results are grouped by one field, e.g. name. The aggregation also provides a count of the occurrences of each value of the field, and sorts the groups by this count (descending).
{
  "query": {
    "fuzzy_like_this_field": {
      "favorite_cars": {
        "like_text": "toyota",
        "max_query_terms": 12
      }
    }
  },
  "aggs": {
    "grouped_by_name": {
      "terms": {
        "field": "name",
        "size": 0
      }
    }
  }
}
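As a small illustration of how this request body is put together, here is a sketch in Python that builds the same structure as a plain dictionary and prints it as JSON. The helper name build_grouped_query and its parameters are made up for this example; the field names and values come from the query above. It only constructs the body and does not send it to a cluster.

```python
import json


def build_grouped_query(search_field, text, group_field, max_query_terms=12):
    """Build the request body: the fuzzy query plus a terms aggregation.

    Hypothetical helper for illustration; it mirrors the JSON shown above.
    """
    return {
        "query": {
            "fuzzy_like_this_field": {
                search_field: {
                    "like_text": text,
                    "max_query_terms": max_query_terms,
                }
            }
        },
        "aggs": {
            # "grouped_by_name" is just the label under which the
            # buckets appear in the response.
            "grouped_by_name": {
                "terms": {"field": group_field, "size": 0}
            }
        },
    }


body = build_grouped_query("favorite_cars", "toyota", "name")
print(json.dumps(body, indent=2))
```

The same nesting applies whichever client is used to send the request: the query and the aggregation sit side by side at the top level of the body.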
In addition to the hits, the result will also contain the buckets, with the unique values in key and the count in doc_count:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.19178301,
    "hits" : [ {
      "_index" : "pru",
      "_type" : "pru",
      "_id" : "vGkoVV5cR8SN3lvbWzLaFQ",
      "_score" : 0.19178301,
      "_source" : { "name" : "ABC", "favorite_cars" : [ "ferrari", "toyota" ] }
    }, {
      "_index" : "pru",
      "_type" : "pru",
      "_id" : "IdEbAcI6TM6oCVxCI_3fug",
      "_score" : 0.19178301,
      "_source" : { "name" : "ABC", "favorite_cars" : [ "ferrari", "toyota" ] }
    } ]
  },
  "aggregations" : {
    "grouped_by_name" : {
      "buckets" : [ {
        "key" : "abc",
        "doc_count" : 2
      } ]
    }
  }
}
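The response still lists both ABC hits; only the buckets are distinct. One way to get a single document per name is to post-process the response on the client. The sketch below is an assumption, not part of the answer above: it takes a response parsed into a Python dictionary (trimmed here to the relevant keys), reads the bucket counts, and keeps the first hit for each name. The function names distinct_hits and bucket_counts are hypothetical.

```python
def bucket_counts(response, agg_name="grouped_by_name"):
    """Map each unique bucket key to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def distinct_hits(response, field="name"):
    """Keep the _source of the first hit seen for each value of `field`."""
    seen = set()
    unique = []
    for hit in response["hits"]["hits"]:
        value = hit["_source"][field]
        if value not in seen:
            seen.add(value)
            unique.append(hit["_source"])
    return unique


# Response from above, trimmed to the keys used here.
response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "vGkoVV5cR8SN3lvbWzLaFQ",
             "_source": {"name": "ABC", "favorite_cars": ["ferrari", "toyota"]}},
            {"_id": "IdEbAcI6TM6oCVxCI_3fug",
             "_source": {"name": "ABC", "favorite_cars": ["ferrari", "toyota"]}},
        ],
    },
    "aggregations": {
        "grouped_by_name": {"buckets": [{"key": "abc", "doc_count": 2}]}
    },
}

print(bucket_counts(response))
print(distinct_hits(response))
```

Note that this only deduplicates the hits actually returned in the response, so with paginated results duplicates on other pages are not seen.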
Note that using aggregations will be costly because of duplicate elimination and result sorting.