Elasticsearch aggregation using min_doc_count=0 returns all the buckets, including those not related to the query results or hits
Problem description
My query, the result with min_doc_count = 0, and the result with min_doc_count = 1 are listed after the answer, in that order.

The aggregations are correct when min_doc_count = 1: only the buckets relevant to the hits are fetched.

Could anyone tell me why the aggregation fetches all buckets when min_doc_count = 0 is set? I have gone through the Elasticsearch documentation, and it states that this behavior is by design. Is there any other way to get aggregation buckets only for the hits, with zero counts as well?

Answer

First of all, you need to understand what it means to see buckets with zero counts.

Below is an excerpt from the Terms Aggregation documentation:

"Setting min_doc_count=0 will also return buckets for terms that didn’t match any hit. However, some of the returned terms which have a document count of zero might only belong to deleted documents or documents from other types, so there is no warranty that a match_all query would find a positive document count for those terms."

So most likely these are the counts for deleted documents. Note that the aggregation is only calculated on the documents that pass the query's filters.

However, keep in mind that ES keeps merging the segments of an index behind the scenes (this is part of how deletions are processed), so the zero-count results may not be consistent over time. Eventually, once the merging process is completed (and if no further docs are deleted from that point onwards), you may not get any terms with a zero count at all.

So in a way it is safe to tell your business leads that these are the counts for deleted docs, and you can put the above argument to them. If they say they need the count of docs/terms for deleted docs, that is like looking for documents/terms that do not exist in the index, which does not make sense.

As for why they still show up, that is probably due to the segment merging process that happens in ES, and it is by design.

So no, you cannot apply a query/filter to deleted documents (take a step back and imagine that), and hence you cannot control data related to docs that are not available in the first place.

You can make use of the aggregation in the last listing, under "Aggregation Query:", which would give you what you asked for in your comment. Note that I've made use of the Terms, Value Count and Bucket Selector aggregations.

This may not be what you are looking for, but I hope it helps!

Query:
{
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "folderId.keyword": [
              "ff98505e-cdff-43aa-8b05-197bc3f3265e"
            ],
            "boost": 1
          }
        },
        {
          "terms": {
            "objectType.keyword": [
              "File"
            ],
            "boost": 1
          }
        },
        {
          "term": {
            "tenantId": {
              "value": "34202",
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "aggs": {
    "_byformat": {
      "terms": {
        "field": "format.keyword",
        "min_doc_count": 0,
        "size": 200
      }
    }
  }
}
Result with min_doc_count = 0:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "plnesdv1-34202-1",
        "_type" : "_doc",
        "_id" : "6adbda83-53ad-457f-a2ab-d5b04c643005",
        "_score" : 0.0,
        "_source" : {
          "format" : "vnd.openxmlformats-officedocument.spreadsheetml.sheet",
          "externalSharing" : "N",
          "description" : null,
          "dateModified" : null,
          "type" : "application",
          "folderId" : "ff98505e-cdff-43aa-8b05-197bc3f3265e",
          "tags" : [ ],
          "objectType" : "File",
          "dateCreated" : null,
          "name" : "New XLSX file",
          "tenantId" : "34202",
          "modifiedBy" : "rdt001",
          "id" : "6adbda83-53ad-457f-a2ab-d5b04c643005",
          "status" : "active",
          "expirationDate" : null
        }
      },
      {
        "_index" : "plnesdv1-34202-1",
        "_type" : "_doc",
        "_id" : "b1000a15-2d80-41f4-a5df-ba5c27f8e9c6",
        "_score" : 0.0,
        "_source" : {
          "format" : "vnd.ms-excel",
          "externalSharing" : "N",
          "description" : null,
          "dateModified" : null,
          "type" : "application",
          "folderId" : "ff98505e-cdff-43aa-8b05-197bc3f3265e",
          "tags" : [ ],
          "objectType" : "File",
          "dateCreated" : null,
          "name" : "New XLS file",
          "tenantId" : "34202",
          "modifiedBy" : "rdt001",
          "id" : "b1000a15-2d80-41f4-a5df-ba5c27f8e9c6",
          "status" : "active",
          "expirationDate" : null
        }
      },
      {
        "_index" : "plnesdv1-34202-1",
        "_type" : "_doc",
        "_id" : "630e9f49-3368-408d-a091-03f253127004",
        "_score" : 0.0,
        "_source" : {
          "format" : "msword",
          "externalSharing" : "N",
          "description" : null,
          "dateModified" : null,
          "type" : "application",
          "folderId" : "ff98505e-cdff-43aa-8b05-197bc3f3265e",
          "tags" : [ ],
          "objectType" : "File",
          "dateCreated" : null,
          "name" : "New DOC file",
          "tenantId" : "34202",
          "modifiedBy" : "rdt001",
          "id" : "630e9f49-3368-408d-a091-03f253127004",
          "status" : "active",
          "expirationDate" : null
        }
      }
    ]
  },
  "aggregations" : {
    "_byformat" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "msword",
          "doc_count" : 1
        },
        {
          "key" : "vnd.ms-excel",
          "doc_count" : 1
        },
        {
          "key" : "vnd.openxmlformats-officedocument.spreadsheetml.sheet",
          "doc_count" : 1
        },
        {
          "key" : "bmp",
          "doc_count" : 0
        },
        {
          "key" : "gif",
          "doc_count" : 0
        },
        {
          "key" : "html",
          "doc_count" : 0
        }
      ]
    }
  }
}
Result with min_doc_count = 1:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "plnesdv1-34202-1",
        "_type" : "_doc",
        "_id" : "6adbda83-53ad-457f-a2ab-d5b04c643005",
        "_score" : 0.0,
        "_source" : {
          "format" : "vnd.openxmlformats-officedocument.spreadsheetml.sheet",
          "externalSharing" : "N",
          "description" : null,
          "dateModified" : null,
          "type" : "application",
          "folderId" : "ff98505e-cdff-43aa-8b05-197bc3f3265e",
          "tags" : [ ],
          "objectType" : "File",
          "dateCreated" : null,
          "name" : "New XLSX file",
          "tenantId" : "34202",
          "modifiedBy" : "rdt001",
          "id" : "6adbda83-53ad-457f-a2ab-d5b04c643005",
          "status" : "active",
          "expirationDate" : null
        }
      },
      {
        "_index" : "plnesdv1-34202-1",
        "_type" : "_doc",
        "_id" : "b1000a15-2d80-41f4-a5df-ba5c27f8e9c6",
        "_score" : 0.0,
        "_source" : {
          "format" : "vnd.ms-excel",
          "externalSharing" : "N",
          "description" : null,
          "dateModified" : null,
          "type" : "application",
          "folderId" : "ff98505e-cdff-43aa-8b05-197bc3f3265e",
          "tags" : [ ],
          "objectType" : "File",
          "dateCreated" : null,
          "name" : "New XLS file",
          "tenantId" : "34202",
          "modifiedBy" : "rdt001",
          "id" : "b1000a15-2d80-41f4-a5df-ba5c27f8e9c6",
          "status" : "active",
          "expirationDate" : null
        }
      },
      {
        "_index" : "plnesdv1-34202-1",
        "_type" : "_doc",
        "_id" : "630e9f49-3368-408d-a091-03f253127004",
        "_score" : 0.0,
        "_source" : {
          "format" : "msword",
          "externalSharing" : "N",
          "description" : null,
          "dateModified" : null,
          "type" : "application",
          "folderId" : "ff98505e-cdff-43aa-8b05-197bc3f3265e",
          "tags" : [ ],
          "objectType" : "File",
          "dateCreated" : null,
          "name" : "New DOC file",
          "tenantId" : "34202",
          "modifiedBy" : "rdt001",
          "id" : "630e9f49-3368-408d-a091-03f253127004",
          "status" : "active",
          "expirationDate" : null
        }
      }
    ]
  },
  "aggregations" : {
    "_byformat" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "msword",
          "doc_count" : 1
        },
        {
          "key" : "vnd.ms-excel",
          "doc_count" : 1
        },
        {
          "key" : "vnd.openxmlformats-officedocument.spreadsheetml.sheet",
          "doc_count" : 1
        }
      ]
    }
  }
}
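The two responses differ only in the trailing zero-count buckets. If the goal is simply to tell those apart from the buckets that belong to hits, one option is to partition the bucket list on the client. The following is a minimal Python sketch, not part of the original thread: `split_buckets` is a hypothetical helper, and the bucket list is copied by hand from the min_doc_count = 0 response.

```python
# Buckets copied from the "aggregations._byformat.buckets" array of the
# min_doc_count = 0 response shown in this thread.
buckets = [
    {"key": "msword", "doc_count": 1},
    {"key": "vnd.ms-excel", "doc_count": 1},
    {"key": "vnd.openxmlformats-officedocument.spreadsheetml.sheet", "doc_count": 1},
    {"key": "bmp", "doc_count": 0},
    {"key": "gif", "doc_count": 0},
    {"key": "html", "doc_count": 0},
]


def split_buckets(buckets):
    """Partition terms buckets into (matched, empty) lists of keys.

    Hypothetical helper: "matched" holds keys with at least one hit,
    "empty" holds keys whose doc_count is zero.
    """
    matched = [b["key"] for b in buckets if b["doc_count"] > 0]
    empty = [b["key"] for b in buckets if b["doc_count"] == 0]
    return matched, empty


matched, empty = split_buckets(buckets)
print("matched:", matched)
print("empty:", empty)
```

This only separates the two groups. It cannot tell whether a zero-count term comes from a deleted document or from a document outside the filter.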
Aggregation Query:
myaggs_count_zero is the aggregation for count 0; myaggs_count_not_zero is the aggregation for the normal count.
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "myaggs_count_zero": {
      "terms": {
        "field": "format.keyword"
      },
      "aggs": {
        "document_counts": {
          "value_count": {
            "field": "format.keyword"
          }
        },
        "by_account_filtered": {
          "bucket_selector": {
            "buckets_path": {
              "totalDocs": "document_counts"
            },
            "script": "params.totalDocs == 0"
          }
        }
      }
    },
    "myaggs_count_not_zero": {
      "terms": {
        "field": "format.keyword",
        "min_doc_count": 1
      }
    }
  }
}
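When this request is sent from application code rather than typed into a console, it can help to assemble the body as a data structure so that it always serializes to valid JSON. Here is a minimal Python sketch under that assumption: `build_format_aggs` and its `field` parameter are hypothetical names, and no particular HTTP or Elasticsearch client is assumed (the sketch only builds and prints the body).

```python
import json


def build_format_aggs(field="format.keyword"):
    """Return the search body of the aggregation query above as a dict.

    Hypothetical helper; `field` is the keyword field to aggregate on.
    """
    return {
        "size": 0,
        "aggs": {
            # Aggregation for count 0.
            "myaggs_count_zero": {
                "terms": {"field": field},
                "aggs": {
                    "document_counts": {"value_count": {"field": field}},
                    "by_account_filtered": {
                        "bucket_selector": {
                            "buckets_path": {"totalDocs": "document_counts"},
                            "script": "params.totalDocs == 0",
                        }
                    },
                },
            },
            # Aggregation for the normal count.
            "myaggs_count_not_zero": {
                "terms": {"field": field, "min_doc_count": 1}
            },
        },
    }


body = build_format_aggs()
# The serialized text is what would be sent as the body of the _search request.
print(json.dumps(body, indent=2))
```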