Elasticsearch filter multiple terms with only matching results and not any of them


Problem description

How can I get only the results that match all of the terms in a multi-term search, rather than any of them? I have this sample table, where titleid is mapped as an int field and personid is a keyword:

titleid:1,personid:a
titleid:3,personid:a

titleid:1,personid:b
titleid:2,personid:b

titleid:1,personid:c
titleid:5,personid:c

The desired result is:

titleid:1

With a sample query like this:

{query:
    {bool:
    {filter:
            {must:[
                    {terms : {fields: {personid:[a,b,c]}}
                 ]
            }}}}

I get these results:

titleid: 1,2,3,5

Maybe this will help: I wrote the query in SQL and got the expected result. The query returns every titleid whose number of matching rows equals the number of searched parameters. This is only to make the question more self-explanatory; the goal is to do the same thing in Elasticsearch.

select titleid
from (
   select count(titleid) as title_count, titleid 
   from table1 
   where personid in ('a','b','c')
   group by titleid
) as vw 
where title_count = 3
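
For reference, the same count-and-compare logic can be sketched in plain Python over the sample rows from the question. The `rows` list and the function name `titles_matching_all` are illustrative only and are not part of any Elasticsearch API. Like the SQL above, the sketch assumes each (titleid, personid) pair appears at most once.

```python
from collections import Counter

# Sample rows from the question: (titleid, personid)
rows = [
    (1, "a"), (3, "a"),
    (1, "b"), (2, "b"),
    (1, "c"), (5, "c"),
]

def titles_matching_all(rows, person_ids):
    """Return the titleids that have a row for every requested personid."""
    wanted = set(person_ids)
    # Equivalent of: WHERE personid IN (...) GROUP BY titleid
    counts = Counter(titleid for titleid, personid in rows if personid in wanted)
    # Equivalent of: WHERE title_count = <number of searched parameters>
    return sorted(t for t, n in counts.items() if n == len(wanted))

print(titles_matching_all(rows, ["a", "b", "c"]))  # prints [1]
```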


Recommended answer

If you only want records with titleid == 1 AND personid == 'a', you can filter on both fields. Only the bool query uses must, should, and must_not. A filter removes non-matching documents by definition, so every clause inside it is already a must.

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "titleid": { "value": 1 }
          }
        },
        {
          "term": {
            "personid": { "value": "a" }
          }
        }
      ]
    }
  }
}

Update:

Now your question looks like you want to filter your results, aggregate them, and then aggregate on those aggregates. There are a few metrics and bucket aggregations (https://www.elastic.co/guide/zh-CN/elasticsearch/reference/5.6/search-aggregations-bucket.html) that can help with this.

Using a bucket selector aggregation (this isn't tested, but it should be very close if not correct):

{
    "size": 0,
    "aggs": {
        "person_filter": {
            "filter": { "terms": { "personid": ["a", "b", "c"] } },
            "aggs": {
                "title_id": {
                    "terms": { "field": "titleid" },
                    "aggs": {
                        "count_filter": {
                            "bucket_selector": {
                                "buckets_path": {
                                    "the_doc_count": "_count"
                                },
                                "script": "params.the_doc_count == 3"
                            }
                        }
                    }
                }
            }
        }
    }
}
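
To avoid hard-coding the person list and the expected count, a request body of this kind can be built programmatically. The Python sketch below is an illustration under stated assumptions, not a tested client call. The helper name `build_all_persons_query` is hypothetical, the field names `personid` and `titleid` come from the question, and the `params.` prefix in the script assumes Painless scripting (Elasticsearch 5.x and later). It places the bucket_selector inside a terms aggregation on titleid and only builds and prints the JSON body; sending it to a cluster is left out.

```python
import json

def build_all_persons_query(person_ids):
    """Build a search body keeping only titleid buckets hit by every personid.

    Assumes each (titleid, personid) pair is indexed at most once, so a
    bucket's doc_count equals the number of distinct matching persons.
    """
    persons = sorted(set(person_ids))
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"bool": {"filter": [{"terms": {"personid": persons}}]}},
        "aggs": {
            "title_id": {
                # One bucket per titleid among the filtered documents
                "terms": {"field": "titleid"},
                "aggs": {
                    "count_filter": {
                        # bucket_selector is nested inside the multi-bucket
                        # terms aggregation whose buckets it prunes
                        "bucket_selector": {
                            "buckets_path": {"the_doc_count": "_count"},
                            "script": f"params.the_doc_count == {len(persons)}",
                        }
                    }
                },
            }
        },
    }

body = build_all_persons_query(["a", "b", "c"])
print(json.dumps(body, indent=2))
```

Note that a terms aggregation returns only its top buckets (10 by default), so a larger "size" inside "terms" may be needed when many titles match.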

However, be aware that pipeline aggregations work on the outputs produced by other aggregations, so the overall amount of work needed to calculate the initial doc_counts will be the same. Since the script has to be executed for each input bucket, the operation might be slow for high-cardinality fields with thousands upon thousands of terms.
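
If the per-bucket script turns out to be a bottleneck, one possible alternative (not part of the original answer) is the `min_doc_count` option of the terms aggregation, which drops buckets below a threshold without running a script. The sketch below assumes each (titleid, personid) pair is indexed at most once, so no bucket can exceed the number of searched persons and "at least N" is equivalent to "exactly N". The helper name `build_min_doc_count_query` is hypothetical.

```python
import json

def build_min_doc_count_query(person_ids):
    """Build a search body that keeps titleid buckets with >= N matching docs."""
    persons = sorted(set(person_ids))
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"bool": {"filter": [{"terms": {"personid": persons}}]}},
        "aggs": {
            "title_id": {
                "terms": {
                    "field": "titleid",
                    # Keep only buckets matched by at least len(persons) documents
                    "min_doc_count": len(persons),
                }
            }
        },
    }

alt_body = build_min_doc_count_query(["a", "b", "c"])
print(json.dumps(alt_body, indent=2))
```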
