Elastic synonym usage in aggregations


Problem description

Situation:

Elasticsearch version used: 2.3.1

I have an index configured like so:

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym", 
          "synonyms": [ 
            "british,english",
            "queen,monarch"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter" 
          ]
        }
      }
    }
  }
}

This works well for searching: when I query with the term "english" or "queen", I get all documents matching british and monarch. But when I use a synonym term in a filter aggregation, it doesn't work. For example:

In my index I have 5 documents; 3 of them have monarch and 2 of them have queen.

POST /my_index/_search
{
  "size": 0,
  "query" : {
      "match" : {
         "status.synonym":{
            "query": "queen",
            "operator": "and"
         }
      }
   },
     "aggs" : {
        "status_terms" : {
            "terms" : { "field" : "status.synonym" }
        },
        "monarch_filter" : {
            "filter" : { "term": { "status.synonym": "monarch" } }
        }
    },
   "explain" : 0
}

The result yields:

  • Total hits: 5 doc count (as expected, great!)
  • Status terms: 5 doc count for queen (as expected, great!)
  • Monarch filter: 0 doc count

I have tried different synonym filter configurations:

  • queen,monarch
  • queen,monarch => queen
  • queen,monarch => queen,monarch

But none of the above changed the results. I was inclined to conclude that synonym filters can only be used at query time, but if the terms aggregation works, why shouldn't the filter aggregation? Hence I think it is my synonym filter configuration that is wrong. A more extensive synonym filter example can be found here.

Question:

How do I use/configure synonyms in a filter aggregation?

Example to replicate the case above:

1. Create and configure the index:

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "wlh,wellhead=>wellwell"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}

PUT my_index/_mapping/job
{
  "properties": {
    "title":{
      "type": "string",
      "analyzer": "my_synonyms"
    }
  }
}

2. Put two documents:

PUT my_index/job/1
{
    "title":"wellhead smth else"
}

PUT my_index/job/2
{
    "title":"wlh other stuff"
}

3. Execute a search on wlh, which should return 2 documents. It has a terms aggregation, which should show 2 documents for wellwell, and a filter aggregation, which shouldn't have a count of 0:

POST my_index/_search
{
  "size": 0,
  "query" : {
      "match" : {
         "title":{
            "query": "wlh",
            "operator": "and"
         }
      }
   },
     "aggs" : {
        "wlhAggs" : {
            "terms" : { "field" : "title" }
        },
        "wlhFilter" : {
            "filter" : { "term": { "title": "wlh"     } }
        }
    },
   "explain" : 0
}

The result of this query is:

{
   "took": 8,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "wlhAggs": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "wellwell",
               "doc_count": 2
            },
            {
               "key": "else",
               "doc_count": 1
            },
            {
               "key": "other",
               "doc_count": 1
            },
            {
               "key": "smth",
               "doc_count": 1
            },
            {
               "key": "stuff",
               "doc_count": 1
            }
         ]
      },
      "wlhFilter": {
         "doc_count": 0
      }
   }
}

And that's my problem: wlhFilter should have at least 1 doc count in it.

Recommended answer

So with the help of @Byron Voorbach and his comments, this is my solution:

  • I created a separate field on which I use the synonym analyzer, as opposed to having a property field (mainfield.property).
  • Most importantly, the problem was that my synonyms were contracted! I had, for example, british,english => uk. Changing that to british,english,uk solved my issue, and the filter aggregation now returns the right number of documents.
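For illustration, the change from a contracted to an expanded rule might look like the following index settings, sent as the body of PUT /my_index. This is only a sketch: it reuses the my_synonym_filter and my_synonyms names from the question, takes the british,english,uk rule from the description above, and assumes everything else stays as in the question.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "british,english,uk"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonym_filter"]
        }
      }
    }
  }
}
```

With the contracted form british,english => uk, only uk is written to the index, and since a term filter is not analyzed, filtering on british or english finds nothing. With the expanded form all three tokens are indexed, so a term filter on any of them can match. Documents indexed before the change keep their old tokens until they are reindexed. To check which tokens a rule actually produces, sending { "analyzer": "my_synonyms", "text": "british" } to GET /my_index/_analyze should list them.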

Hope this helps someone, or at least points in the right direction.

Oh lord, praise the documentation! I completely fixed my issue with the Filters (with an S!) aggregation. In the filters configuration I specified a match query, and it worked! I ended up with something like this:

"aggs" : {
    "messages" : {
      "filters" : {
        "filters" : {
          "status" :   { "match" : { "cats.saurus" : "monarch"   }},
          "country" : { "match" : { "cats.saurus" : "british" }}
        }
      }
    }
  }
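Applied to the reproduction case from the question, the same idea might look like the request body below, sent to POST my_index/_search. It is a sketch that assumes the index, mapping and two documents from the question are in place; the aggregation name wlhMatchFilter is made up for illustration. It uses the single-filter aggregation, which in Elasticsearch 2.x accepts a match query just as the filters aggregation does.

```json
{
  "size": 0,
  "query": {
    "match": {
      "title": {
        "query": "wlh",
        "operator": "and"
      }
    }
  },
  "aggs": {
    "wlhMatchFilter": {
      "filter": { "match": { "title": "wlh" } }
    }
  }
}
```

The difference from the original wlhFilter is that match runs wlh through the field's analyzer, so it becomes wellwell before the lookup, while term looks for the literal token wlh, which the contraction rule never writes to the index. Under these assumptions the aggregation should count both documents instead of 0.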
