ElasticSearch not returning results for terms query against string property

Problem description

I have the following indexed document:

{
    "visitor": {
        "id": <SOME STRING VALUE>
    }
}

The mapping for the document is:

"visitor": {
    "properties": {
        "id": {
            "type": "string"
        }
    }
}

When I run the following query I get results:

{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "term": { "visitor.id": "123" }
            }
        }
    }
}

However, this one does not:

{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "term": { "visitor.id": "ABC" }
            }
        }
    }
}

I've been thinking this is related to analyzers and have been chasing that down. I've also been wondering if I was wrong to use dot notation to get to the nested visitor property.

Can anyone tell me why I can't filter for the visitor whose id is "ABC", but I can filter for visitor "123"?

Recommended answer

You need to understand how Elasticsearch's analyzers work. An analyzer performs tokenization (splitting an input into a series of tokens, for example on whitespace) and then applies a set of token filters (which either filter out tokens you don't want, like stop words, or modify tokens, like the lowercase token filter, which converts everything to lower case).
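To make the tokenizer/token-filter split concrete, here is a toy Python model of an analyzer as a pipeline: one tokenizer followed by token filters applied in order. It is only a sketch, not Elasticsearch's implementation or API; the names `whitespace_tokenizer`, `lowercase_filter`, `make_stop_filter`, and `make_analyzer` are hypothetical, and the tokenizer here simply splits on whitespace.

```python
from typing import Callable

# Hypothetical type aliases for this sketch: a tokenizer turns text into
# tokens, and a token filter transforms a list of tokens.
Tokenizer = Callable[[str], list[str]]
TokenFilter = Callable[[list[str]], list[str]]


def whitespace_tokenizer(text: str) -> list[str]:
    """Split the input on runs of whitespace."""
    return text.split()


def lowercase_filter(tokens: list[str]) -> list[str]:
    """Modify tokens: convert every token to lower case."""
    return [token.lower() for token in tokens]


def make_stop_filter(stopwords: set[str]) -> TokenFilter:
    """Build a filter that drops tokens found in the stop word set."""
    def stop_filter(tokens: list[str]) -> list[str]:
        return [token for token in tokens if token not in stopwords]
    return stop_filter


def make_analyzer(tokenizer: Tokenizer, filters: list[TokenFilter]) -> Callable[[str], list[str]]:
    """Chain one tokenizer with token filters, applied in the given order."""
    def analyze(text: str) -> list[str]:
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


if __name__ == "__main__":
    analyzer = make_analyzer(
        whitespace_tokenizer,
        [lowercase_filter, make_stop_filter({"the", "a", "an"})],
    )
    print(analyzer("The Quick brown fox"))  # ['quick', 'brown', 'fox']
```

Filter order matters in this model: lowercasing before the stop filter lets "The" be removed as "the", whereas running the stop filter first would keep it.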

Analysis is performed at two specific times: during indexing (when you put documents into Elasticsearch) and, depending on your query, during searching (on the string you're searching for).

By default, Elasticsearch uses the standard analyzer, which consists of a standard tokenizer, a standard token filter (to clean up tokens from the standard tokenizer), a lowercase token filter, and a stop words token filter.

To put this into an example: when you save the string "I love Vincent's pie!" into Elasticsearch and you're using the default standard analyzer, you're actually storing the tokens "i", "love", "vincent's", and "pie". Then, when you search for "Vincent's" with a term query (which is not analyzed), you will not find anything, because "Vincent's" is not one of those tokens! However, if you search for "Vincent's" using a match query (which is analyzed), you will find "I love Vincent's pie!", because the search string is analyzed into "vincent's", which matches the indexed token. The same thing happens with your ids: "ABC" is indexed as the lowercase token "abc", so a term filter for "ABC" finds nothing, while "123" is unchanged by lowercasing and therefore matches.
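The effect on the question's two ids can be reproduced with a toy Python model of an inverted index. This is a sketch under loud assumptions: `analyze` is a hypothetical stand-in that only splits on non-word characters and lowercases (Lucene's real standard tokenizer is more elaborate), and `build_index`, `term_query`, and `match_query` are illustrative helpers, not Elasticsearch APIs.

```python
import re


def analyze(text: str) -> list[str]:
    """Toy stand-in for the standard analyzer (an assumption, not Lucene):
    split on non-word characters, then lowercase every token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Index time: analyze each field value and map every resulting term
    to the ids of the documents that contain it."""
    index: dict[str, set[str]] = {}
    for doc_id, value in docs.items():
        for term in analyze(value):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index: dict[str, set[str]], value: str) -> set[str]:
    """Term query: the search string is NOT analyzed, so it must equal
    an indexed term exactly."""
    return set(index.get(value, set()))


def match_query(index: dict[str, set[str]], value: str) -> set[str]:
    """Match query: the search string IS analyzed with the same analyzer,
    and a document matches if it contains any resulting term."""
    hits: set[str] = set()
    for term in analyze(value):
        hits |= index.get(term, set())
    return hits


if __name__ == "__main__":
    index = build_index({"doc1": "123", "doc2": "ABC"})
    print(term_query(index, "123"))   # {'doc1'}
    print(term_query(index, "ABC"))   # set(): only "abc" was indexed
    print(match_query(index, "ABC"))  # {'doc2'}
```

Because `term_query` skips `analyze`, the uppercase "ABC" never meets the lowercase term that was actually indexed, while "123" passes through analysis unchanged.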

The bottom line is to do one of the following:

  1. Use an analyzed query, such as match, when searching natural language strings.
  2. Set up the analyzers to match your needs. If you want to get more involved, you could set up a custom analyzer that uses a whitespace tokenizer, a letter tokenizer, or a pattern tokenizer, together with whatever token filters you need. Whether this makes sense depends on your use case, but if you're dealing with natural language sentences I don't recommend it, because the standard tokenizer was built for natural language search.
  3. Set the field up not to use an analyzer, with the following mapping, which should suit your needs:

"visitor": {
    "properties": {
        "id": {
            "type": "string",
            "index": "not_analyzed"
        }
    }
}
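With a not_analyzed field, the whole value is indexed as a single exact term. In the same kind of toy model (the helper names are hypothetical, and this is not how Elasticsearch stores terms internally), a term query now finds "ABC", but matching becomes case-sensitive and the value is never split into words:

```python
def build_not_analyzed_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """not_analyzed (modeled): each field value is stored verbatim as one
    term, with no tokenizing and no lowercasing."""
    index: dict[str, set[str]] = {}
    for doc_id, value in docs.items():
        index.setdefault(value, set()).add(doc_id)
    return index


def term_query(index: dict[str, set[str]], value: str) -> set[str]:
    """Term query: exact lookup of the unanalyzed search string."""
    return set(index.get(value, set()))


if __name__ == "__main__":
    index = build_not_analyzed_index({"doc1": "123", "doc2": "ABC"})
    print(term_query(index, "ABC"))  # {'doc2'}
    print(term_query(index, "abc"))  # set(): matching is now case-sensitive
```

The mapping above uses the older `string` type; in Elasticsearch 5.0 and later that type was replaced by `text` and `keyword`, and a `keyword` field plays the role of a not_analyzed string.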

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html for further reading.
