Wildcard doesn't work as expected when querying by more than one word


Question

If I search for documents containing e.g. "called" in the "message" field, I get the expected result, but when I search for "was called", "was called*" or "*was called*", I get nothing, although I have a lot of documents whose message field contains the following content: "Application was called by REST API".

Here is a part of a query I send:

"wildcard": {
    "message": {
        "wildcard": "was called",
        "boost": 1.0
    }
}

Here is a part of the mapping:

"mappings": {
    "doc": {
        "dynamic_templates": [
            {
                "message_field": {
                    "path_match": "message",
                    "match_mapping_type": "string",
                    "mapping": {
                        "norms": false,
                        "type": "text"
                    }
                }
            },
            {
                "string_fields": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {
                        "fields": {
                            "keyword": {
                                "ignore_above": 256,
                                "type": "keyword"
                            }
                        },
                        "norms": false,
                        "type": "text"
                    }
                }
            }
        ],
        "properties": {
            ...
            "message": {
                "type": "text",
                "norms": false
            }
        }
    }
}

The indices I search are created automatically by Logstash.

I have a similar problem with another field, which holds the value "NP-00121". The pattern *00121 works, but *-00121 doesn't.

Edit: one more example. I have a "requestUri" field containing "/api/v1/log/rest", "/api/v1/log/notification", etc. When I send the wildcard query "/api/v1*", I get nothing.

So it looks like the problem appears when using spaces and dashes. Could anyone help me solve this problem?

Recommended answer

Wildcards are applied within individual tokens. Your message field is indexed as text, so it is tokenized into words, and no single token contains a space. A pattern such as "was called" therefore has nothing to match.
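To make the token-level behaviour concrete, here is a rough Python sketch. It does not call Elasticsearch. The regex tokenizer is only a simplified stand-in for the analyzer applied to a text field, and the helper names (`tokenize`, `wildcard_to_regex`, `wildcard_matches_text`) are invented for illustration.

```python
import re

MESSAGE = "Application was called by REST API"


def tokenize(text):
    """Simplified stand-in for text analysis: lowercase, then split on
    anything that is not a letter or a digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern (* = any run of characters,
    ? = exactly one character) into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches_text(pattern, value):
    """A pattern matches an analyzed field only if it matches one whole token."""
    rx = wildcard_to_regex(pattern)
    return any(rx.fullmatch(token) for token in tokenize(value))


if __name__ == "__main__":
    print(tokenize(MESSAGE))
    for pattern in ["called", "call*", "was called", "*was called*"]:
        print(repr(pattern), wildcard_matches_text(pattern, MESSAGE))
    for pattern in ["*00121", "*-00121"]:
        print(repr(pattern), wildcard_matches_text(pattern, "NP-00121"))
    print(repr("/api/v1*"), wildcard_matches_text("/api/v1*", "/api/v1/log/rest"))
```

In this model the patterns containing a space, a dash, or a slash never match, because those characters are dropped during tokenization and so never appear inside a token.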

Basically, you don't need wildcards for a query like "was called". Simply use a phrase query like:

"query": {
    "match_phrase" : {
        "message" : "was called"
    }
}
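As a rough illustration of why the phrase query finds the document, the sketch below models phrase matching in plain Python: the query text is analyzed the same way as the field, and the resulting tokens must appear next to each other in the same order. The regex tokenizer and the names `analyze` and `phrase_matches` are assumptions made for this example, not Elasticsearch internals.

```python
import re


def analyze(text):
    """Simplified stand-in for text analysis: lowercase and split on
    anything that is not a letter or a digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(phrase, value):
    """True if the analyzed phrase occurs as a run of adjacent tokens in value."""
    wanted = analyze(phrase)
    tokens = analyze(value)
    n = len(wanted)
    if n == 0:
        return False
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    message = "Application was called by REST API"
    for phrase in ["was called", "Was Called", "called was"]:
        print(repr(phrase), phrase_matches(phrase, message))
```

Because the phrase itself is analyzed, letter case in the query does not matter here, whereas token order does.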

A wildcard query would be useful for searching for partial terms, something like:

"query": {
    "wildcard" : { "message" : "call*" }
}

This would find all docs that contain "call", "called" or "calling".

For values like NP-00121, or for URIs, it would likely be more useful if those fields were not analyzed. As it is, these values are being split into tokens ('np' and '00121'), which is the cause of the problem you are seeing. You can index these fields as the "keyword" type instead of "text", so that the whole value is indexed as a single, unanalyzed token.
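A minimal sketch of what such a query body could look like, built in Python so that it can be printed or passed to whatever client you use. It assumes that "requestUri" was picked up by the "string_fields" dynamic template from the question, which would give it a "keyword" sub-field reachable as "requestUri.keyword"; if your mapping differs, adjust the field name. The helper `keyword_wildcard_query` is hypothetical.

```python
import json


def keyword_wildcard_query(field, pattern):
    """Build a wildcard query body in the same short form used above.
    `field` should name an unanalyzed (keyword) field."""
    return {"query": {"wildcard": {field: pattern}}}


if __name__ == "__main__":
    # Assumption: requestUri.keyword exists via the string_fields template.
    body = keyword_wildcard_query("requestUri.keyword", "/api/v1*")
    print(json.dumps(body, indent=4))
```

Note that the template in the question sets "ignore_above": 256 on the keyword sub-field, so values longer than 256 characters would not be indexed there and could not be matched this way.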
