Multi-field, multi-word match without query_string

Problem description

I would like to match a multi-word search against multiple fields, where every searched word is contained in at least one of the fields, in any combination. The catch is that I would like to avoid using query_string.

curl -X POST "http://localhost:9200/index/document/1" -d '{"id":1,"firstname":"john","middlename":"clark","lastname":"smith"}'
curl -X POST "http://localhost:9200/index/document/2" -d '{"id":2,"firstname":"john","middlename":"paladini","lastname":"miranda"}'

I would like the search for 'John Smith' to match only document 1. The following query does what I need, but I would rather avoid query_string in case the user passes "OR", "AND", or any of the other advanced query syntax.

curl -X GET 'http://localhost:9200/index/_search?per_page=10&pretty' -d '{
  "query": {
    "query_string": {
      "query": "john smith",
      "default_operator": "AND",
      "fields": [
        "firstname",
        "lastname",
        "middlename"
      ]
    }
  }
}'

Solution

What you are looking for is the multi_match query, but it doesn't behave in quite the way you would like.

Compare the output of the validate API for multi_match vs query_string.

multi_match (with operator and) will make sure that ALL terms exist together in at least one single field:

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "multi_match" : {
      "operator" : "and",
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "((+lastname:john +lastname:smith) | (+firstname:john +firstname:smith))",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }

While query_string (with default_operator AND) will check that EACH term exists in at least one field, not necessarily the same one:

curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true'  -d '
{
   "query_string" : {
      "fields" : [
         "firstname",
         "lastname"
      ],
      "query" : "john smith",
      "default_operator" : "AND"
   }
}
'

# {
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 1,
#       "total" : 1
#    },
#    "explanations" : [
#       {
#          "index" : "test",
#          "explanation" : "+(firstname:john | lastname:john) +(firstname:smith | lastname:smith)",
#          "valid" : true
#       }
#    ],
#    "valid" : true
# }
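
The difference between the two explanations can be modelled without Elasticsearch at all. The following is a minimal Python sketch, not how Elasticsearch evaluates queries internally: it assumes simple whitespace tokenization and lowercasing, and the helper names (all_terms_in_one_field, each_term_in_some_field, matching_ids) are made up for illustration. It uses the two documents from the question.

```python
# Documents from the question, reduced to the fields being searched.
DOCS = {
    1: {"firstname": "john", "middlename": "clark", "lastname": "smith"},
    2: {"firstname": "john", "middlename": "paladini", "lastname": "miranda"},
}
FIELDS = ["firstname", "middlename", "lastname"]


def tokens(value):
    """Assumed analysis: lowercase, then split on whitespace."""
    return set(value.lower().split())


def all_terms_in_one_field(doc, fields, query):
    """Models multi_match with operator "and":
    (+f1:john +f1:smith) | (+f2:john +f2:smith) | ..."""
    terms = tokens(query)
    return any(terms <= tokens(doc[field]) for field in fields)


def each_term_in_some_field(doc, fields, query):
    """Models query_string with default_operator AND:
    +(f1:john | f2:john) +(f1:smith | f2:smith)"""
    terms = tokens(query)
    return all(any(term in tokens(doc[field]) for field in fields) for term in terms)


def matching_ids(predicate, query):
    """Return the ids of the documents for which the predicate holds."""
    return [doc_id for doc_id, doc in DOCS.items() if predicate(doc, FIELDS, query)]


print(matching_ids(all_terms_in_one_field, "john smith"))
print(matching_ids(each_term_in_some_field, "john smith"))
```

Under these assumptions the first call finds nothing, because no single field holds both john and smith, while the second finds only document 1, which is the behaviour the question asks for.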

So you have a few choices to achieve what you are after:

  1. Preparse the search terms to remove things like wildcards and operators before passing them to query_string

  2. Preparse the search terms to extract each word, then generate a multi_match query per word

  3. Use index_name in your mapping for the name fields to index their data into a single field, which you can then use for search (like your own custom _all field):

As follows:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "index_name" : "name",
               "type" : "string"
            },
            "lastname" : {
               "index_name" : "name",
               "type" : "string"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "VJFU_RWbRNaeHF9wNM8fRA",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 33
# }

Note, however, that firstname and lastname are no longer searchable independently: the data for both fields has been indexed into name.

You could use multi-fields with the path parameter to make them searchable both independently and together, as follows:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "firstname" : {
               "fields" : {
                  "firstname" : {
                     "type" : "string"
                  },
                  "any_name" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            },
            "lastname" : {
               "fields" : {
                  "any_name" : {
                     "type" : "string"
                  },
                  "lastname" : {
                     "type" : "string"
                  }
               },
               "path" : "just_name",
               "type" : "multi_field"
            }
         }
      }
   }
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
   "firstname" : "john",
   "lastname" : "smith"
}
'

Searching the any_name field works:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "any_name" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.2712221,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.2712221,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 11
# }

Searching firstname for both john AND smith returns no hits, because firstname contains only john:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john smith"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [],
#       "max_score" : null,
#       "total" : 0
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 2
# }

But searching firstname for just john works correctly:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
   "query" : {
      "match" : {
         "firstname" : {
            "operator" : "and",
            "query" : "john"
         }
      }
   }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "firstname" : "john",
#                "lastname" : "smith"
#             },
#             "_score" : 0.30685282,
#             "_index" : "test",
#             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.30685282,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 3
# }
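
Option 2 from the list above (one multi_match query per word) is described only in prose, so here is a minimal Python sketch of how the request body could be generated. The function name per_word_multi_match is hypothetical, and splitting on whitespace is an assumption; the resulting bool/must structure requires every word to match in at least one of the fields, without any mapping changes.

```python
import json


def per_word_multi_match(user_input, fields):
    """Build a search body with one multi_match clause per word.

    Each clause matches if the word occurs in ANY of the fields; wrapping
    the clauses in bool/must requires EVERY word to match somewhere.
    """
    words = user_input.split()  # assumption: words are separated by whitespace
    if not words:
        raise ValueError("search input contains no words")
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": word, "fields": list(fields)}}
                    for word in words
                ]
            }
        }
    }


body = per_word_multi_match("john smith", ["firstname", "middlename", "lastname"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the -d payload of the _search request from the question. Because each word is passed to a match-style query rather than to query_string, input such as OR, AND, or wildcards is treated as plain text instead of query syntax. Later Elasticsearch releases also offer a cross_fields type for multi_match that targets this kind of search; whether it is available depends on the version in use.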
