How do I tell an ElasticSearch multi-match query that I want numeric fields, stored as strings, to return matches with numeric strings?


Question

I am writing a Flask app that uses Elasticsearch.

Here is search.py:

from flask import current_app

def query_object(index, fields, query, page, per_page, fuzziness=0):
    search = current_app.elasticsearch.search(
        index=index,
        body={'query': {'multi_match': {'query': str(query), 'fields': fields, 'fuzziness': fuzziness, 'lenient': True}},
                'from': (page - 1) * per_page, 'size': per_page}
    )

    ids = [int(hit['_id']) for hit in search['hits']['hits']]
    return ids, search['hits']['total']['value']

The following model is indexed:

class WishList(db.Model, SearchableMixin):
    __searchable__ = ['first_name', 'gender', 'wants', 'needs', 'wear',
    'read', 'shoe_size_category', 'shoe_type', 'sheet_size', 'additional_comments', 'time_chosen',
    'age', 'shoe_sock_size', 'program_number']

    id = db.Column(db.Integer, primary_key=True)
    program_number = db.Column(db.String(4))
    first_name = db.Column(db.String(20))
    age = db.Column(db.String(10))
    gender = db.Column(db.String(20))
    wants = db.Column(db.String(300))
    needs = db.Column(db.String(300))
    wear = db.Column(db.String(300))
    read = db.Column(db.String(300))
    pant_dress_size = db.Column(db.String(20), default='unspecified')
    shirt_blouse_size = db.Column(db.String(20), default='unspecified')
    jacket_sweater_size = db.Column(db.String(20), default='unspecified')
    shoe_sock_size = db.Column(db.String(20), default='unspecified')
    shoe_size_category = db.Column(db.String(20), default='unspecified')
    shoe_type = db.Column(db.String(50), default='unspecified')
    sheet_size = db.Column(db.String(20), default='unspecified')
    additional_comments = db.Column(db.Text(), nullable=True, default=None)
    time_chosen = db.Column(db.String(40), nullable=True, default=None)
    sponsor_id = db.Column(db.Integer, db.ForeignKey(
        'user.id'), nullable=True, default=None)
    drive_id = db.Column(db.Integer, db.ForeignKey(
        'holiday_cheer_drive.id'), nullable=False, default=None)

That model is made searchable by inheriting from the SearchableMixin class, like so:

class SearchableMixin(object):
    @classmethod
    def search_object(cls, fields, expression, page, per_page, fuzziness=0):
        ids, total = query_object(
            cls.__tablename__, fields, expression, page, per_page, fuzziness=fuzziness)
        if total == 0:
            return cls.query.filter_by(id=0), 0
        when = []
        for i in range(len(ids)):
            when.append((ids[i], i))
        return cls.query.filter(cls.id.in_(ids)).order_by(
            db.case(when, value=cls.id)), total

Currently, every field is searchable and returns valid results unless I search with a numeric value.

Here is the output of a search that works, with the values printed to the console from Python:

Query: bob
Body of search:
{'from': 0,
 'query': {'multi_match': {'fields': ['first_name',
                                      'gender',
                                      'wants',
                                      'needs',
                                      'wear',
                                      'read',
                                      'shoe_size_category',
                                      'shoe_type',
                                      'sheet_size',
                                      'additional_comments',
                                      'time_chosen',
                                      'age',
                                      'shoe_sock_size',
                                      'program_number'],
                           'fuzziness': 0,
                           'lenient': True,
                           'query': 'bob'}},
 'size': 10}
Python elasticsearch object:
{'took': 27, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 2, 'relation': 'eq'}, 'max_score': 1.6916759, 'hits': [{'_index': 'wish_list', '_type': '_doc', '_id': '1', '_score': 1.6916759, '_source': {'first_name': 'bob', 'gender': 'male', 'wants': 'bike', 'needs': 'calculator', 'wear': 'hat', 'read': 'book', 'shoe_size_category': "men's", 'shoe_type': 'sneaker', 'sheet_size': 'unspecified', 'additional_comments': 'Likes cheese', 'time_chosen': None, 'age': '5', 'shoe_sock_size': '4', 'program_number': '215', 'mappings': {'properties': {'first_name': {'type': 'text'}, 'gender': {'type':
'text'}, 'wants': {'type': 'text'}, 'needs': {'type': 'text'}, 'wear': {'type': 'text'}, 'read': {'type': 'text'}, 'shoe_size_category': {'type': 'text'}, 'shoe_type': {'type': 'text'}, 'sheet_size': {'type': 'text'}, 'additional_comments': {'type': 'text'}, 'time_chosen': {'type': 'text'}, 'age': {'type': 'text'}, 'shoe_sock_size': {'type': 'text'}, 'program_number': {'type': 'text'}}}}}, {'_index': 'wish_list', '_type': '_doc', '_id': '9', '_score': 1.6916759, '_source': {'first_name': 'bob', 'gender': 'male', 'wants': 'bike', 'needs': 'calculator', 'wear': 'hat', 'read': 'book', 'shoe_size_category': "men's", 'shoe_type': 'sneaker', 'sheet_size': 'unspecified', 'additional_comments': 'Likes cheese', 'time_chosen': None, 'age': 5, 'shoe_sock_size': 4, 'program_number': 215, 'mappings': {'properties': {'first_name': {'type': 'text'}, 'gender': {'type': 'text'}, 'wants': {'type': 'text'}, 'needs': {'type': 'text'}, 'wear': {'type': 'text'}, 'read': {'type': 'text'}, 'shoe_size_category': {'type': 'text'}, 'shoe_type': {'type': 'text'}, 'sheet_size': {'type': 'text'}, 'additional_comments': {'type': 'text'}, 'time_chosen': {'type': 'text'}, 'age': {'type': 'text'}, 'shoe_sock_size': {'type': 'text'}, 'program_number': {'type': 'text'}}}}}]}}

And here is exactly the same query on the same object, but with a numeric string:

Query: 215
Body of search:
{'from': 0,
 'query': {'multi_match': {'fields': ['first_name',
                                      'gender',
                                      'wants',
                                      'needs',
                                      'wear',
                                      'read',
                                      'shoe_size_category',
                                      'shoe_type',
                                      'sheet_size',
                                      'additional_comments',
                                      'time_chosen',
                                      'age',
                                      'shoe_sock_size',
                                      'program_number'],
                           'fuzziness': 0,
                           'lenient': True,
                           'query': '215'}},
 'size': 10}
Python elasticsearch object:
{'took': 18, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 0, 'relation': 'eq'}, 'max_score': None, 'hits': []}}

A string is being passed into the function, and the data is all saved as strings, but there seems to be some kind of type error. Before I added lenient: True, Elasticsearch threw an error saying it couldn't build the query.

If I can understand how to do it with the Elasticsearch REST API, I can probably figure out how to do it in Python.

Recommended answer

The issue is caused by using the fuzziness parameter on a numeric data type and then adding lenient: true to make the query run. According to the Elasticsearch documentation for lenient, when it is true, format-based errors, such as providing a text query value for a numeric field, are ignored.

Below is the error you get when you try to use fuzziness on a numeric data type.

"reason": "Can only use fuzzy queries on keyword and text fields - not on [age] which is of type [integer]"

When you add "lenient": true, the above error goes away, but the query no longer returns any documents.
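To see which fields would trigger this error, one option is to read the index mapping and separate the fields that accept fuzziness from those that do not. The sketch below is only an illustration: the helper name `split_fields_by_fuzzy_support` is hypothetical, the example mapping is hand-written, and the response shape is assumed to be the one returned by `GET /<index>/_mapping` on Elasticsearch 7.x.

```python
# Field types that accept `fuzziness`, per the error message quoted above.
FUZZY_TYPES = {"text", "keyword"}


def split_fields_by_fuzzy_support(mapping_response, index_name):
    """Return (fuzzy_ok, not_fuzzy) lists of field names for one index.

    Assumed response shape (Elasticsearch 7.x):
    {index_name: {"mappings": {"properties": {field: {"type": ...}}}}}
    Fields without an explicit "type" (for example object fields) are
    placed in the not_fuzzy list.
    """
    properties = mapping_response[index_name]["mappings"].get("properties", {})
    fuzzy_ok, not_fuzzy = [], []
    for field, spec in properties.items():
        if spec.get("type") in FUZZY_TYPES:
            fuzzy_ok.append(field)
        else:
            not_fuzzy.append(field)
    return fuzzy_ok, not_fuzzy


# Hand-written example; in the Flask app the response would come from
# current_app.elasticsearch.indices.get_mapping(index="wish_list").
example = {
    "wish_list": {
        "mappings": {
            "properties": {
                "first_name": {"type": "text"},
                "age": {"type": "integer"},
            }
        }
    }
}
print(split_fields_by_fuzzy_support(example, "wish_list"))
```

With a list like this, it becomes easy to check whether any field passed to multi_match is numeric before deciding to send fuzziness at all.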

To make it work, simply remove the fuzziness and lenient parameters from your search query. Elasticsearch automatically converts a valid string to a numeric value and vice versa, as explained in the documentation for the coerce mapping parameter.
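Applied to the query_object function from the question, that advice might look like the sketch below. It only builds the request body, so it can be checked without a running cluster. The helper name `build_search_body` is hypothetical, and leaving fuzziness out when it is 0 is an assumption about how the app wants to behave, not something the question specifies.

```python
def build_search_body(fields, query, page, per_page, fuzziness=0):
    """Build a multi_match request body without the `lenient` flag.

    `fuzziness` is added only when it is non-zero, so a plain search
    sends neither parameter.
    """
    multi_match = {"query": str(query), "fields": list(fields)}
    if fuzziness:
        multi_match["fuzziness"] = fuzziness
    return {
        "query": {"multi_match": multi_match},
        "from": (page - 1) * per_page,
        "size": per_page,
    }


body = build_search_body(
    ["first_name", "age", "program_number"], "215", page=1, per_page=10
)
print(body)
```

The returned dictionary can be passed as `body=` to `current_app.elasticsearch.search`, as in the original search.py. Note that a non-zero fuzziness would still raise the error above if any listed field is numeric.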

Index mapping

{
    "mappings": {
        "properties": {
            "age" :{
                "type" : "integer"
            }
        }
    }
}

Index sample documents

{
  "age" : "25" --> note the use of `""`: the value is sent as a string
}

{
  "age" : 28 --> note the value is sent as a number
}

Search query in string format

{
    "query": {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": "28", --> note string format
                        "fields": [
                            "age" --> note you can add more fields
                        ]
                    }
                }
            ]
        }
    }
}

Search result

"hits": [
      {
        "_index": "so_numberic",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "program_number": "123456789",
          "age": "28"
        }
      }
    ]
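Since the question asks how the REST calls translate to Python, the steps above (create the mapping, index two documents, search with a numeric string) could be reproduced roughly as follows. This is a sketch under assumptions: `es` is taken to be an elasticsearch-py 7.x client (the `body=` keyword is the 7.x calling style, matching the question's code), and the function name `reproduce_example` and index name `so_numeric` are made up for illustration.

```python
def reproduce_example(es, index_name="so_numeric"):
    """Create the index, add two documents, and search with a numeric string.

    Returns the list of matching document ids.
    """
    # Same mapping as in the answer: `age` is an integer field.
    mapping = {"mappings": {"properties": {"age": {"type": "integer"}}}}
    es.indices.create(index=index_name, body=mapping)

    # One document sends age as a string, the other as a number.
    # refresh=True makes them visible to the search that follows.
    es.index(index=index_name, id=1, body={"age": "25"}, refresh=True)
    es.index(index=index_name, id=2, body={"age": 28}, refresh=True)

    # No fuzziness and no lenient: the query value is a plain string.
    search_body = {"query": {"multi_match": {"query": "28", "fields": ["age"]}}}
    response = es.search(index=index_name, body=search_body)
    return [hit["_id"] for hit in response["hits"]["hits"]]


# Usage against a local cluster (assumed URL, requires the elasticsearch package):
# from elasticsearch import Elasticsearch
# print(reproduce_example(Elasticsearch("http://localhost:9200")))
```

Passing the client in as an argument keeps the function independent of Flask, so inside the app it could be called with `current_app.elasticsearch`.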

Search query in numeric format

{
    "query": {
        "match" : { --> query on single field.
            "age" : {
                "query" : 28 --> note numeric format
            }
        }
    }
}

Result

"hits": [
      {
        "_index": "so_numberic",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "program_number": "123456789",
          "age": "28"
        }
      }
    ]

Using fuzziness together with lenient returns no results, as explained earlier.

{
    "query": {
        "match": {
            "age": {
                "query": 28,
                "fuzziness": 2,
                "lenient": true
            }
        }
    }
}

Result

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": { --> note 0 results.
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}
