elasticsearch: search for parts of words

Question

I'm trying to learn how to use elasticsearch (using elasticsearch-php for queries). I have inserted a few records, which look something like this:

['id' => 1, 'name' => 'butter', 'category' => 'food'],
['id' => 2, 'name' => 'buttercup', 'category' => 'food'],
['id' => 3, 'name' => 'something else', 'category' => 'butter']

I then created a search query that looks like this:

$query = [
    'filtered' => [
        'query' => [
            'bool' => [
                'should' => [
                    ['match' => [
                        'name' => [
                            'query' => $val,
                            'boost' => 7
                        ]
                    ]],
                    ['match' => [
                        'category' => [
                            'query' => $val,
                            'boost' => 5
                        ]
                    ]],
                ],
            ]
        ]
    ]
];

where $val is the search term. This works nicely; my only problem is that when I search for "butter", I find ids 1 and 3 but not 2, because the search term seems to match whole words only. Is there a way to search "within words" or, in MySQL terms, to do something like WHERE name LIKE '%val%'?

Recommended answer

You can try the wildcard query:

$query = [
    'filtered' => [
        'query' => [
            'bool' => [
                'should' => [
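                    ['wildcard' => [
                        'name' => [
                            // the wildcard query takes its pattern under 'value';
                            // the pattern is not analyzed before matching
                            'value' => '*'.$val.'*',
                            'boost' => 7
                        ]
                    ]],
                    ['wildcard' => [
                        'category' => [
                            'value' => '*'.$val.'*',
                            'boost' => 5
                        ]
                    ]],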
                ],
            ]
        ]
    ]
];

Or the query_string query:

$query = [
    'filtered' => [
        'query' => [
            'bool' => [
                'should' => [
                    ['query_string' => [
                        'default_field' => 'name',
                        'query' => '*'.$val.'*',
                        'boost' => 7
                    ]],
                    ['query_string' => [
                        'default_field' => 'category',
                        'query' => '*'.$val.'*',
                        'boost' => 5
                    ]],
                ],
            ]
        ]
    ]
];

Both will work, but neither performs well if you have lots of data, because a pattern with a leading wildcard forces Elasticsearch to check every term in the field.

The correct way of doing this is to use a custom analyzer with a standard tokenizer and an ngram token filter, which slices each of your tokens into smaller fragments at index time so that an ordinary match query can find them.
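
As a rough illustration of that approach, the index settings and field mapping might look something like the sketch below. The names partial_words and partial_filter, the gram sizes 3 and 10, and the max_ngram_diff value are assumptions chosen for this example, not something the answer specifies, and the exact field type depends on your Elasticsearch version.

```php
<?php
// Hypothetical index settings: an analyzer that splits text with the standard
// tokenizer, lowercases it, and then emits every 3-10 character fragment.
$settings = [
    'index' => [
        // Assumption: recent Elasticsearch versions limit max_gram - min_gram
        // to 1 unless this setting is raised; drop it on versions that do not
        // know the setting.
        'max_ngram_diff' => 7,
        'analysis' => [
            'filter' => [
                'partial_filter' => [      // illustrative name
                    'type' => 'ngram',
                    'min_gram' => 3,
                    'max_gram' => 10,
                ],
            ],
            'analyzer' => [
                'partial_words' => [       // illustrative name
                    'type' => 'custom',
                    'tokenizer' => 'standard',
                    'filter' => ['lowercase', 'partial_filter'],
                ],
            ],
        ],
    ],
];

// Field mapping shared by both searchable fields.
$fieldMapping = [
    // Assumption: 'text' on current versions; the old versions that still
    // accept the 'filtered' query use 'string' instead.
    'type' => 'text',
    'analyzer' => 'partial_words',     // index time: store the fragments
    'search_analyzer' => 'standard',   // search time: keep the term whole
];

$properties = [
    'name' => $fieldMapping,
    'category' => $fieldMapping,
];
```

These arrays would go into the settings and mappings of an index-creation request. With such a mapping, "buttercup" is indexed as fragments that include "butter", so the original match query from the question should find id 2 without any wildcards. One trade-off of this sketch: a search term shorter than min_gram or longer than max_gram has no matching fragment in the index.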
