Analyzer to autocomplete names


Problem description

I want to be able to autocomplete names.

For example, if we have the name John Smith, I want to be able to search for Jo, Sm, or John Sm and get the document back.

In addition, I do not want jo sm matching the document.

I currently have this analyzer:

return array(
    'settings' => array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    'autocomplete' => array(
                        'tokenizer' => 'autocompleteEngram',
                        'filter' => array('lowercase', 'whitespace')
                    )
                ),

                'tokenizer' => array(
                    'autocompleteEngram' => array(
                        'type' => 'edgeNGram',
                        'min_gram' => 1,
                        'max_gram' => 50
                    )
                )
            )   
        )
    )
);

The problem with this is that first we split the text up and then tokenize using edgengrams.

This results in this: j jo joh john s sm smi smit smith

This means, if I search for john smith or john sm, nothing would be returned.
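To make the mismatch concrete, here is a minimal PHP sketch that mimics the tokenization described above: split on whitespace, then take edge n-grams of each word. It is an illustration only, not Elasticsearch code; the function name edgeNgramsPerWord is hypothetical, and it assumes single-byte (ASCII) names.

```php
<?php
// Illustrative only: mimics the behaviour described in the question
// (split on whitespace, then edge n-grams per word). Not Elasticsearch code.
// Assumes ASCII input; multibyte names would need the mb_* string functions.
function edgeNgramsPerWord(string $text, int $minGram = 1, int $maxGram = 50): array
{
    $tokens = [];
    // Lowercase and split on runs of whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Emit every prefix of the word between minGram and maxGram characters.
        $limit = min(strlen($word), $maxGram);
        for ($len = $minGram; $len <= $limit; $len++) {
            $tokens[] = substr($word, 0, $len);
        }
    }
    return $tokens;
}

$tokens = edgeNgramsPerWord('John Smith');
echo implode(' ', $tokens), PHP_EOL;

// No token spans the space, so a term such as "john sm" is never indexed.
var_dump(in_array('john sm', $tokens, true));
```

Every token is a prefix of a single word, so there is no indexed term that a whole-string lookup for john sm could hit.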

So, I need to generate tokens that look like this: j jo joh john s sm smi smit smith john s john sm john smi john smit john smith.

How can I set up my analyzer so that it generates those extra tokens?

Solution

I ended up not using edgengrams.

I created an analyzer with the standard tokenizer and the standard and lowercase filters. This is virtually identical to the standard analyzer, but it does not have a stopwords filter (we are searching for names after all, and there might be someone called The or An, etc.).

I then set the above analyzer as the index_analyzer and simple as the search_analyzer. Using this setup with a match_phrase_prefix query worked really well.

This is the custom analyzer I used (called autocomplete and expressed in PHP):

'autocomplete' => array(
    'tokenizer' => 'standard',
    'filter' => array('standard', 'lowercase')
),
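
For context, the whole setup described above might be wired together roughly as follows. This is a sketch under assumptions, not the exact code from the answer: the field name name and the variable names are hypothetical, and the option names (index_analyzer, the string field type, the standard token filter) follow the older Elasticsearch releases this answer was written against. Newer releases changed some of these, so check the documentation for your version.

```php
<?php
// Sketch only: the field name "name" and the variable names are hypothetical.

// Index settings: the custom analyzer from the answer, placed in full context.
$settings = array(
    'settings' => array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    'autocomplete' => array(
                        'tokenizer' => 'standard',
                        'filter' => array('standard', 'lowercase'),
                    ),
                ),
            ),
        ),
    ),
);

// Field mapping: custom analyzer at index time, built-in "simple" analyzer
// at search time, as described in the answer.
$nameField = array(
    'type' => 'string',
    'index_analyzer' => 'autocomplete',
    'search_analyzer' => 'simple',
);

// Query body: every term must match in order, and the last term is treated
// as a prefix, so "john sm" can match a document containing "John Smith".
$query = array(
    'query' => array(
        'match_phrase_prefix' => array(
            'name' => 'john sm',
        ),
    ),
);
```

With this approach the prefix handling moves from index time (edge n-grams) to query time (match_phrase_prefix), so the index only stores whole lowercase words.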
