Analyzer to autocomplete names
I want to be able to autocomplete names.
For example, if we have the name "John Smith", I want to be able to search for "Jo", "Sm", or "John Sm" and get the document back.
In addition, I do not want "jo sm" to match the document.
I currently have this analyzer:
return array(
    'settings' => array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    'autocomplete' => array(
                        'tokenizer' => 'autocompleteEngram',
                        'filter' => array('lowercase', 'whitespace')
                    )
                ),
                'tokenizer' => array(
                    'autocompleteEngram' => array(
                        'type' => 'edgeNGram',
                        'min_gram' => 1,
                        'max_gram' => 50
                    )
                )
            )
        )
    )
);
The problem with this is that the text is first split into words, and each word is then tokenized using edge n-grams.
This results in these tokens:
j
jo
joh
john
s
sm
smi
smit
smith
This means that if I search for "john smith" or "john sm", nothing would be returned.
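The token list above can be reproduced with a small sketch in plain PHP. It is only an illustration of the behaviour described in this post (lowercase, split on whitespace, then take leading prefixes of each word), not Elasticsearch's tokenizer code, and the function names edgeNgrams and tokensForName are made up for the example.

```php
<?php
// Illustration only: plain PHP, not Elasticsearch's tokenizer implementation.

/**
 * Return every leading prefix of $word whose length is between
 * $minGram and $maxGram characters.
 */
function edgeNgrams(string $word, int $minGram = 1, int $maxGram = 50): array
{
    $grams = [];
    $limit = min(strlen($word), $maxGram);
    for ($len = $minGram; $len <= $limit; $len++) {
        $grams[] = substr($word, 0, $len);
    }
    return $grams;
}

/**
 * Lowercase the text, split it on whitespace, and build the
 * edge n-grams of each word separately.
 */
function tokensForName(string $name): array
{
    $tokens = [];
    $words = preg_split('/\s+/', strtolower(trim($name)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $tokens = array_merge($tokens, edgeNgrams($word));
    }
    return $tokens;
}

$tokens = tokensForName('John Smith');
print_r($tokens);

// No token contains a space, so a single term such as "john sm"
// is never present in the list.
var_dump(in_array('john sm', $tokens, true)); // bool(false)
```

Because every word is processed on its own, no token ever spans the space between the first and last name.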
So I need to generate tokens that look like this:
j
jo
joh
john
s
sm
smi
smit
smith
john s
john sm
john smi
john smit
john smith
How can I set up my analyzer so that it generates those extra tokens?
I ended up not using edge n-grams.
I created an analyzer with the standard tokenizer and the standard and lowercase filters. This is virtually identical to the standard analyzer, but it has no stopword filter (we are searching for names, after all, and someone might be called "The" or "An").
I then set the above analyzer as the index_analyzer and simple as the search_analyzer. Using this setup with a match_phrase_prefix query worked really well.
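To see why a phrase-prefix match fits the requirements from the question, here is a rough simulation in plain PHP. It is a sketch under simplifying assumptions, not how Elasticsearch implements the query: one hypothetical analyze function stands in for both the index and search analyzers (lowercase, split on non-alphanumeric characters), and the hypothetical phrasePrefixMatches requires every query term except the last to match exactly and in order, with the last term matching as a prefix.

```php
<?php
// Illustration only: approximates phrase-prefix matching in plain PHP.

/** Lowercase the text and split it on anything that is not a letter or digit. */
function analyze(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * True when the query terms appear consecutively in the document,
 * with the final query term allowed to be only a prefix.
 */
function phrasePrefixMatches(string $document, string $query): bool
{
    $docTerms = analyze($document);
    $queryTerms = analyze($query);
    if ($queryTerms === []) {
        return false;
    }
    $n = count($queryTerms);
    // Try every position in the document as the start of the phrase.
    for ($start = 0; $start + $n <= count($docTerms); $start++) {
        $ok = true;
        for ($i = 0; $i < $n; $i++) {
            $docTerm = $docTerms[$start + $i];
            $isLast = ($i === $n - 1);
            // Last query term: prefix match. Earlier terms: exact match.
            $hit = $isLast
                ? strncmp($docTerm, $queryTerms[$i], strlen($queryTerms[$i])) === 0
                : $docTerm === $queryTerms[$i];
            if (!$hit) {
                $ok = false;
                break;
            }
        }
        if ($ok) {
            return true;
        }
    }
    return false;
}

foreach (['Jo', 'Sm', 'John Sm', 'jo sm'] as $q) {
    printf("%-8s %s\n", $q, phrasePrefixMatches('John Smith', $q) ? 'match' : 'no match');
}
```

In this sketch "Jo", "Sm" and "John Sm" match "John Smith", while "jo sm" does not, because "jo" is not the last term and therefore has to equal a whole indexed term.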
This is the custom analyzer I used (called autocomplete and expressed in PHP):
'autocomplete' => array(
    'tokenizer' => 'standard',
    'filter' => array('standard', 'lowercase')
),
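For context, the pieces described in this answer might fit together roughly as follows. This is a sketch, not the exact configuration that was used: the type name person and the field name name are hypothetical, and the variables $indexBody and $searchBody are just plain arrays to hand to whatever client you use. index_analyzer and search_analyzer are the settings named above; as far as I know, later Elasticsearch releases dropped index_analyzer in favour of analyzer, so check the documentation for your version.

```php
<?php
// Sketch only: 'person' and 'name' are hypothetical names chosen for illustration.

// Body for creating the index: the custom analyzer plus a field mapping that
// uses it at index time and the built-in simple analyzer at search time.
$indexBody = array(
    'settings' => array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    'autocomplete' => array(
                        'tokenizer' => 'standard',
                        'filter' => array('standard', 'lowercase'),
                    ),
                ),
            ),
        ),
    ),
    'mappings' => array(
        'person' => array(
            'properties' => array(
                'name' => array(
                    'type' => 'string',
                    'index_analyzer' => 'autocomplete',
                    'search_analyzer' => 'simple',
                ),
            ),
        ),
    ),
);

// Body for the autocomplete search: every term but the last must match in
// order, and the last term ("sm") is treated as a prefix.
$searchBody = array(
    'query' => array(
        'match_phrase_prefix' => array(
            'name' => 'john sm',
        ),
    ),
);
```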