Prevent stemming of proper nouns in PostgreSQL?


Question

In its enthusiasm to stem tokens into lexemes, the PostgreSQL Full Text Search engine also reduces proper nouns. For instance:

essais=> select to_tsquery('english', 'bortzmeyer');
 to_tsquery 
------------
 'bortzmey'
(1 row)

essais=> select to_tsquery('english', 'balling');
 to_tsquery 
------------
 'ball'
(1 row)

At least for the first one, I'm sure it is not in the English dictionary! What is the best way to avoid this spurious stemming?

Recommended answer

The point of a stemming algorithm is not to reduce every word to its proper stem; the goal is to reduce words that are alike to a common stemmed form. The result is generally not meant to be a word that can be presented to the user: even if 'balling' and 'ball' both produced 'kjebnkkekaa', the algorithm would still be correct, because it treats 'balling' and 'ball' as generally concerning the same thing.
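To make that concrete, here is a minimal sketch in Python. It is a toy suffix stripper and not PostgreSQL's actual stemmer; the names `toy_stem`, `matches`, and the tiny `SUFFIXES` rule set are hypothetical and chosen only for illustration. It shows that a stem works as an internal matching key: a query still finds the proper noun, because the document and the query are reduced in the same way.

```python
# Toy suffix stripper, NOT PostgreSQL's stemmer: it only illustrates that a
# stem is an internal matching key, not a dictionary word shown to the user.
SUFFIXES = ("ing", "er", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(document: str, query: str) -> bool:
    """Return True if the stemmed query term is among the stemmed document words."""
    return toy_stem(query) in {toy_stem(w) for w in document.split()}


if __name__ == "__main__":
    print(toy_stem("balling"), toy_stem("ball"))
    print(toy_stem("bortzmeyer"))
    print(matches("a note by Bortzmeyer", "bortzmeyer"))
```

Because the same reduction is applied on both sides, the odd-looking stem of the proper noun does not prevent a search for the full name from matching.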

Also beware that no stemming algorithm is perfect. For more information, look up the Porter stemming algorithm.
