Why don't the results of my Porter stemmer match the expected root words?


Question

I need to use the Porter stemmer algorithm to get word stems in my application, but when I test the implementation I got from http://www.tartarus.org/~martin/PorterStemmer, the stemming results are not the root words I expect. For example: happy --> happi, virus --> viru. Can you help me solve this?

Recommended answer

Quoting from your link:

2. Why is the stemmer not producing proper words?

It is often taken to be a crude error that a stemming algorithm does not leave a real word after removing the stem. But the purpose of stemming is to bring variant forms of a word together, not to map a word onto its ‘paradigm’ form.

Related to this,

The question normally comes in the form, why should word X be stemmed to x1, when one would have expected it to be stemmed to x2? It is important to remember that the stemming algorithm cannot achieve perfection. On balance it will (or may) improve IR performance, but in individual cases it may sometimes make what are, or what seem to be, errors. Of course, this is a different matter from suggesting an additional rule that might be included in the stemmer to improve its performance.
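
To illustrate why happy --> happi and virus --> viru are expected outputs rather than bugs, here is a minimal Python sketch. It is not the full Porter algorithm and not the code from the linked page: it implements only step 1a, step 1c, and a single rule from step 3 (NESS removal), and the name `toy_stem` is made up for this example. It should be enough to show that "happy" and "happiness" end up with the same stem, which is the grouping the quoted FAQ describes.

```python
VOWELS = "aeiou"

def is_consonant(word, i):
    """Porter's definition: 'y' counts as a consonant unless it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def has_vowel(stem):
    """True if the stem contains at least one vowel (Porter's *v* condition)."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))

def measure(stem):
    """Count vowel-to-consonant transitions (Porter's measure m)."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        vowel = not is_consonant(stem, i)
        if prev_vowel and not vowel:
            m += 1
        prev_vowel = vowel
    return m

def toy_stem(word):
    """Hypothetical, heavily reduced subset of the Porter rules (illustration only)."""
    # Step 1a: plural endings
    if word.endswith("sses"):
        word = word[:-2]          # caresses -> caress
    elif word.endswith("ies"):
        word = word[:-2]          # ponies -> poni
    elif word.endswith("ss"):
        pass                      # caress -> caress
    elif word.endswith("s"):
        word = word[:-1]          # virus -> viru
    # Step 1c: a final y becomes i when the stem contains a vowel
    if word.endswith("y") and has_vowel(word[:-1]):
        word = word[:-1] + "i"    # happy -> happi
    # One rule from step 3: drop NESS when the remaining stem has m > 0
    if word.endswith("ness") and measure(word[:-4]) > 0:
        word = word[:-4]          # happiness -> happi
    return word

for w in ["happy", "happiness", "virus", "sky"]:
    print(w, "-->", toy_stem(w))
```

The stem "happi" is not an English word, but it is the same string for both "happy" and "happiness", so a search index that stores stems will match either form. If your application needs real dictionary words as output, a lemmatizer is probably a better fit than a stemmer.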
