Python stemmer issue: wrong stem

Published: 2021/9/28 18:30:26 | Tags: python, words, porter-stemmer


Question

Hi, I'm trying to stem words with a Python stemmer. I tried Porter and Lancaster, but they have the same problem: they don't correctly stem words that end with "er" or "e".

For example, they produce:

computer -->  comput

rotate   -->  rotat

Here is part of the code:

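# Fragment from inside a function; needs `import re` at module level.
# `stops` (stop words) and `porter` / `st` (stemmer instances) are defined elsewhere.
line = line.lower()
# Replace everything except lowercase letters, digits and spaces with a space
line = re.sub(r'[^a-z0-9 ]', ' ', line)
line = line.split()
# Drop stop words, then stem each remaining token
line = [x for x in line if x not in stops]
line = [porter.stem(word, 0, len(word) - 1) for word in line]
# or: line = [st.stem(word) for word in line]
return line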

Any idea how to fix this problem?

Recommended answer

To quote the page on Wikipedia: "In computational linguistics, a stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, given the word "produced", its lemma (linguistics) is "produce", however the stem is "produc": this is because there are words such as production." So your code is likely giving you correct results. You seem to expect a lemma, which is not what a stemmer produces (except when the lemma happens to equal the stem).
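The distinction can be made concrete with a small sketch. It uses only the standard library, and everything in it is an assumption for illustration: `toy_stem` and its suffix list are a hypothetical stand-in, not the Porter or Lancaster algorithm, and `readable_forms` is a crude heuristic, not a real lemmatizer. It shows why "produc" is a sensible stem (several surface forms collapse onto it) and one rough way to map a stem back to a readable word when that is what you need.

```python
from collections import defaultdict

# Hypothetical toy suffix list, for illustration only.
# This is NOT the Porter or Lancaster algorithm.
SUFFIXES = ("tion", "ed", "er", "es", "e", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words, stem=toy_stem):
    """Map each stem to the list of surface forms that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


def readable_forms(groups):
    """Pick the shortest surface form per stem as a readable stand-in.

    This is a heuristic, not lemmatization: it only returns words that
    were actually seen in the input.
    """
    return {stem: min(forms, key=len) for stem, forms in groups.items()}


words = ["produce", "produced", "production", "computer", "rotate"]
groups = group_by_stem(words)
readable = readable_forms(groups)

for stem, forms in groups.items():
    print(stem, "<-", forms, "| readable:", readable[stem])
```

With a real stemmer you would pass its stemming function as the `stem` argument instead of `toy_stem`. If you need true dictionary base forms rather than stems, a lemmatizer is the appropriate tool.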
