How to stop the NLTK stemmer from removing the trailing "e"?

Views: 36 | Published: 2021/6/7 20:37:24 | Tags: python, nlp, nltk


Problem description

I'm using an NLTK stemmer to strip grammatical variations from words. However, both the Porter and Snowball stemmers remove the trailing "e" from the base form of a noun or verb, e.g., "profile" becomes "profil".
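The behaviour can be reproduced with a few lines. This is only a minimal sketch: `stem_with_both` is a hypothetical helper name introduced here for illustration, and it assumes the English variant of the Snowball stemmer.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

# Neither stemmer needs any downloaded corpus data.
porter = PorterStemmer()
snowball = SnowballStemmer("english")  # assumption: English text


def stem_with_both(word):
    """Return the (Porter, Snowball) stems of a single word."""
    return porter.stem(word), snowball.stem(word)


for word in ["Profile", "profiles"]:
    # Both stemmers lowercase the input and drop the final "e",
    # so each line shows ('profil', 'profil').
    print(word, stem_with_both(word))
```

The singular and plural forms collapse to the same stem, which is what a stemmer is designed to do, but the stem itself is not a dictionary word.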

How can I prevent this? I know I could guard against it with a conditional, but that would obviously fail in other cases.

Is there an option for this, or another API that does what I want?

Recommended answer

I agree with Philip that the goal of a stemmer is to retain only the stem. For this particular case, you can try a lemmatizer instead of a stemmer. A lemmatizer tends to retain more of the word and is meant to reduce inflected forms of a word to its dictionary form, e.g. 'profiles' --> 'profile'. NLTK has a class for this: try WordNetLemmatizer() from nltk.stem.
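A rough sketch of that suggestion follows. The helper names `build_lemmatizer` and `lemmatize_words` are made up for this example, and lowercasing the input first is my own assumption rather than something the lemmatizer requires. The WordNet data has to be downloaded once; some NLTK versions also ask for the `omw-1.4` package, so the sketch fetches both when the first lookup fails.

```python
import nltk
from nltk.stem import WordNetLemmatizer


def build_lemmatizer():
    """Create a WordNetLemmatizer, fetching the WordNet data if it is missing."""
    lemmatizer = WordNetLemmatizer()
    try:
        lemmatizer.lemmatize("profiles")  # triggers loading of the corpus
    except LookupError:
        # Assumption: network access is available for a one-time download.
        for package in ("wordnet", "omw-1.4"):
            nltk.download(package, quiet=True)
    return lemmatizer


def lemmatize_words(words, pos="n"):
    """Lemmatize each word, treating it as the given part of speech.

    pos defaults to "n" (noun); pass "v" for verbs, "a" for adjectives.
    """
    lemmatizer = build_lemmatizer()
    # Lowercasing is an assumption made here so that "Profile" and
    # "profile" are looked up the same way.
    return [lemmatizer.lemmatize(word.lower(), pos=pos) for word in words]
```

Calling `lemmatize_words(["profile", "profiles"])` should keep the trailing "e" in both results. Note that the lemmatizer treats every word as a noun unless told otherwise, so verb forms need `pos="v"`.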

Beware that it's still not perfect (nothing is when working with text): I used to get 'physic' from 'physics'.
