NLTK WordNet Lemmatizer: Shouldn't it lemmatize all inflections of a word?


Problem description

I'm using the NLTK WordNet Lemmatizer for a part-of-speech tagging project. I first modify each word in the training corpus to its stem (in place), and then train only on the new corpus. However, I found that the lemmatizer is not functioning as I expected it to.

For example, the word loves is lemmatized to love, which is correct, but the word loving remains loving even after lemmatization. Here loving is used as in the sentence "I'm loving it".

Isn't love the stem of the inflected word loving? Similarly, many other 'ing' forms remain as they are after lemmatization. Is this the correct behavior?

What are some other accurate lemmatizers (they need not be in NLTK)? Are there morphology analyzers or lemmatizers that also take a word's part-of-speech tag into account when deciding its stem? For example, killing should have kill as its stem when it is used as a verb, but killing as its stem when it is used as a noun (as in "the killing was done by xyz").

Solution

The WordNet lemmatizer does take the POS tag into account, but it doesn't magically determine it:

>>> import nltk
>>> lemmatizer = nltk.stem.WordNetLemmatizer()
>>> lemmatizer.lemmatize('loving')
'loving'
>>> lemmatizer.lemmatize('loving', 'v')
'love'

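Without a POS tag, it assumes everything you feed it is a noun (the pos argument of lemmatize defaults to 'n'). So here it thinks you're passing it the noun "loving" (as in "sweet loving"). That is also why loves came out as love even without a tag: stripping a final "s" is one of WordNet's noun rules, whereas "ing" is only stripped by the verb rules.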
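Since the lemmatizer will not pick the POS for you, a common pattern is to tag the text first and translate the Penn Treebank tags (as produced by nltk.pos_tag, or already present in a tagged training corpus) into the single-letter codes that lemmatize accepts. The sketch below assumes that approach; the helper names penn_to_wordnet and lemmatize_tagged are hypothetical and not part of NLTK, and mapping by the tag's first letter is a simplification (tags with no WordNet counterpart fall back to noun, the lemmatizer's own default).

```python
from typing import Iterable, List, Tuple

# Single-letter POS codes accepted by WordNetLemmatizer.lemmatize
# (the same values as nltk.corpus.wordnet.NOUN, VERB, ADJ and ADV).
NOUN, VERB, ADJ, ADV = "n", "v", "a", "r"


def penn_to_wordnet(tag: str) -> str:
    """Hypothetical helper: map a Penn Treebank tag to a WordNet POS code."""
    if tag.startswith("J"):  # JJ, JJR, JJS
        return ADJ
    if tag.startswith("V"):  # VB, VBD, VBG, VBN, VBP, VBZ
        return VERB
    if tag.startswith("R"):  # RB, RBR, RBS (RP particles also land here)
        return ADV
    return NOUN  # nouns and everything else: same as lemmatize's default


def lemmatize_tagged(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Hypothetical helper: lemmatize (word, Penn tag) pairs with their POS."""
    # Imported inside the function so penn_to_wordnet works without NLTK.
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(word, penn_to_wordnet(tag)) for word, tag in tagged]
```

For raw text, the input would be something like nltk.pos_tag(nltk.word_tokenize(text)), which needs the tokenizer, tagger and WordNet data fetched with nltk.download (the resource names differ between NLTK versions). For a training corpus that is already tagged, as in the question, its gold tags can be passed in directly, so a verb-tagged killing is lemmatized as a verb while a noun-tagged killing is looked up as a noun.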
