NLTK WordNet Lemmatizer: Shouldn't it lemmatize all inflections of a word?

Question

I'm using the NLTK WordNet Lemmatizer for a Part-of-Speech tagging project by first modifying each word in the training corpus to its stem (in place modification), and then training only on the new corpus. However, I found that the lemmatizer is not functioning as I expected it to.

For example, the word loves is lemmatized to love, which is correct, but the word loving remains loving even after lemmatization. Here loving is used as in the sentence "I'm loving it".

Isn't love the stem of the inflected word loving? Similarly, many other 'ing' forms remain as they are after lemmatization. Is this the correct behavior?

What are some other lemmatizers that are accurate? (need not be in NLTK) Are there morphology analyzers or lemmatizers that also take into account a word's Part Of Speech tag, in deciding the word stem? For example, the word killing should have kill as the stem if killing is used as a verb, but it should have killing as the stem if it is used as a noun (as in the killing was done by xyz).

Recommended answer

The WordNet lemmatizer does take the POS tag into account, but it doesn't magically determine it:

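>>> import nltk  # the WordNet corpus must be installed: nltk.download('wordnet')
>>> lemmatizer = nltk.stem.WordNetLemmatizer()
>>> lemmatizer.lemmatize('loving')       # no POS given, so treated as a noun
'loving'
>>> lemmatizer.lemmatize('loving', 'v')  # explicitly lemmatize as a verb
'love'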

Without a POS tag, it assumes everything you feed it is a noun. So here it thinks you're passing it the noun "loving" (as in "sweet loving").
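For the "killing" case in the question, one possible approach is to tag the text first and pass a WordNet POS letter derived from each tag. The sketch below assumes the tags are Penn Treebank tags (the kind `nltk.pos_tag` produces); the helper names `penn_to_wordnet`, `lemmatize_tagged`, and `demo` are hypothetical and not part of NLTK, and the tags in `demo` are written by hand for illustration.

```python
# Sketch only: pick the lemmatizer's POS argument from a Penn Treebank tag.
# penn_to_wordnet, lemmatize_tagged and demo are illustrative names, not NLTK APIs.

def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if tag.startswith('J'):
        return 'a'  # adjective
    if tag.startswith('V'):
        return 'v'  # verb
    if tag.startswith('R'):
        return 'r'  # adverb
    return 'n'      # noun, which is also the lemmatizer's own default


def lemmatize_tagged(tagged_words, lemmatize):
    """Apply lemmatize(word, pos) to each (word, penn_tag) pair."""
    return [lemmatize(word, penn_to_wordnet(tag)) for word, tag in tagged_words]


def demo():
    """Call this only where NLTK and its WordNet data are installed."""
    from nltk.stem import WordNetLemmatizer

    # Hand-written tags (an assumption): 'killing' once as a noun, once as a verb.
    tagged = [('killing', 'NN'), ('killing', 'VBG'), ('loving', 'VBG')]
    return lemmatize_tagged(tagged, WordNetLemmatizer().lemmatize)
```

Passing the lemmatizer's `lemmatize` method in as a callable keeps the mapping logic separate from NLTK, so a different lemmatizer with the same `(word, pos)` calling convention could be swapped in.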
