NLTK WordNet Lemmatizer: Shouldn't it lemmatize all inflections of a word?
Problem description
I'm using the NLTK WordNet Lemmatizer for a Part-of-Speech tagging project by first modifying each word in the training corpus to its stem (in place modification), and then training only on the new corpus. However, I found that the lemmatizer is not functioning as I expected it to.
For example, the word 'loves' is lemmatized to 'love', which is correct, but the word 'loving' remains 'loving' even after lemmatization. Here 'loving' is as in the sentence "I'm loving it".
Isn't 'love' the stem of the inflected word 'loving'? Similarly, many other 'ing' forms remain as they are after lemmatization. Is this the correct behavior?
What are some other lemmatizers that are accurate? (need not be in NLTK) Are there morphology analyzers or lemmatizers that also take into account a word's part-of-speech tag in deciding the word stem? For example, the word 'killing' should have 'kill' as the stem if 'killing' is used as a verb, but it should have 'killing' as the stem if it is used as a noun (as in "the killing was done by xyz").
The WordNet lemmatizer does take the POS tag into account, but it doesn't magically determine it:
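>>> from nltk.stem import WordNetLemmatizer
>>> lemmatizer = WordNetLemmatizer()
>>> lemmatizer.lemmatize('loving')       # no POS given, so pos defaults to 'n'
'loving'
>>> lemmatizer.lemmatize('loving', 'v')  # explicitly treat the word as a verb
'love'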
Without a POS tag, it assumes everything you feed it is a noun. So here it thinks you're passing it the noun "loving" (as in "sweet loving").
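If the words already carry Penn Treebank tags (for example from a tagged training corpus), one possible approach is to translate each tag into the single-letter POS code that lemmatize() accepts and pass it along with the word. The following is only a minimal sketch under that assumption: penn_to_wordnet and lemmatize_tagged are hypothetical helper names, not part of NLTK, and the mapping is deliberately coarse.

```python
# Sketch only: both helpers below are hypothetical, not NLTK functions.
# Assumes tags follow the Penn Treebank convention (JJ*, VB*, RB*, NN*, ...).

def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter: 'a', 'v', 'r' or 'n'."""
    if tag.startswith('JJ'):
        return 'a'  # adjective
    if tag.startswith('VB'):
        return 'v'  # verb
    if tag.startswith('RB'):
        return 'r'  # adverb
    return 'n'      # everything else falls back to noun, the lemmatizer's default


def lemmatize_tagged(tagged_words, lemmatize):
    """Lemmatize (word, tag) pairs with any callable of the form lemmatize(word, pos)."""
    return [lemmatize(word, penn_to_wordnet(tag)) for word, tag in tagged_words]


# Usage with NLTK (requires the 'wordnet' corpus to be installed):
#   from nltk.stem import WordNetLemmatizer
#   lemmatize_tagged([('loving', 'VBG'), ('killing', 'NN')],
#                    WordNetLemmatizer().lemmatize)
```

Passing the lemmatizer in as a callable keeps the mapping logic independent of NLTK, so a different lemmatizer could be swapped in. With this mapping, 'killing' tagged as a verb is looked up with 'v', while 'killing' tagged as a noun is looked up with 'n'.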