NLTK words lemmatizing

Question

I am trying to do lemmatization on words with NLTK.

So far I have found that the stem package gives some results, such as transforming "cars" to "car" and "women" to "woman". However, I cannot lemmatize words with affixes, such as "acknowledgement".

Calling WordNetLemmatizer() on "acknowledgement" returns "acknowledgement" unchanged, and PorterStemmer() returns "acknowledg" rather than "acknowledge".

Can anyone tell me how to strip the affixes from words?
For example, when the input is "acknowledgement", the output should be "acknowledge".

Recommended answer

Lemmatization does not (and should not) return "acknowledge" for "acknowledgement": the former is a verb, while the latter is a noun. Porter's stemming algorithm, on the other hand, simply applies a fixed set of rules, so the only way to change its output is to change the rules at the source, which is NOT the right way to fix your problem.
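To make the "fixed set of rules" point concrete, here is a toy suffix stripper. It is only a sketch: the rule list and the names `TOY_SUFFIX_RULES` and `toy_stem` are made up for illustration and are not Porter's actual algorithm or part of NLTK. It shows how a rule that blindly removes "ement" leaves a stem that is not a dictionary word.

```python
# Hypothetical rule list for illustration only; the real Porter stemmer
# has several ordered steps with extra conditions on the remaining stem.
TOY_SUFFIX_RULES = ["ement", "ment", "s"]


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix without checking the result is a real word."""
    for suffix in TOY_SUFFIX_RULES:
        # Only strip if enough characters remain to form a stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


print(toy_stem("acknowledgement"))  # acknowledg
print(toy_stem("cars"))             # car
```

Getting "acknowledge" out of rules like these would mean special-casing the rule set, which is why editing the stemmer is not the right tool here.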

What you are looking for is the derivationally related form of "acknowledgement", and for this, your best source is WordNet. You can check this online on WordNet.

There are quite a few WordNet-based libraries that you can use for this (e.g. JWNL in Java). In Python, NLTK should be able to get the derivationally related form you saw online:

from nltk.corpus import wordnet as wn

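# Look up the noun synset, then pick its lemma at index 1.
acknowledgment_synset = wn.synset('acknowledgement.n.01')
# In NLTK 3.x, lemmas() is a method, so it must be called before indexing.
acknowledgment_lemma = acknowledgment_synset.lemmas()[1]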

print(acknowledgment_lemma.derivationally_related_forms())
# [Lemma('admit.v.01.acknowledge'), Lemma('acknowledge.v.06.acknowledge')]
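
The snippet above hard-codes one synset and one lemma index. A slightly more general sketch, assuming NLTK 3.x and an installed WordNet corpus, walks every noun lemma of a word and collects the verbs linked to it. The helper name `related_verbs` is hypothetical; the corpus reader is passed in as an argument so the function does not depend on how WordNet is loaded.

```python
def related_verbs(word, wordnet):
    """Collect verb lemma names derivationally related to any noun sense of `word`.

    `wordnet` is assumed to be NLTK's WordNet corpus reader, i.e. the object
    obtained with `from nltk.corpus import wordnet`.
    """
    verbs = set()
    # wordnet.lemmas(word, pos="n") returns the Lemma objects for the noun senses.
    for lemma in wordnet.lemmas(word, pos="n"):
        for related in lemma.derivationally_related_forms():
            # Keep only related lemmas whose synset is a verb.
            if related.synset().pos() == "v":
                verbs.add(related.name())
    return sorted(verbs)


# Usage (needs the corpus, e.g. after nltk.download("wordnet")):
#   from nltk.corpus import wordnet as wn
#   print(related_verbs("acknowledgement", wn))
```

A word can map to several verbs, or to none, so the caller still has to decide which candidate to use.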
