Lemmatize plural nouns using nltk and wordnet


Question

I want to lemmatize using:

from nltk import word_tokenize, pos_tag
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet

lmtzr = WordNetLemmatizer()

# text is assumed to be a list of tokens, e.g. the result of word_tokenize()
POS = pos_tag(text)

def get_wordnet_pos(treebank_tag):
    # Map a Penn Treebank tag to the POS constant the WordNet lemmatizer expects
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        # Fall back to noun, which is also the lemmatizer's default
        return wordnet.NOUN

# Lemmatize each token with the POS derived from its tag
lemmas = [lmtzr.lemmatize(token, get_wordnet_pos(tag)) for token, tag in POS]

The issue is that the POS tagger correctly tags "procaspases" as 'NNS', but how do I convert NNS to a WordNet POS? As it stands, "procaspases" is still "procaspases" after it goes through the lemmatizer.

Recommended answer

NLTK's WordNet lemmatizer takes care of most plurals, including irregular ones, and does more than just delete a trailing 's'.

import nltk
from nltk.stem.wordnet import WordNetLemmatizer

Lem = WordNetLemmatizer()

phrase = 'cobblers ants women boys needs finds binaries hobbies busses wolves'

words = phrase.split()
for word in words:
    # lemmatize() defaults to pos='n', so every word is treated as a noun
    lemword = Lem.lemmatize(word)
    print(lemword)

Output (printed one word per line): cobbler ant woman boy need find binary hobby bus wolf
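
A note on the word from the question: 'NNS' starts with 'N', so get_wordnet_pos already maps it to wordnet.NOUN and the tag conversion is probably not the culprit. The WordNet lemmatizer returns a word unchanged when it cannot find it in WordNet, which is likely what happens with a specialised term such as "procaspases". Below is a minimal sketch of one possible workaround, assuming a crude suffix-stripping fallback is acceptable for plural nouns WordNet does not know. The names to_wordnet_pos, naive_singular and lemmatize_with_fallback, and the suffix rules themselves, are hypothetical and not part of NLTK; the stand-in lemmatizer only imitates the "return unchanged" behaviour so the sketch runs without the WordNet corpus.

```python
# WordNet POS codes are one-letter strings in NLTK:
# wordnet.ADJ == 'a', wordnet.VERB == 'v', wordnet.NOUN == 'n', wordnet.ADV == 'r'.
TREEBANK_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def to_wordnet_pos(treebank_tag):
    """Map a Penn Treebank tag (e.g. 'NNS') to a WordNet POS code; default to noun."""
    return TREEBANK_PREFIX_TO_WORDNET.get(treebank_tag[:1], "n")


def naive_singular(word):
    """Hypothetical fallback: strip a plural ending using a few crude suffix rules."""
    lower = word.lower()
    if lower.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # hobbies -> hobby
    if lower.endswith(("ches", "shes", "xes", "zes", "sses")):
        return word[:-2]                # boxes -> box
    if lower.endswith("s") and not lower.endswith(("ss", "us", "is")):
        return word[:-1]                # procaspases -> procaspase
    return word


def lemmatize_with_fallback(word, treebank_tag, lemmatize):
    """Try the supplied lemmatizer first; fall back only for unchanged plural nouns.

    `lemmatize` is any callable taking (word, pos), for example
    WordNetLemmatizer().lemmatize.
    """
    lemma = lemmatize(word, to_wordnet_pos(treebank_tag))
    if lemma == word and treebank_tag in ("NNS", "NNPS"):
        return naive_singular(word)
    return lemma


# Stand-in for a lemmatizer that does not know the word and returns it unchanged.
def identity_lemmatize(word, pos="n"):
    return word


print(lemmatize_with_fallback("procaspases", "NNS", identity_lemmatize))
```

With NLTK installed, passing WordNetLemmatizer().lemmatize as the third argument should keep WordNet's answer whenever it changes the word and apply the suffix rules only to plural-tagged nouns that come back unchanged. The rules are deliberately simple and will mis-handle some words, so treat the result as a heuristic.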
