Lemmatizing POS tagged words with NLTK?

Question

I have POS tagged some words with nltk.pos_tag(), so they are given treebank tags. I would like to lemmatize these words using the known POS tags, but I am not sure how. I was looking at Wordnet lemmatizer, but I am not sure how to convert the treebank POS tags to tags accepted by the lemmatizer. How can I perform this conversion simply, or is there a lemmatizer that uses treebank tags?

Recommended answer

The WordNet lemmatizer only knows four parts of speech (ADJ, ADV, NOUN, and VERB), and only the NOUN and VERB rules do anything especially interesting. In the treebank tagset, the noun tags all start with NN, the verb tags with VB, the adjective tags with JJ, and the adverb tags with RB. So converting from one set of labels to the other is pretty easy, something like:

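from nltk.corpus import wordnet

# Example treebank tag, as returned by nltk.pos_tag(); replace with your own.
penn_tag = 'VBD'
# Look up the first two characters; raises KeyError for other tags (e.g. 'DT').
morphy_tag = {'NN': wordnet.NOUN, 'JJ': wordnet.ADJ, 'VB': wordnet.VERB, 'RB': wordnet.ADV}[penn_tag[:2]]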
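
The dictionary lookup above raises a KeyError for any tag outside those four groups (determiners, prepositions, punctuation and so on), so in practice a fallback is useful. Below is a minimal sketch of one way to wire the conversion into the lemmatizer. The helper names `penn_to_wordnet` and `lemmatize_tagged` are hypothetical, and falling back to the noun POS is an assumption chosen because it matches the default `pos` of `WordNetLemmatizer.lemmatize`.

```python
# Single-character POS codes used by WordNet; these are the values of
# nltk.corpus.wordnet.NOUN, VERB, ADJ and ADV respectively.
WORDNET_POS = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def penn_to_wordnet(penn_tag, default="n"):
    """Map a Penn Treebank tag to a WordNet POS code (hypothetical helper).

    Tags outside the four known groups fall back to `default` (an assumption).
    """
    return WORDNET_POS.get(penn_tag[:2], default)


def lemmatize_tagged(tagged_words):
    """Lemmatize (word, penn_tag) pairs such as those from nltk.pos_tag()."""
    # Imported here so the mapping above can be used without NLTK installed.
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    return [
        lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))
        for word, tag in tagged_words
    ]
```

With the WordNet corpus and a tagger model downloaded through `nltk.download()`, this could be called as `lemmatize_tagged(nltk.pos_tag(tokens))`, where `tokens` is a list of word strings.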
