Lemmatizing POS tagged words with NLTK?

Question

I have POS tagged some words with nltk.pos_tag(), so they have been given treebank tags. I would like to lemmatize these words using the known POS tags, but I am not sure how. I was looking at the WordNet lemmatizer, but I am not sure how to convert the treebank POS tags to the tags the lemmatizer accepts. How can I perform this conversion simply, or is there a lemmatizer that uses treebank tags?

Recommended answer

The WordNet lemmatizer only knows four parts of speech (ADJ, ADV, NOUN, and VERB), and only the NOUN and VERB rules do anything especially interesting. In the treebank tagset, the noun tags all start with NN, the verb tags with VB, the adjective tags with JJ, and the adverb tags with RB. So converting from one set of labels to the other is pretty easy, something like:

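from nltk.corpus import wordnet

# penn_tag is a treebank tag string such as 'NNS' or 'VBD'; only its first two characters matter.
# Tags outside the four groups (for example 'DT' or 'IN') raise KeyError here.
morphy_tag = {'NN': wordnet.NOUN, 'JJ': wordnet.ADJ, 'VB': wordnet.VERB, 'RB': wordnet.ADV}[penn_tag[:2]]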
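
To show how the mapping above might be applied to a whole tagged sentence, here is a minimal sketch. The helper names `penn_to_wordnet` and `lemmatize_tagged` are hypothetical, not part of NLTK, and falling back to the noun tag for anything outside the four groups is an assumption (it matches the default `pos` of `WordNetLemmatizer.lemmatize`). The single-letter strings are the values of the `wordnet` constants used above, written out so the mapping itself needs no corpus data.

```python
# WordNet's POS constants are single-letter strings:
# wordnet.NOUN == 'n', wordnet.VERB == 'v', wordnet.ADJ == 'a', wordnet.ADV == 'r'.
PENN_PREFIX_TO_WORDNET = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def penn_to_wordnet(penn_tag, default="n"):
    """Return the WordNet POS for a Penn Treebank tag (hypothetical helper).

    Tags outside the four groups (e.g. 'DT', 'IN') fall back to `default`
    instead of raising KeyError; the noun default is an assumption.
    """
    return PENN_PREFIX_TO_WORDNET.get(penn_tag[:2], default)


def lemmatize_tagged(tagged_words, lemmatizer):
    """Lemmatize (word, penn_tag) pairs, such as the output of nltk.pos_tag().

    `lemmatizer` is any object with a lemmatize(word, pos) method,
    for example an nltk.stem.WordNetLemmatizer instance.
    """
    return [lemmatizer.lemmatize(word, penn_to_wordnet(tag))
            for word, tag in tagged_words]
```

With NLTK and its WordNet data installed, this could be called as `lemmatize_tagged(nltk.pos_tag(tokens), WordNetLemmatizer())`, where `tokens` is a list of word strings.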
