WordNet lemmatization and POS tagging in Python

Views: 74 | Published: 2020/5/18 1:10:45 | Tags: python nltk wordnet lemmatization


Question

I want to use the WordNet lemmatizer in Python. I have learned that the default POS tag is NOUN, and that it does not output the correct lemma for a verb unless the POS tag is explicitly specified as VERB.

My question is: what is the best way to perform this lemmatization accurately?

I did the POS tagging using nltk.pos_tag, and I am lost on how to convert the Treebank POS tags into WordNet-compatible POS tags. Please help.

import nltk
from nltk.stem.wordnet import WordNetLemmatizer

lmtzr = WordNetLemmatizer()
# tokens is a list of word strings, e.g. from nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)

I get output tags such as NN, JJ, VB, and RB. How do I change these to WordNet-compatible tags?

Also, do I have to train nltk.pos_tag() with a tagged corpus, or can I use it directly on my data?

Recommended answer

First of all, you can use nltk.pos_tag() directly without training it. The function loads a pretrained tagger from a file. In the NLTK version this answer was written for, the file name is stored in nltk.tag._POS_TAGGER (a private attribute, so it may not exist in other releases):

>>> nltk.tag._POS_TAGGER
'taggers/maxent_treebank_pos_tagger/english.pickle'

Because it was trained on the Treebank corpus, it also uses the Treebank tag set.

The following function maps Treebank tags to WordNet part-of-speech names:

from nltk.corpus import wordnet

def get_wordnet_pos(treebank_tag):
    """Map a Treebank POS tag to a WordNet POS constant ('' if there is no match)."""
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        return ''

You can then use the return value with the lemmatizer:

>>> from nltk.corpus import wordnet
>>> from nltk.stem.wordnet import WordNetLemmatizer
>>> lemmatizer = WordNetLemmatizer()
>>> lemmatizer.lemmatize('going', wordnet.VERB)
'go'

Check the return value before passing it to the lemmatizer, because an empty string would raise a KeyError.
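One way to act on that advice is to fall back to a default tag instead of returning an empty string. The sketch below is only an illustration, not part of the original answer: the names to_wordnet_pos, lemmatize_tagged, and lemmatize_tokens are hypothetical, and the NOUN fallback is an assumption that mirrors the lemmatizer's default mentioned in the question. It uses the single-letter values of the WordNet constants directly so the mapping can be tried without loading any corpus data.

```python
# WordNet's part-of-speech constants are single letters:
# wordnet.ADJ == 'a', wordnet.VERB == 'v', wordnet.NOUN == 'n', wordnet.ADV == 'r'.
ADJ, VERB, NOUN, ADV = "a", "v", "n", "r"


def to_wordnet_pos(treebank_tag, default=NOUN):
    """Map a Treebank tag to a WordNet POS letter, never returning ''.

    The NOUN default is an assumption for this sketch.
    """
    first_letter_to_pos = {"J": ADJ, "V": VERB, "N": NOUN, "R": ADV}
    # treebank_tag[:1] is '' for an empty tag, which falls through to default.
    return first_letter_to_pos.get(treebank_tag[:1], default)


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Apply lemmatize(word, pos) to each (word, treebank_tag) pair."""
    return [lemmatize(word, to_wordnet_pos(tag)) for word, tag in tagged_tokens]


def lemmatize_tokens(tokens):
    """Tag and lemmatize a list of tokens.

    Needs NLTK with its tagger and WordNet data installed, so the imports
    are kept local to this function.
    """
    import nltk
    from nltk.stem.wordnet import WordNetLemmatizer

    return lemmatize_tagged(nltk.pos_tag(tokens), WordNetLemmatizer().lemmatize)
```

With NLTK and its data available, lemmatize_tokens(tokens) takes the same kind of token list that was passed to nltk.pos_tag in the question. Passing the lemmatizer in as a callable keeps the tag mapping separate from NLTK, so the mapping can be checked on its own.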
