Python NLTK: How to tag sentences with the simplified set of part-of-speech tags?

Views: 160 · Published: 2020/5/18 1:12:41 · Tags: python, tagging, nltk


Question

Chapter 5 of the Python NLTK book gives this example of tagging the words in a sentence:

>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

nltk.pos_tag calls the default tagger, which uses the full set of tags. Later in the chapter, a simplified set of tags is introduced.

How can I tag sentences with this simplified set of part-of-speech tags?

Also, have I understood the tagger correctly? That is, can I change the tag set that the tagger uses, as I'm asking, or should I map the tags it returns onto the simplified set, or should I create a new tagger from a new, simply-tagged corpus?

Recommended answer

To simplify the tags returned by the default tagger, you can use nltk.tag.simplify.simplify_wsj_tag, like so:

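>>> import nltk
>>> from nltk.tag.simplify import simplify_wsj_tag
>>> tokens = nltk.word_tokenize("And now for something completely different")
>>> tagged_sent = nltk.pos_tag(tokens)  # full (WSJ / Penn Treebank) tags
>>> simplified = [(word, simplify_wsj_tag(tag)) for word, tag in tagged_sent]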
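The answer takes the "map the returned tags onto the simplified set" route from the question. As far as I know, the nltk.tag.simplify module only ships with older NLTK releases (2.x); NLTK 3 dropped it and instead offers nltk.pos_tag(tokens, tagset='universal') and nltk.tag.map_tag. The underlying idea is the same either way, and it can be sketched without NLTK at all. In the sketch below, the PTB_TO_COARSE table, its coarse tag names, and the simplify_tags helper are hypothetical and deliberately partial; they are not NLTK's own mapping.

```python
# Hypothetical, deliberately partial mapping from Penn Treebank (WSJ) tags
# to a coarse tag set. The coarse names are illustrative assumptions,
# not the table that NLTK itself uses.
PTB_TO_COARSE = {
    "CC": "CONJ",
    "IN": "ADP",
    "DT": "DET",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
}


def simplify_tags(tagged_sent, mapping=PTB_TO_COARSE, unknown="X"):
    """Map each (word, fine_tag) pair to (word, coarse_tag).

    Tags missing from the mapping fall back to the `unknown` label.
    """
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged_sent]


# Tagger output copied from the question, so no NLTK install is needed here.
tagged_sent = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]

simplified = simplify_tags(tagged_sent)
print(simplified)
# [('And', 'CONJ'), ('now', 'ADV'), ('for', 'ADP'),
#  ('something', 'NOUN'), ('completely', 'ADV'), ('different', 'ADJ')]
```

Mapping after tagging leaves the default tagger untouched, so no retraining on a simply-tagged corpus is needed. The trade-off is that the coarse tags can only be as accurate as the fine-grained tags they are derived from.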
