ne_chunk without pos_tag in NLTK

Question

I'm trying to chunk a sentence using ne_chunk and pos_tag in NLTK:

# The tagger and NE chunker models must be downloaded first with nltk.download().
from nltk.tag import pos_tag
from nltk.tree import Tree
from nltk.chunk import ne_chunk

sentence = "Michael and John is reading a booklet in a library of Jakarta"
tagged_sent = pos_tag(sentence.split())

# Keep only the named-entity subtrees; plain (word, tag) tuples are skipped.
print_chunk = [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]

print(print_chunk)

This is the result:

[Tree('GPE', [('Michael', 'NNP')]), Tree('PERSON', [('John', 'NNP')]), Tree('GPE', [('Jakarta', 'NNP')])]

My question: is it possible to leave out the POS tags (like NNP above) and keep only the Tree labels such as 'GPE' and 'PERSON'? Also, what does 'GPE' mean?

Thanks in advance.

Recommended answer

The named entity chunker returns a tree that contains both the chunks and the POS tags. You can't change that, but you can strip the tags out afterwards. Starting from your tagged_sent:

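from nltk.chunk import ne_chunk
from nltk.tree import Tree

chunks = ne_chunk(tagged_sent)
simple = []
for elt in chunks:
    if isinstance(elt, Tree):
        # Named-entity chunk: keep the label, drop the POS tags.
        simple.append(Tree(elt.label(), [word for word, tag in elt]))
    else:
        # Ordinary (word, tag) token outside any chunk: keep only the word.
        simple.append(elt[0])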

If you only want the chunks, omit the else: clause above. You can adapt the code to wrap the chunks any way you like; I used an nltk Tree to keep the changes to a minimum. Note that some chunks consist of multiple words (try adding "New York" to your example), so a chunk's contents must be a list, not a single element.
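
If all you need is the label and the text of each entity, the same loop can return plain tuples instead of Trees. The following is only a minimal sketch: entity_pairs and demo are hypothetical names, not part of NLTK, and the tree inside demo is built by hand to imitate the shape of ne_chunk output rather than produced by the chunker. It also assumes flat chunks whose children are (word, tag) tuples, as in the result shown in the question.

```python
def entity_pairs(chunks):
    """Return (label, text) pairs for every named-entity chunk in `chunks`.

    `entity_pairs` is a hypothetical helper name, not part of NLTK.
    """
    pairs = []
    for elt in chunks:
        # Chunk subtrees expose label(); plain (word, tag) tuples do not.
        if hasattr(elt, "label"):
            # Drop the POS tags and join multi-word entities with spaces.
            words = [word for word, tag in elt]
            pairs.append((elt.label(), " ".join(words)))
    return pairs


def demo():
    # Imported here so the helper above does not itself depend on NLTK.
    from nltk.tree import Tree

    # Hand-built stand-in for ne_chunk output (an assumption, not real chunker output).
    chunks = Tree("S", [
        Tree("PERSON", [("John", "NNP")]),
        ("visited", "VBD"),
        Tree("GPE", [("New", "NNP"), ("York", "NNP")]),
    ])
    print(entity_pairs(chunks))
```

Calling demo() prints [('PERSON', 'John'), ('GPE', 'New York')]. In real use you would pass the result of ne_chunk(tagged_sent) to entity_pairs instead of the hand-built tree.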

PS. "GPE" stands for "geo-political entity", so labelling "Michael" as GPE is obviously a chunker mistake. The NLTK book has a list of the "commonly used tags".
