使用NLTK WordNet查找专有名词 [英] Finding Proper Nouns using NLTK WordNet

查看:129
本文介绍了使用NLTK WordNet查找专有名词的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

是否可以使用NLTK WordNet查找专有名词?即,我可以使用nltk Wordnet标记所有名词吗?

Is there any way to find proper nouns using NLTK WordNet?Ie., Can i tag Possessive nouns using nltk Wordnet ?

推荐答案

我认为您不需要WordNet来查找专有名词,我建议使用词性标记器pos_tag.

I don't think you need WordNet to find proper nouns, I suggest using the Part-Of-Speech tagger pos_tag.

要查找专有名词,请查找NNP标记:

To find Proper Nouns, look for the NNP tag:

from nltk.tag import pos_tag

sentence = "Michael Jackson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

propernouns = [word for word,pos in tagged_sent if pos == 'NNP']
# ['Michael','Jackson', 'McDonalds']

由于MichaelJackson被分为2个标记,您可能会不太满意,那么您可能需要诸如名称实体标记器之类的更复杂的东西.

You may not be very satisfied since Michael and Jackson is split into 2 tokens, then you might need something more complex such as Name Entity tagger.

penntreebank标签集所述,对于所有格名词,您只需查找POS标签

By right, as documented by the penntreebank tagset, for possessive nouns, you can simply look for the POS tag, http://www.mozart-oz.org/mogul/doc/lager/brill-tagger/penn.html. But often the tagger doesn't tag POS when it's an NNP.

要查找所有名词,请查找str.endswith('s")或str.endswith("s'"):

from nltk.tag import pos_tag

sentence = "Michael Jackson took Daniel Jackson's hamburger and Agnes' fries"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

possessives = [word for word in sentence if word.endswith("'s") or word.endswith("s'")]
# ["Jackson's", "Agnes'"]

或者,您可以使用NLTK ne_chunk,但是除非您担心从句子中获得什么样的专有名词,否则它似乎没有其他作用:

Alternatively, you can use NLTK ne_chunk but it doesn't seem to do much other unless you are concerned about what kind of Proper Noun you get from the sentence:

>>> from nltk.tree import Tree; from nltk.chunk import ne_chunk
>>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
[Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Jackson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
>>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
['Michael', 'Jackson', 'Daniel']

使用ne_chunk有点冗长,并且不能使您拥有所有格.

Using ne_chunk is a little verbose and it doesn't get you the possessives.

这篇关于使用NLTK WordNet查找专有名词的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆