Finding Proper Nouns using NLTK WordNet
Question
Is there any way to find proper nouns using NLTK WordNet? That is, can I tag possessive nouns using NLTK WordNet?
Recommended answer
I don't think you need WordNet to find proper nouns; I suggest using the part-of-speech tagger pos_tag instead.
To find proper nouns, look for the NNP tag:
from nltk.tag import pos_tag
sentence = "Michael Jackson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]
propernouns = [word for word,pos in tagged_sent if pos == 'NNP']
# ['Michael','Jackson', 'McDonalds']
You may not be fully satisfied with this, since Michael and Jackson are split into two tokens. In that case you might need something more complex, such as a named entity tagger.
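Short of a full named entity tagger, one lightweight option is to join runs of adjacent proper-noun tokens. The following is a minimal sketch, not part of the original answer: the helper name group_proper_nouns is hypothetical, the tagged list is copied from the example output above rather than produced by calling the tagger, and treating NNPS (plural proper noun) as part of a name is an assumption.

```python
from itertools import groupby

# Tagged tokens copied from the example output above (not re-computed here).
tagged_sent = [
    ('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'),
    ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP'),
]

def group_proper_nouns(tagged):
    """Join each run of adjacent NNP/NNPS tokens into one string.

    Hypothetical helper for illustration only.
    """
    names = []
    # groupby yields consecutive runs that share the same key value.
    for is_proper, run in groupby(tagged, key=lambda pair: pair[1] in ('NNP', 'NNPS')):
        if is_proper:
            names.append(' '.join(word for word, _ in run))
    return names

print(group_proper_nouns(tagged_sent))
# ['Michael Jackson', 'McDonalds']
```

Note that this simple grouping would also merge two different names that happen to stand next to each other, which is one reason a named entity tagger can still be the better tool.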
In principle, as documented in the Penn Treebank tagset (http://www.mozart-oz.org/mogul/doc/lager/brill-tagger/penn.html), you can find possessive nouns simply by looking for the POS tag. In practice, though, the tagger often doesn't emit POS when the word is an NNP.
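One likely reason the POS tag goes missing is tokenization: when a sentence is split on whitespace, Jackson's stays a single token, so there is no separate 's for the tagger to label. If the tokenizer does separate the clitic (nltk.word_tokenize is commonly used for this, though that is an assumption here and not something the original answer tested), the possessor is simply the token before a POS-tagged one. The sketch below uses a hand-written tagged list, not real tagger output, and the helper name possessive_owners is hypothetical.

```python
# Hand-written illustrative input, NOT real tagger output: it assumes the
# tokenizer has split "Jackson's" into "Jackson" and "'s", and that the
# clitic received the POS (possessive ending) tag.
tagged_sent = [
    ('Daniel', 'NNP'), ('Jackson', 'NNP'), ("'s", 'POS'),
    ('hamburger', 'NN'),
]

def possessive_owners(tagged):
    """Return each word that is immediately followed by a POS-tagged token."""
    return [
        word
        # Pair every token with its successor and inspect the successor's tag.
        for (word, _), (_, next_tag) in zip(tagged, tagged[1:])
        if next_tag == 'POS'
    ]

print(possessive_owners(tagged_sent))
# ['Jackson']
```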
To find possessive nouns, look for str.endswith("'s") or str.endswith("s'"):
from nltk.tag import pos_tag
sentence = "Michael Jackson took Daniel Jackson's hamburger and Agnes' fries"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]
# Iterate over the whitespace-split tokens, not over the characters of the string
possessives = [word for word in sentence.split() if word.endswith("'s") or word.endswith("s'")]
# ["Jackson's", "Agnes'"]
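The suffix check alone would also match contractions or plain plurals in quotes, so it may help to combine it with the tag and to strip the possessive marker afterwards. This is a minimal sketch under stated assumptions: the tagged list is copied from the example output above rather than re-computed, the helper name strip_possessive is hypothetical, and restricting to NNP/NNPS is a choice made here to keep only possessive proper nouns.

```python
# Tagged tokens copied from the example output above (not re-computed here).
tagged_sent = [
    ('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'),
    ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'),
    ("Agnes'", 'NNP'), ('fries', 'NNS'),
]

def strip_possessive(word):
    """Remove a trailing 's, or the apostrophe of a trailing s' (hypothetical helper)."""
    if word.endswith("'s"):
        return word[:-2]
    if word.endswith("s'"):
        return word[:-1]
    return word

# Keep only proper nouns that carry a possessive suffix, then strip the suffix.
possessive_proper = [
    strip_possessive(word)
    for word, pos in tagged_sent
    if pos in ('NNP', 'NNPS') and (word.endswith("'s") or word.endswith("s'"))
]
print(possessive_proper)
# ['Jackson', 'Agnes']
```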
Alternatively, you can use NLTK's ne_chunk, but it doesn't seem to add much unless you care about what kind of proper noun you get from the sentence:
>>> from itertools import chain
>>> from nltk.tree import Tree; from nltk.chunk import ne_chunk
>>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
[Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Jackson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
>>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
['Michael', 'Jackson', 'Daniel']
Using ne_chunk is a little verbose, and it doesn't get you the possessives.