解析文本以获取专有名词(名称和组织)-Python nltk [英] Parse text to get the proper nouns (names and organizations) - python nltk
问题描述
我正在尝试从很小的文本(如sms)中提取专有名词,如名称和组织名称,这些文本是nltk提供的基本解析器
I am trying to extract proper nouns as in Names and Organization names from very small chunks of texts like sms, the basic parsers available with nltk Finding Proper Nouns using NLTK WordNet are being able to get the nouns but the problem is when we get proper nouns not starting with a capital letter , for texts like this the names like sumit do not get recognized as proper nouns
>>> sentence = "i spoke with sumit and rajesh and Samit about the gridlock situation last night @ around 8 pm last nite"
>>> tagged_sent = pos_tag(sentence.split())
>>> print tagged_sent
[('i', 'PRP'), ('spoke', 'VBP'), ('with', 'IN'), **('sumit', 'NN')**, ('and', 'CC'), ('rajesh', 'JJ'), ('and', 'CC'), **('Samit', 'NNP'),** ('about', 'IN'), ('the', 'DT'), ('gridlock', 'NN'), ('situation', 'NN'), ('last', 'JJ'), ('night', 'NN'), ('@', 'IN'), ('around', 'IN'), ('8', 'CD'), ('pm', 'NN'), ('last', 'JJ'), ('nite', 'NN')]
推荐答案
您可能想看看大写.抱歉,答案不完整,但是我在使用python-nameparser方面经验不足.
You might want to have a look at python-nameparser. It tries to guess capitalization of names also. Sorry for the incomplete answer but I don't have much experience using python-nameparser.
祝你好运!
这篇关于解析文本以获取专有名词(名称和组织)-Python nltk的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!