Parse text to get the proper nouns (names and organizations) - Python NLTK


Question

I am trying to extract proper nouns, such as personal names and organization names, from very short texts such as SMS messages. The basic parsers available with NLTK (see "Finding Proper Nouns using NLTK WordNet") do find the nouns, but the problem arises when a proper noun does not start with a capital letter. In a text like the one below, names such as sumit and rajesh are not recognized as proper nouns, while the capitalized Samit is:

>>> from nltk import pos_tag
>>> sentence = "i spoke with sumit and rajesh and Samit about the gridlock situation last night @ around 8 pm last nite"
>>> tagged_sent = pos_tag(sentence.split())
>>> print(tagged_sent)
[('i', 'PRP'), ('spoke', 'VBP'), ('with', 'IN'), ('sumit', 'NN'), ('and', 'CC'), ('rajesh', 'JJ'), ('and', 'CC'), ('Samit', 'NNP'), ('about', 'IN'), ('the', 'DT'), ('gridlock', 'NN'), ('situation', 'NN'), ('last', 'JJ'), ('night', 'NN'), ('@', 'IN'), ('around', 'IN'), ('8', 'CD'), ('pm', 'NN'), ('last', 'JJ'), ('nite', 'NN')]
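To make the symptom concrete, here is a minimal sketch that filters the tagged output from the question for the Penn Treebank proper-noun tags (`NNP`, `NNPS`). The helper name `proper_nouns` is hypothetical, and the tagged list is copied from the question rather than recomputed, so the snippet runs without NLTK.

```python
# Tagged tokens copied from the question's output (not recomputed here).
tagged_sent = [
    ('i', 'PRP'), ('spoke', 'VBP'), ('with', 'IN'), ('sumit', 'NN'),
    ('and', 'CC'), ('rajesh', 'JJ'), ('and', 'CC'), ('Samit', 'NNP'),
    ('about', 'IN'), ('the', 'DT'), ('gridlock', 'NN'), ('situation', 'NN'),
    ('last', 'JJ'), ('night', 'NN'), ('@', 'IN'), ('around', 'IN'),
    ('8', 'CD'), ('pm', 'NN'), ('last', 'JJ'), ('nite', 'NN'),
]


def proper_nouns(tagged):
    """Return the tokens whose tag marks a singular or plural proper noun."""
    return [word for word, tag in tagged if tag in ("NNP", "NNPS")]


print(proper_nouns(tagged_sent))  # ['Samit']
```

Only the capitalized name survives the filter; sumit and rajesh are lost because they were tagged `NN` and `JJ`.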

Recommended answer

You might want to have a look at python-nameparser. It also tries to guess the capitalization of names. Sorry for the incomplete answer, but I don't have much experience using python-nameparser.
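As a rough sketch of what that could look like, the snippet below uses `HumanName.capitalize()` from the `nameparser` package (installed with `pip install nameparser`). The wrapper `guess_capitalization` is a hypothetical helper, and the `str.title()` fallback is only an assumption added so the sketch still runs when the package is missing; it is not equivalent to nameparser's rules. Note that nameparser works on a string that is already known to be a name, so it would not by itself find sumit inside the sentence.

```python
try:
    from nameparser import HumanName  # third-party: pip install nameparser
except ImportError:
    HumanName = None  # the fallback below keeps the sketch runnable


def guess_capitalization(name):
    """Return `name` with name-style capitalization applied."""
    if HumanName is None:
        # Crude stand-in (assumption), not nameparser's actual logic.
        return name.title()
    parsed = HumanName(name)
    parsed.capitalize()  # adjusts the case of the parsed name in place
    return str(parsed)


for raw in ("sumit", "rajesh"):
    print(guess_capitalization(raw))
```

One possible use is to re-case candidate tokens before passing them to `pos_tag`, although deciding which tokens are candidates is a separate problem that this sketch does not address.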

Good luck!
