Extracting person names from named entity recognition output in NLP using Python

Question

I have a sentence in which I need to identify only the person names:

For example:

sentence = "Larry Page is an American business magnate and computer scientist who is the co-founder of Google, alongside Sergey Brin"

I used the code below to identify the named entities:

from nltk import word_tokenize, pos_tag, ne_chunk
print(ne_chunk(pos_tag(word_tokenize(sentence))))

The output I get is:

(S
  (PERSON Larry/NNP)
  (ORGANIZATION Page/NNP)
  is/VBZ
  an/DT
  (GPE American/JJ)
  business/NN
  magnate/NN
  and/CC
  computer/NN
  scientist/NN
  who/WP
  is/VBZ
  the/DT
  co-founder/NN
  of/IN
  (GPE Google/NNP)
  ,/,
  alongside/RB
  (PERSON Sergey/NNP Brin/NNP))

I want to extract all the person names, such as:

Larry Page
Sergey Brin

To achieve this, I referred to this link and tried the following:

from nltk.tag.stanford import StanfordNERTagger
st = StanfordNERTagger('/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz','/usr/share/stanford-ner/stanford-ner.jar')

However, I keep getting this error:

LookupError: Could not find stanford-ner.jar jar file at /usr/share/stanford-ner/stanford-ner.jar

Where can I download this file?

As stated above, the result I expect, in the form of a list or dictionary, is:

Larry Page
Sergey Brin

Recommended answer

In Long

Read carefully:

Understand the solution; don't just copy and paste.

In the terminal:

pip install -U nltk

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,ner,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000

In Python:

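# NLTK 3.2.5 exposed this tagger as nltk.tag.stanford.CoreNLPNERTagger;
# later releases provide it through CoreNLPParser with tagtype='ner'.
from nltk.parse.corenlp import CoreNLPParser

def get_continuous_chunks(tagged_sent):
    """Group consecutive tokens whose tag is not "O" into chunks."""
    continuous_chunk = []
    current_chunk = []

    for token, tag in tagged_sent:
        if tag != "O":
            current_chunk.append((token, tag))
        else:
            if current_chunk: # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
                current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk


# Connect to the CoreNLP server started above on port 9000.
stner = CoreNLPParser(url='http://localhost:9000', tagtype='ner')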
tagged_sent = stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())

named_entities = get_continuous_chunks(tagged_sent)
named_entities_str_tag = [(" ".join([token for token, tag in ne]), ne[0][1]) for ne in named_entities]


print(named_entities_str_tag)

[out]:

[('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
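
To get only the person names the question asks for, the tagged tokens can be filtered by tag. The following is a minimal sketch, not part of the original answer: `extract_entities` is a hypothetical helper, and `tagged_sent` is hard-coded in the shape the tagger returns for the example sentence so that the snippet runs without a CoreNLP server.

```python
from itertools import groupby

# Assumption: token/tag pairs as the tagger would return them for the
# example sentence, hard-coded here so no server is needed.
tagged_sent = [
    ("Rami", "PERSON"), ("Eid", "PERSON"), ("is", "O"), ("studying", "O"),
    ("at", "O"), ("Stony", "ORGANIZATION"), ("Brook", "ORGANIZATION"),
    ("University", "ORGANIZATION"), ("in", "O"), ("NY", "LOCATION"),
]


def extract_entities(tagged_sent, wanted_tag="PERSON"):
    """Join each run of consecutive tokens tagged `wanted_tag` into one string."""
    names = []
    # groupby yields one group per run of consecutive pairs sharing a tag.
    for tag, group in groupby(tagged_sent, key=lambda pair: pair[1]):
        if tag == wanted_tag:
            names.append(" ".join(token for token, _ in group))
    return names


print(extract_entities(tagged_sent))  # ['Rami Eid']
```

Grouping on the tag keeps neighbouring entities of different types apart, but two different people mentioned back to back with no token between them would still be joined into a single name.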

You might find this helpful too: Unpacking a list / tuple of pairs into two lists / tuples
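
As a rough illustration of that unpacking idea, the pairs printed above can be split into two tuples with `zip(*...)`, or collected into a dictionary keyed by tag. The variable names `entities`, `tags` and `by_tag` are chosen here for illustration only.

```python
# The (entity, tag) pairs from the output shown above.
named_entities_str_tag = [
    ("Rami Eid", "PERSON"),
    ("Stony Brook University", "ORGANIZATION"),
    ("NY", "LOCATION"),
]

# zip(*pairs) transposes the list of pairs into two tuples.
entities, tags = zip(*named_entities_str_tag)
print(entities)  # ('Rami Eid', 'Stony Brook University', 'NY')
print(tags)      # ('PERSON', 'ORGANIZATION', 'LOCATION')

# Alternatively, collect the entities into a dictionary keyed by tag.
by_tag = {}
for entity, tag in named_entities_str_tag:
    by_tag.setdefault(tag, []).append(entity)
print(by_tag["PERSON"])  # ['Rami Eid']
```

Note that the `zip(*...)` form raises a `ValueError` when the list is empty, because there is nothing to unpack into the two names; the dictionary form handles an empty list without special casing.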
