Annotate author names using REGEXNER from the stanfordnlp library

Views: 147 Published: 2020/8/6 3:09:54 Tags: python regex stanford-nlp ner

Problem description

My goal is to annotate author names in scientific articles with the PERSON entity. I am particularly interested in names that appear in the format (authorname et al. date). For example, in the citation (Minot et al. 2000) I would like Minot to be annotated as PERSON. I am using an adapted version of the code from the official Stanford NLP page:

from stanfordnlp.server import CoreNLPClient

# Example text containing several "Author et al., year" citations
text = "In practice, its scope is broad and includes the analysis of a diverse set of samples such as gut microbiome (Qin et al., 2010), (Minot et al., 2011), environmental (Mizuno et al., 2013) or clinical (Willner et al., 2009), (Negredo et al., 2011), (McMullan et al., 2012) samples."
print('---')
print('input text')
print('')
print(text)

# Properties dictionary: run the standard pipeline, then RegexNER with
# the custom rules from rgxrules.txt
prop = {'regexner.mapping': 'rgxrules.txt', 'annotators': 'tokenize,ssplit,pos,lemma,ner,regexner'}

# Set up the client (this starts the Java server)
print('---')
print('starting up Java Stanford CoreNLP Server...')
with CoreNLPClient(properties=prop, timeout=100000, memory='16G', be_quiet=False) as client:
    # Submit the request to the server
    ann = client.annotate(text)
    # Get the first sentence
    sentence = ann.sentence[0]

After running the code I get one false negative and one false positive: Negredo is labeled O rather than PERSON, and Minot is labeled CITY because it is also the name of an American city, although in this sentence it is an author name.
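
One way to see these labels is to walk over the tokens of the returned annotation. The sketch below assumes the annotation object exposes `sentence`, `token`, `word` and `ner` fields, as the CoreNLP protobuf document returned by `client.annotate` does. The helper name `collect_ner_labels` is hypothetical and not part of stanfordnlp.

```python
def collect_ner_labels(ann):
    """Return (word, ner) pairs for every token in an annotated document.

    Hypothetical helper: assumes `ann.sentence` is a sequence of sentences,
    each with a `token` sequence whose items carry `word` and `ner`.
    """
    return [(token.word, token.ner)
            for sentence in ann.sentence
            for token in sentence.token]
```

Called inside the `with CoreNLPClient(...)` block, `for word, label in collect_ner_labels(ann): print(word, label)` lists each token next to its label, which is where Negredo appears as O and Minot as CITY.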

My attempt to solve this was to add the following line to the rgxrules.txt file that I pass to CoreNLPClient:

[[A-Z][a-z]] /et/ /al\./\tPERSON

This does not solve the problem, as you can check by running the code. I also don't know how to express that only the word matching '[[A-Z][a-z]]' that comes before 'et al.' should be annotated as PERSON, rather than the whole phrase 'Minot et al.'.
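
To make the intent concrete outside CoreNLP, here is a minimal sketch using Python's `re` module on the raw string. It is not a RegexNER rule and does not change the annotations; it only shows the match I am after. Since `[A-Z][a-z]` on its own covers just two characters, the sketch uses `[A-Z][A-Za-z]+` for a whole capitalized word, with a lookahead so that "et al." is required but stays out of the match. The names `AUTHOR_BEFORE_ET_AL` and `find_cited_authors` are made up for illustration.

```python
import re

# A capitalized word, matched only when it is followed by "et al.".
# The lookahead (?=...) checks for "et al." without consuming it.
AUTHOR_BEFORE_ET_AL = re.compile(r"\b[A-Z][A-Za-z]+(?=\s+et\s+al\.)")

def find_cited_authors(text):
    """Return the capitalized words that directly precede "et al." in text."""
    return AUTHOR_BEFORE_ET_AL.findall(text)

sample = ("gut microbiome (Qin et al., 2010), (Minot et al., 2011) "
          "or clinical (McMullan et al., 2012) samples.")
print(find_cited_authors(sample))
```

This is the behaviour I would like from the annotator: only the surname gets picked out, and a bare "Minot" with no "et al." after it is left alone.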

Any ideas on how I can solve this problem?

Thanks.
