Load Custom NER Model Stanford CoreNLP

Question

I have created my own NER model with Stanford's "Stanford-NER" software by following these directions.

I am aware that CoreNLP loads three NER models out of the box, in the following order:

  1. edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
  2. edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz
  3. edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz

I now want to include my NER model in the list above and have the text tagged by my NER model first.

I have found two previous StackOverflow questions on this topic: 'Stanford OpenIE using customized NER model' and 'Why does Stanford CoreNLP NER-annotator load 3 models by default?'

Both of these posts have good answers. The general message of the answers is that you have to edit code within a file.

Stanford OpenIE using customized NER model

This post says to edit corenlpserver.sh, but I cannot find this file in the Stanford CoreNLP download. Can anyone point me to this file's location?

Why does Stanford CoreNLP NER-annotator load 3 models by default?

This post says that I can use the -ner.model argument to specify which NER models to load. I added this argument to the initial server command (java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -ner.model *modelfilepathhere*). This did not work, as the server still loaded all three models.

It also states that you have to change some Java code, though it does not say where to make the change.

Do I need to modify or add this code, props.put("ner.model", "model_path1,model_path2");, in a specific class file in the CoreNLP software?

QUESTION: From my research it seems that I need to add or modify some code to call my own NER model. These 'edits' are outlined above and come from other StackOverflow questions. Which files specifically do I need to edit? Where exactly are these files located (i.e. edu/Stanford/nlp/...etc)?

My system is running on a local server and I'm using the pycorenlp API to open a pipeline to my local server and make requests against it. The two critical lines of python/pycorenlp code are:

  1. nlp = StanfordCoreNLP('http://localhost:9000')
  2. output = nlp.annotate(evalList[line], properties={'annotators': 'ner, openie','outputFormat': 'json', 'openie.triple.strict':'True','openie.max_entailments_per_clause':'1'})

I do NOT think this will affect my ability to call my own NER model, but I wanted to present all the situational data I can in order to obtain the best possible answer.

Recommended answer

If you want to customize the pipeline the server uses, create a file called server.properties (or call it whatever you want).

Then add the option -serverProperties server.properties to the java command when you start the server.

In that .properties file you should include ner.model = /path/to/custom_model.ser.gz

In general, you can customize the pipeline the server will use in that .properties file. For instance, you can also set the list of annotators in it with the line annotators = tokenize,ssplit,pos,lemma,ner,parse etc...

Update to address the comments:

  1. You do not need -ner.model /path/to/custom_model.ser.gz in your java command.

A .properties file can hold any number of property settings, one setting per line (blank lines are ignored, as are lines commented out with #).
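To make the one-setting-per-line rule concrete, here is a minimal Python sketch that reads the same kind of file. It is only an illustration: the function name parse_properties is hypothetical, and it deliberately ignores parts of the real Java .properties format (such as ":" separators, "!" comments, and backslash line continuations) that the server's own parser handles.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments.

    Simplified sketch: the real Java .properties format has more rules
    than this function implements.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and commented-out lines are ignored
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


example = """
# custom pipeline for the server
annotators = tokenize,ssplit,pos,lemma,ner

ner.model = /path/to/custom_model.ser.gz
"""

settings = parse_properties(example)
print(settings)
```

A quick check like this can help confirm that the file contains the keys you expect before starting the server with it.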

When you run a Java command, by default it looks for files relative to the directory you run the command from. So if your command includes -serverProperties server.properties, it will assume that server.properties is in the directory the command is run from. If you supply an absolute path instead, -serverProperties /path/to/server.properties, you can run the command from anywhere.
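The relative-versus-absolute behaviour can be sketched in a few lines of Python. The helper name and the /opt/corenlp directory are hypothetical, and PurePosixPath is used so the example behaves the same on any platform; it only illustrates how a path argument is interpreted, not anything CoreNLP-specific.

```python
from pathlib import PurePosixPath


def resolve_properties_path(path_arg, working_dir):
    """Return the location a path argument refers to when run from working_dir."""
    path = PurePosixPath(path_arg)
    # An absolute path is used as-is; a relative one is joined to the working directory.
    return path if path.is_absolute() else PurePosixPath(working_dir) / path


# /opt/corenlp stands in for the folder the java command is run from (hypothetical).
relative = resolve_properties_path("server.properties", "/opt/corenlp")
absolute = resolve_properties_path("/path/to/server.properties", "/opt/corenlp")
print(relative)
print(absolute)
```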

So just to be clear, you could start the server with this command (run in the folder with all the jars):

java -Xmx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties server.properties

and server.properties should be a file like this:

ner.model = /path/to/custom_model.ser.gz

Your server.properties could also look like this:

annotators = tokenize,ssplit,pos,lemma,ner,depparse
ner.model = /path/to/custom_model.ser.gz
parse.maxlen = 100

This is just an example. You should put all of your settings into server.properties.
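If you launch the server from a script, the two pieces above (the properties file and the java command) can be generated together. The following Python sketch assumes hypothetical helper names render_properties and build_server_command; the flags and class name are the ones shown in the command above, and nothing is actually started here.

```python
import shlex


def render_properties(settings):
    """Render a dict as `key = value` lines, one setting per line."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


def build_server_command(properties_path, port=9000, timeout_ms=15000, heap="8g"):
    """Build the argument list for the server command shown above."""
    return [
        "java", f"-Xmx{heap}", "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout_ms),
        "-serverProperties", properties_path,
    ]


settings = {
    "annotators": "tokenize,ssplit,pos,lemma,ner,depparse",
    "ner.model": "/path/to/custom_model.ser.gz",  # placeholder path from the answer
}

properties_text = render_properties(settings)
command = build_server_command("server.properties")
print(properties_text, end="")
print(shlex.join(command))
```

To actually start the server you would write properties_text to server.properties and pass command to something like subprocess.Popen with the working directory set to the folder containing the jars.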

  2. I made some comments about accessing the StanfordCoreNLP server from Python in a previous answer:

cannot use pycorenlp for python3.5 through terminal

You appear to be using the pycorenlp library, which I don't really know. Two other options are the code I show in that answer or the stanza package we make. Details are in the answer linked above.
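For readers who want to see what a request to the server involves without any client library, here is a standard-library-only Python sketch. It is not the code from the linked answer. It assumes the server's HTTP interface takes the text as the POST body and a JSON-encoded properties URL parameter, which you should verify against the CoreNLP server documentation; the function names are hypothetical, and the request leaves the model path to server.properties.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_annotate_request(text, properties, server_url="http://localhost:9000"):
    """Build (but do not send) a POST request for the CoreNLP server."""
    query = urlencode({"properties": json.dumps(properties)})
    return Request(f"{server_url}/?{query}", data=text.encode("utf-8"), method="POST")


def annotate(text, properties, server_url="http://localhost:9000"):
    """Send the request and decode the JSON reply; requires a running server."""
    with urlopen(build_annotate_request(text, properties, server_url)) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_annotate_request(
    "Stanford University is located in California.",
    {"annotators": "ner", "outputFormat": "json"},
)
print(request.get_method(), request.full_url)
```

Calling annotate(...) with the same arguments would perform the request once the server is running on port 9000.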
