Load Custom NER Model Stanford CoreNLP
Problem Description
I have created my own NER model with Stanford's "Stanford-NER" software and by following these directions.
I am aware that CoreNLP loads three NER models out of the box in the following order:
edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz
edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz
I now want to include my NER model in the list above and have the text tagged by my NER model first.
I have found two previous StackOverflow questions on this topic: 'Stanford OpenIE using customized NER model' and 'Why does Stanford CoreNLP NER-annotator load 3 models by default?'
Both of these posts have good answers. The general message of the answers is that you have to edit code within a file.
Stanford OpenIE using customized NER model
This post says to edit corenlpserver.sh, but I cannot find this file within the downloaded Stanford CoreNLP software. Can anyone point me to this file's location?
Why does Stanford CoreNLP NER-annotator load 3 models by default?
This post says that I can use the -ner.model argument to specify which NER models to load. I added this argument to the initial server command (java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -ner.model *modelfilepathhere*). This did not work: the server still loaded all three models.
It also states that you have to change some Java code, though it does not say where to make the change.
Do I need to add or modify the line props.put("ner.model", "model_path1,model_path2"); in a specific class file in the CoreNLP software?
QUESTION: From my research it seems that I need to add or modify some code to call my custom NER model. These 'edits' are outlined above; the information comes from other StackOverflow questions. Which files specifically do I need to edit? Where exactly are these files located (e.g. edu/stanford/nlp/... etc.)?
My system runs on a local server, and I'm using the pycorenlp API to open a pipeline to that server and make requests against it. The two critical lines of Python/pycorenlp code (plus the import they need) are:
from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
output = nlp.annotate(evalList[line], properties={'annotators': 'ner, openie','outputFormat': 'json', 'openie.triple.strict':'True','openie.max_entailments_per_clause':'1'})  # evalList[line] is one input text from my data
I do NOT think this affects my ability to call my custom NER model, but I wanted to provide as much context as I can to get the best possible answer.
Recommended Answer
If you want to customize the pipeline the server uses, create a file called server.properties (or name it whatever you want).
Then add the option -serverProperties server.properties to the java command when you start the server.
In that .properties file you should include ner.model = /path/to/custom_model.ser.gz
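A minimal sketch, in Python to match the pycorenlp setup in the question, of generating such a file. write_server_properties is a hypothetical helper, not part of CoreNLP. Because ner.model takes a comma-separated list (as in the props.put snippet from the question), the sketch writes the custom model first and, optionally, the three default models after it; my assumption is that the models are applied in the listed order, so the custom model gets first pass.

```python
from pathlib import Path

# The three default English models CoreNLP loads (see the list in the question).
DEFAULT_NER_MODELS = [
    "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz",
    "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz",
    "edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz",
]


def write_server_properties(path, custom_model, keep_defaults=True, extra=None):
    """Write a .properties file whose ner.model starts with custom_model.

    Hypothetical helper, not part of CoreNLP. ner.model is written as one
    comma-separated value; extra holds any other settings (e.g. annotators).
    Returns the lines written, without trailing newlines.
    """
    models = [custom_model] + (DEFAULT_NER_MODELS if keep_defaults else [])
    lines = ["ner.model = " + ",".join(models)]
    for key, value in (extra or {}).items():
        lines.append(f"{key} = {value}")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

For example, write_server_properties("server.properties", "/path/to/custom_model.ser.gz", extra={"annotators": "tokenize,ssplit,pos,lemma,ner"}) writes a two-line file; pass keep_defaults=False to load only the custom model.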
In general, you can customize the pipeline the server will use in that .properties file. For instance, you can also set the list of annotators with a line such as annotators = tokenize,ssplit,pos,lemma,ner,parse, and so on.
Update to address comments:
In your java command you don't need -ner.model /path/to/custom_model.ser.gz.
A .properties file can contain any number of property settings, one setting per line (blank lines are ignored, as are lines commented out with #).
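To make those format rules concrete, here is a simplified Python reader that follows them; it can help check that a setting such as ner.model is not accidentally commented out. parse_simple_properties is a hypothetical helper and deliberately handles only the rules stated above, not everything java.util.Properties accepts.

```python
def parse_simple_properties(text):
    """Read key = value settings using the rules described above.

    Simplified sketch (hypothetical helper): one setting per line, blank lines
    and lines starting with '#' are skipped, and each line is split on its
    first '='. java.util.Properties is more lenient (':' separators, '!'
    comments, backslash continuations), which this sketch does not handle.
    """
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank and #'d-out lines carry no setting
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on the line: skipped in this simplified sketch
        props[key.strip()] = value.strip()  # a later duplicate key wins
    return props


if __name__ == "__main__":
    sample = """
# server.properties
annotators = tokenize,ssplit,pos,lemma,ner,depparse
#ner.model = /path/to/old_model.ser.gz
ner.model = /path/to/custom_model.ser.gz

parse.maxlen = 100
"""
    for key, value in parse_simple_properties(sample).items():
        print(key, "->", value)
```

Run on the sample, it reports three settings; the commented-out ner.model line and the blank line are ignored.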
When you run a Java command, it looks for files relative to the directory you run the command from by default. So if your command includes -serverProperties server.properties, it will assume the file server.properties is in the directory the command is run from. If you instead supply an absolute path, -serverProperties /path/to/server.properties, you can run the command from anywhere.
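If the server is launched from Python, one way to apply the absolute-path advice is to resolve the properties path before building the command. This is a sketch; build_server_command is a hypothetical helper, and the flags are exactly those in the server command in this answer. When the list is run without a shell, java itself expands the "*" classpath wildcard to the jars in its working directory, which is why the launch line points cwd at the jar folder.

```python
import os
import shlex


def build_server_command(props_file, port=9000, timeout=15000, heap="8g"):
    """Build the server command from the answer with an absolute -serverProperties.

    Hypothetical helper. os.path.abspath resolves a relative props_file against
    the current working directory, so the resulting command no longer depends
    on where java is later started.
    """
    return [
        "java", f"-Xmx{heap}", "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout),
        "-serverProperties", os.path.abspath(props_file),
    ]


if __name__ == "__main__":
    cmd = build_server_command("server.properties")
    print(shlex.join(cmd))  # shell-safe form; "*" is quoted so the shell won't glob it
    # To launch it, run java from the folder holding the CoreNLP jars, e.g.
    # subprocess.Popen(cmd, cwd="/path/to/stanford-corenlp")  (placeholder path)
```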
So, to be clear, you could start the server with this command (run it in the folder containing all the jars):
java -Xmx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties server.properties
and server.properties should be a file like this:
ner.model = /path/to/custom_model.ser.gz
server.properties could also look like this:
annotators = tokenize,ssplit,pos,lemma,ner,depparse
ner.model = /path/to/custom_model.ser.gz
parse.maxlen = 100
This is just an example; you should put all your settings into server.properties.
In a previous answer I made some comments about accessing the Stanford CoreNLP server from Python:
Cannot use pycorenlp for python3.5 through terminal
You appear to be using the pycorenlp library, which I don't really know much about. Two other options are some code I show in that answer, or the stanza package we make. Details are in the answer above.
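For reference, the server can also be called directly over HTTP with only the Python standard library. This sketch assumes the server was started as above on localhost:9000 and follows its HTTP interface as I understand it: the text goes in the POST body and per-request settings go in a URL-encoded JSON properties query parameter. build_annotate_request and annotate are hypothetical names, not part of CoreNLP or pycorenlp; ner.model is deliberately left out of the request because it is picked up from server.properties.

```python
import json
import urllib.parse
import urllib.request


def build_annotate_request(text, properties, url="http://localhost:9000"):
    """Build (without sending) a POST request for a running CoreNLP server.

    Text goes in the request body; per-request settings go in a URL-encoded
    JSON "properties" query parameter. Hypothetical helper name.
    """
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        f"{url}/?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def annotate(text, properties, url="http://localhost:9000", timeout=60):
    """Send the request and parse the JSON reply (requires outputFormat json)."""
    request = build_annotate_request(text, properties, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    props = {
        "annotators": "ner,openie",  # ner.model itself comes from server.properties
        "outputFormat": "json",
        "openie.triple.strict": "true",
        "openie.max_entailments_per_clause": "1",
    }
    req = build_annotate_request("Barack Obama was born in Hawaii.", props)
    print(req.get_method(), req.full_url)
    # With the server running: result = annotate("Barack Obama was born in Hawaii.", props)
```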