Running Stanford CoreNLP server with French models
Question
I am trying to analyse some French text with the Stanford CoreNLP tool (this is my first time using any Stanford NLP software).
To do so, I downloaded the v3.6.0 jar and the corresponding French models.
Then I run the server with:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
As described in this answer, I call the API with:
wget --post-data 'Bonjour le monde.' 'localhost:9000/?properties={"parse.model":"edu/stanford/nlp/models/parser/nndep/UD_French.gz", "annotators": "parse", "outputFormat": "json"}' -O -
but I get the following log and error:
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-1] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/parser/nndep/UD_French.gz ...
edu.stanford.nlp.io.RuntimeIOException: java.io.StreamCorruptedException: invalid stream header: 64696374
at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:188)
at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:212)
at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:115)
...
The solutions proposed here suggest that the code and model versions differ, but I downloaded them from the same page (and both have the same version number in their names), so I am pretty sure they match.
Any other hint on what I am doing wrong?
(I should also mention that I am not a Java expert, so maybe I forgot a silly step...)
Recommended answer
OK, after a lot of reading and unsuccessful attempts, I found a way to make it work (for v3.6.0). Here are the details, in case they are of interest to someone else:
- Download the code and the French models from http://stanfordnlp.github.io/CoreNLP/index.html#download. Unzip the code .zip and copy the French models .jar into that directory (do not remove the English models; they have different names anyway).
- cd to that directory and then run the server with:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
(It's a pity that the -prop flag doesn't help here.)
- Call the API, repeating the properties listed in StanfordCoreNLP-french.properties:
wget --header="Content-Type: text/plain; charset=UTF-8" \
  --post-data 'Bonjour le monde.' \
  'localhost:9000/?properties={"annotators": "tokenize,ssplit,pos,parse", "parse.model": "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz", "pos.model": "edu/stanford/nlp/models/pos-tagger/french/french.tagger", "tokenize.language": "fr", "outputFormat": "json"}' \
  -O -
which finally gives a 200 response using the French models!
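For anyone who would rather script the call than use wget, here is a minimal Python sketch of the same request. It is only an illustration under assumptions: the names FRENCH_PROPS, build_request and annotate are made up for this example, the server is assumed to be listening on localhost:9000 as above, and only the standard library is used. The properties are the ones from the wget call; the JSON is URL-encoded explicitly instead of relying on the client to escape it.

```python
import json
import urllib.parse
import urllib.request

# Properties copied from the wget call above.
FRENCH_PROPS = {
    "annotators": "tokenize,ssplit,pos,parse",
    "parse.model": "edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz",
    "pos.model": "edu/stanford/nlp/models/pos-tagger/french/french.tagger",
    "tokenize.language": "fr",
    "outputFormat": "json",
}


def build_request(text, props=FRENCH_PROPS, base_url="http://localhost:9000/"):
    """Build (but do not send) the POST request for the CoreNLP server."""
    # The properties travel as a JSON string in the "properties" query parameter.
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return urllib.request.Request(
        base_url + "?" + query,
        data=text.encode("utf-8"),  # request body is the raw text, UTF-8 encoded
        headers={"Content-Type": "text/plain; charset=UTF-8"},
        method="POST",
    )


def annotate(text):
    """Send the request; the server from the previous step must be running."""
    with urllib.request.urlopen(build_request(text)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, annotate("Bonjour le monde.") should return the parsed JSON response as a Python dict.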
(NB: I don't know how to make it work with the UI, and the same goes for UTF-8 support.)
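A possible reading of the original error, offered as an interpretation rather than something confirmed in this thread: the "invalid stream header: 64696374" value can be decoded to see what the loader actually found at the start of the model file. The short Python check below compares it with the header that opens a Java object-serialization stream (STREAM_MAGIC 0xACED followed by STREAM_VERSION 0x0005). The variable names are made up for this example.

```python
# Header bytes reported by the StreamCorruptedException in the question's log.
reported_header = bytes.fromhex("64696374")

# Every Java object-serialization stream starts with these four bytes
# (java.io.ObjectStreamConstants: STREAM_MAGIC 0xACED, STREAM_VERSION 0x0005).
JAVA_SERIALIZATION_HEADER = bytes.fromhex("aced0005")

header_text = reported_header.decode("ascii")
looks_serialized = reported_header == JAVA_SERIALIZATION_HEADER

print(header_text)       # dict
print(looks_serialized)  # False
```

The file therefore seems to start with the plain text "dict" rather than a serialized Java object, which suggests the nndep/UD_French.gz model is not in the format the parse annotator tries to read. As far as I know, the nndep models belong to the neural dependency parser, which is normally selected with the depparse annotator and the depparse.model property; treat that as an assumption to verify against the CoreNLP documentation. It would at least be consistent with the lexparser model working in the call above.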