Stanford.NLP for .NET not loading models


Question

I am trying to run the sample code provided here for Stanford.NLP for .NET.

I installed the package via NuGet, downloaded the CoreNLP zip archive, and extracted stanford-corenlp-3.7.0-models.jar. After extraction, the "models" directory is at stanford-corenlp-full-2016-10-31\edu\stanford\nlp\models.

Here is the code I am trying to run:

 public static void Test1()
    {
        // Path to the folder with models extracted from `stanford-corenlp-3.6.0-models.jar`
        var jarRoot = @"..\..\..\stanford-corenlp-full-2016-10-31\edu\stanford\nlp\models\";

        // Text for processing
        var text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply.";

        // Annotation pipeline configuration
        var props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse, ner,dcoref");
        props.setProperty("ner.useSUTime", "0");

        // We should change current directory, so StanfordCoreNLP could find all the model files automatically
        var curDir = Environment.CurrentDirectory;
        Directory.SetCurrentDirectory(jarRoot);
        var pipeline = new StanfordCoreNLP(props);
        Directory.SetCurrentDirectory(curDir);

        // Annotation
        var annotation = new Annotation(text);
        pipeline.annotate(annotation);

        // Result - Pretty Print
        using (var stream = new ByteArrayOutputStream())
        {
            pipeline.prettyPrint(annotation, new PrintWriter(stream));
            Console.WriteLine(stream.toString());
            stream.close();
        }
    }

I get the following error when I run the code:

A first chance exception of type 'java.lang.RuntimeException' occurred in stanford-corenlp-3.6.0.dll
An unhandled exception of type 'java.lang.RuntimeException' occurred in stanford-corenlp-3.6.0.dll
Additional information: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)

What am I doing wrong? I really want to get this working. :(

Recommended answer

Mikael Kristensen's answer is correct. The stanford-corenlp-full-*.zip archive contains the file stanford-corenlp-3.7.0-models.jar, which holds the models (a jar is itself a zip archive). In the Java world, you add this jar to the class path, and the models' locations inside the archive are resolved automatically.

CoreNLP has a file, DefaultPaths.java, that specifies the default path to each model file. So when you instantiate StanfordCoreNLP with a Properties object that does not specify the models' locations, you must make sure the models can be found at those default paths, which are resolved relative to Environment.CurrentDirectory.

The simplest way to guarantee that a file exists at a path like Environment.CurrentDirectory + "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz" is to unzip the jar archive into a folder and temporarily change the current directory to that folder.

var jarRoot = "nlp.stanford.edu/stanford-corenlp-full-2016-10-31/jar-modules/";
...
var curDir = Environment.CurrentDirectory;
Directory.SetCurrentDirectory(jarRoot);
var pipeline = new StanfordCoreNLP(props);
Directory.SetCurrentDirectory(curDir);
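
Two things can go wrong with the directory switch above. First, jarRoot may point at the wrong level: it has to be the folder that contains "edu", whereas the jarRoot in the question appears to point at edu\stanford\nlp\models itself. Second, if the pipeline constructor throws, the process is left in the wrong directory. Below is a minimal sketch of a guard for both. ModelDirectory, HasModel, and RunIn are hypothetical helper names, not part of Stanford.NLP for .NET, and the tagger path is an assumption based on the CoreNLP 3.7.0 defaults, so check it against your extracted folder.

```csharp
using System;
using System.IO;

public static class ModelDirectory
{
    // Assumed default English POS tagger path in CoreNLP 3.7.0; adjust for your version.
    public const string TaggerRelativePath =
        "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger";

    // True when the file exists below jarRoot, i.e. jarRoot is the folder that contains "edu".
    public static bool HasModel(string jarRoot, string relativePath)
    {
        return File.Exists(Path.Combine(jarRoot, relativePath));
    }

    // Runs create() with jarRoot as the current directory and always restores
    // the previous directory, even if create() throws.
    public static T RunIn<T>(string jarRoot, Func<T> create)
    {
        var previous = Environment.CurrentDirectory;
        Directory.SetCurrentDirectory(jarRoot);
        try
        {
            return create();
        }
        finally
        {
            Directory.SetCurrentDirectory(previous);
        }
    }
}
```

With these helpers, you could first check ModelDirectory.HasModel(jarRoot, ModelDirectory.TaggerRelativePath) and fail with a clear message if it returns false, then build the pipeline with ModelDirectory.RunIn(jarRoot, () => new StanfordCoreNLP(props)).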

The other way is to specify the paths to all the models your pipeline needs (which ones depends on the list of annotators). This option is more complicated, because you have to find the correct property keys and specify a path for every model used. But it may be useful if you want to minimize the size of your deployment package.

var props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
props.put("ner.model",
          "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz");
props.put("ner.applyNumericClassifiers", "false");
var pipeline = new StanfordCoreNLP(props);
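
A relative model path such as the one above still has to resolve from wherever the process is running. One way to avoid that dependency is to pass absolute paths. The sketch below builds them from the extracted jar folder and reports any that are missing. ModelPaths, Resolve, and Missing are hypothetical names, and the property keys and relative paths are assumptions based on the CoreNLP 3.7.0 defaults, so verify them against DefaultPaths.java for your version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ModelPaths
{
    // Property key -> model path relative to the extracted models jar.
    // Assumed from CoreNLP 3.7.0 defaults; verify against DefaultPaths.java.
    private static readonly Dictionary<string, string> Relative = new Dictionary<string, string>
    {
        { "pos.model", "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" },
        { "ner.model", "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz" },
        { "depparse.model", "edu/stanford/nlp/models/parser/nndep/english_UD.gz" }
    };

    // Turns each relative model path into an absolute path below jarRoot.
    public static Dictionary<string, string> Resolve(string jarRoot)
    {
        var root = Path.GetFullPath(jarRoot);
        var resolved = new Dictionary<string, string>();
        foreach (var entry in Relative)
        {
            resolved[entry.Key] = Path.Combine(root, entry.Value);
        }
        return resolved;
    }

    // Returns the property keys whose model file does not exist on disk.
    public static List<string> Missing(Dictionary<string, string> resolved)
    {
        var missing = new List<string>();
        foreach (var entry in resolved)
        {
            if (!File.Exists(entry.Value))
            {
                missing.Add(entry.Key);
            }
        }
        return missing;
    }
}
```

Each resolved entry could then be applied with props.setProperty(entry.Key, entry.Value) before constructing the pipeline, and a non-empty result from Missing tells you which models to add to the deployment package.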
