How to use a serialized CRFClassifier with StanfordCoreNLP prop 'ner'

Question

I'm using the StanfordCoreNLP API interface to programmatically do some basic NLP. I need to train a model on my own corpus, but I'd like to use the StanfordCoreNLP interface to do it, because it handles a lot of the dry mechanics behind the scenes and I don't need much specialization there.

I've trained a CRFClassifier that I'd like to use for NER, serialized to a file. Based on the documentation, I'd think the following would work, but it doesn't seem to find my model and instead barfs on not being able to find the standard models (I'm not sure why I don't have those model files, but I'm not concerned about it since I don't want to use them anyway):

    // String constants
    final String serializedClassifierFilename = "/absolute/path/to/model.ser.gz";

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, ner");
    props.setProperty("ner.models", serializedClassifierFilename);

    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    String fileContents = IOUtils.slurpFileNoExceptions("test.txt");
    Annotation document = new Annotation(fileContents);

Result:

Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator ner
Loading classifier from /path/build/edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... java.io.FileNotFoundException: edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1554)

etc., etc.

I know that I don't have their built-in model (again, not sure why; I just cloned their git repo and compiled with ant compile). Regardless, I don't want to use their model anyway; I want to use the one I trained.

How can I get the StanfordCoreNLP interface to use my model in the ner step? Is that possible at all?

Answer

The property name is ner.model, not ner.models, so your code is still trying to load the default models.
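For reference, here is a minimal sketch of the corrected configuration in plain Java. Only the key change (ner.model instead of ner.models) comes from the answer. The class and method names are made up for illustration. The optional switches for a setup without the CoreNLP models jar (ner.useSUTime, ner.applyNumericClassifiers, ner.applyFineGrained) are an assumption based on the documented NER properties, so check them against your CoreNLP version. The CoreNLP calls are left as comments so the snippet compiles with the JDK alone.

```java
import java.util.Properties;

public class CustomNerPipelineProps {

    /**
     * Builds pipeline properties that point the "ner" annotator at a custom
     * serialized CRFClassifier. Note the key is "ner.model" (singular).
     */
    static Properties customNerProperties(String modelPath, boolean noModelsJar) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, ner");
        props.setProperty("ner.model", modelPath);
        if (noModelsJar) {
            // Assumption: these switches turn off NER sub-components that depend on
            // extra resources or annotators (SUTime rules, numeric classifiers and,
            // in newer releases, fine-grained regexner tagging).
            props.setProperty("ner.useSUTime", "false");
            props.setProperty("ner.applyNumericClassifiers", "false");
            props.setProperty("ner.applyFineGrained", "false");
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = customNerProperties("/absolute/path/to/model.ser.gz", true);
        // With CoreNLP on the classpath, the pipeline would then be built as:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        //   Annotation document = new Annotation(fileContents);
        //   pipeline.annotate(document);
        props.list(System.out);
    }
}
```

ner.model also accepts a comma-separated list of serialized classifiers if you want to combine several models.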

Let me know if this is documented incorrectly somewhere.
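On the side question of why the default model is missing: the path in the log (edu/stanford/nlp/models/ner/...) is a resource that CoreNLP normally resolves from the separate models jar on the classpath. The trained models are not part of the CoreNLP git repository, so a source checkout built with ant compile will not contain them. The hypothetical helper below (plain JDK; the class and method names are invented for illustration) is one quick way to check whether that resource is visible on your classpath.

```java
import java.net.URL;

public class ModelResourceCheck {

    /** Returns true if the given resource path can be found on the current classpath. */
    static boolean isOnClasspath(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this class.
            loader = ModelResourceCheck.class.getClassLoader();
        }
        URL url = loader.getResource(resourcePath);
        return url != null;
    }

    public static void main(String[] args) {
        // Default English NER model path, as it appears in the error log in the question.
        String defaultModel =
            "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz";
        System.out.println(defaultModel + " on classpath: " + isOnClasspath(defaultModel));
    }
}
```

If this prints false and you only want your own classifier, setting ner.model as described in the answer is enough to stop CoreNLP from looking for the default NER model.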
