Writing our own models in OpenNLP


Question

If I run a command like this on the command line:

./opennlp TokenNameFinder en-ner-person.bin "input.txt" "output.txt"

I get person names printed in output.txt, but I want to write my own model so that it prints my own entities.

For example:

  1. What is the risk value on icm2500.
  2. Delivery of prd_234 will arrive late.
  3. Watson is handling router_34.

If I pass in these lines, it should parse them and extract the product entities. icm2500, prd_234, router_34 and so on are all products (we can save this information in a file and use it as a kind of lookup for the model or for OpenNLP).

Can anyone please tell me how to do this?

Recommended answer

You'll need to train your own model by annotating some sentences in the OpenNLP format. For the example sentences you posted, the format would look like this:

what is the risk value on <START:product> icm2500 <END> .
Delivery of <START:product> prd_234 <END> will be arrived late .
Watson is handling <START:product> router_34 <END> .
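
Since the question mentions keeping the product names in a file, annotated lines like the ones above could be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the class name ProductAnnotator and its annotate method are hypothetical helpers (not part of OpenNLP), product names are single whitespace-delimited tokens, and only a trailing '.', ',', '?' or '!' is split off a token.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: turns a raw sentence into one
// line of name finder training data using a known set of product names.
public class ProductAnnotator {

  // Wraps every token found in `products` in <START:product> ... <END> and
  // splits trailing punctuation off so all tokens are whitespace-delimited.
  public static String annotate(String sentence, Set<String> products) {
    StringBuilder out = new StringBuilder();
    for (String raw : sentence.trim().split("\\s+")) {
      String token = raw;
      String trailing = "";
      // Assumption: only a single trailing '.', ',', '?' or '!' is separated.
      if (raw.length() > 1 && ".,?!".indexOf(raw.charAt(raw.length() - 1)) >= 0) {
        token = raw.substring(0, raw.length() - 1);
        trailing = raw.substring(raw.length() - 1);
      }
      if (out.length() > 0) {
        out.append(' ');
      }
      if (products.contains(token)) {
        out.append("<START:product> ").append(token).append(" <END>");
      } else {
        out.append(token);
      }
      if (!trailing.isEmpty()) {
        out.append(' ').append(trailing);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Set<String> products =
        new HashSet<String>(Arrays.asList("icm2500", "prd_234", "router_34"));
    System.out.println(annotate("Watson is handling router_34.", products));
  }
}
```

A model trained only on sentences produced this way will mostly learn the names already in the list, so it is worth including varied sentence contexts if you want it to generalise to unseen product names.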

Make sure each sentence is on its own line; if a sentence itself contains newlines, escape or remove them first. Every token, including the <START:product> and <END> tags and the final period, has to be separated by whitespace. Once you have a file like this built from your data, you can use the Java API to train the model like this:

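import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;
import opennlp.tools.util.featuregen.AdaptiveFeatureGenerator;

public class TrainProductModel {

  // Written against the OpenNLP 1.5.x API; later releases changed these signatures.
  public static void main(String[] args) throws IOException {
    // Read the annotated training file, one sentence per line.
    Charset charset = Charset.forName("UTF-8");
    ObjectStream<String> lineStream =
        new PlainTextByLineStream(new FileInputStream("your file in the above format"), charset);
    ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

    TokenNameFinderModel model;
    try {
      // The cast selects the AdaptiveFeatureGenerator overload; null means default features.
      model = NameFinderME.train("en", "product", sampleStream, TrainingParameters.defaultParams(),
          (AdaptiveFeatureGenerator) null, Collections.<String, Object>emptyMap());
    } finally {
      sampleStream.close();
    }

    // Write the trained model to disk; the file name is a placeholder.
    OutputStream modelOut = null;
    try {
      modelOut = new BufferedOutputStream(new FileOutputStream("en-ner-product.bin"));
      model.serialize(modelOut);
    } finally {
      if (modelOut != null) {
        modelOut.close();
      }
    }
  }
}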

Now you can use the model with the name finder.
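
As a rough illustration of that last step, the sketch below loads a trained model and runs it over one tokenized sentence. It assumes the OpenNLP tools jar is on the classpath; the class name FindProducts and the model file name en-ner-product.bin are placeholders for your own names, and the input is split with the whitespace tokenizer, so punctuation must already be separated by spaces.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.Span;

public class FindProducts {

  public static void main(String[] args) throws IOException {
    // Placeholder path: point this at the model file you trained.
    InputStream modelIn = new FileInputStream("en-ner-product.bin");
    TokenNameFinderModel model;
    try {
      model = new TokenNameFinderModel(modelIn);
    } finally {
      modelIn.close();
    }

    NameFinderME nameFinder = new NameFinderME(model);

    // The name finder works on tokens, not on raw strings.
    String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize("Watson is handling router_34 .");
    Span[] spans = nameFinder.find(tokens);

    // Convert the token spans back into the matched text.
    for (String name : Span.spansToStrings(spans, tokens)) {
      System.out.println(name);
    }

    // Reset document-level adaptive features before processing another document.
    nameFinder.clearAdaptiveData();
  }
}
```

Whether the model actually finds router_34 depends on how much annotated data it was trained on; three sentences are unlikely to be enough.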

Because you may have a definitive, and possibly short, list of product names, you might also consider a simple regex approach instead of a trained model.
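
One way the regex idea could look, as a sketch only: build a single pattern from the known product names and scan each line with it. The class and method names here (ProductRegexFinder, findProducts) are made up for illustration, matching is case-insensitive by assumption, and the names would normally be read from your lookup file rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds known product names in text with one regex.
public class ProductRegexFinder {

  // Returns every occurrence of a listed product name, in order of appearance.
  public static List<String> findProducts(List<String> productNames, String text) {
    List<String> found = new ArrayList<String>();
    if (productNames.isEmpty()) {
      return found;
    }
    // Quote each name so characters such as '.' are matched literally.
    StringBuilder alternation = new StringBuilder();
    for (String name : productNames) {
      if (alternation.length() > 0) {
        alternation.append('|');
      }
      alternation.append(Pattern.quote(name));
    }
    // \b keeps "prd_2" from matching inside "prd_234".
    Pattern pattern = Pattern.compile(
        "\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
      found.add(matcher.group());
    }
    return found;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("icm2500", "prd_234", "router_34");
    System.out.println(findProducts(names, "Watson is handling router_34."));
  }
}
```

This only finds names that are already in the list, which is the trade-off against a trained model: no annotation effort, but no generalisation to new product names.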

Here are the OpenNLP docs that cover training the name finder:

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training.tool
