Number of parameters must be always be even: opennlp

Views: 180 Published: 2018/12/20 0:10:25 Tags: java command-line-interface opennlp training-data


Problem description

I've been trying to use the command-line interface to train my model like this:

opennlp TokenNameFinderTrainer -model en-ner-pincode.bin -iterations 500 \ -lang en -data en-ner-pincode.train -encoding UTF-8

The console output is:

Number of parameters must be always be even
Usage: opennlp TokenNameFinderTrainer[.evalita|.ad|.conll03|.bionlp2004|.conll02|.muc6|.ontonotes|.brat] [-factory factoryName] [-resources resourcesDir] [-type modelType] [-featuregen featuregenFile] [-nameTypes types] [-sequenceCodec codec] [-params paramsFile] -lang language -model modelFile -data sampleData [-encoding charsetName]

It works fine if I don't include the number of iterations. Does anybody know the reason behind this?

Thanks!

Recommended answer

UPDATE:

So, use ChunkerTrainerME instead of TokenNameFinderTrainer.

Your command should look like this:

opennlp ChunkerTrainerME -model en-ner-pincode.bin -iterations 500 \ -lang en -data en-ner-pincode.train -encoding UTF-8
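
The error message suggests that the trainer expects its arguments as complete "-flag value" pairs. As a rough sanity check before running a command like the one above, the arguments can be counted first. This is only a sketch: `check_pairs` is a hypothetical helper, not part of OpenNLP, and the second call assumes that a stray mid-line backslash reaches the program as its own argument, which depends on the shell in use.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNLP): reports whether the arguments
# that would follow the tool name form complete "-flag value" pairs.
check_pairs() {
    if [ $(( $# % 2 )) -ne 0 ]; then
        echo "odd: $# arguments"
    else
        echo "even: $# arguments"
    fi
}

# Ten arguments: every flag has a value.
check_pairs -model en-ner-pincode.bin -iterations 500 \
    -lang en -data en-ner-pincode.train -encoding UTF-8

# Eleven arguments: the quoted backslash stands in for a "\" that the
# shell passed through literally instead of treating it as an escape.
check_pairs -model en-ner-pincode.bin -iterations 500 '\' \
    -lang en -data en-ner-pincode.train -encoding UTF-8
```

An even count does not prove that every flag is one the tool accepts; it only rules out a missing value or a stray token.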

UPDATE2: Converting the data

I will use the Spanish data as a reference, but the operations are the same for Dutch. Just remember to change "-lang es" to "-lang nl" and use the correct training files. To convert the data to the OpenNLP format:

$ opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types per > es_corpus_train_persons.txt

Optionally, you can convert the test samples as well.

$ opennlp TokenNameFinderConverter conll02 -data esp.testa -lang es -types per > corpus_testa.txt
$ opennlp TokenNameFinderConverter conll02 -data esp.testb -lang es -types per > corpus_testb.txt
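
Since the same conversion is repeated per file and per language, the commands can be generated in a loop. The sketch below only prints the commands rather than running them, so it can be checked before use. `convert_cmd` is a hypothetical helper, and the Dutch file name `ned.train` and output name `nl_corpus_train_persons.txt` are assumptions about how the Dutch data is named locally.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the converter command for one
# CoNLL-2002 file. The flags are the ones used in the commands above.
convert_cmd() {
    lang=$1    # language code, e.g. es or nl
    input=$2   # CoNLL-2002 input file, e.g. esp.train
    output=$3  # file that receives the OpenNLP-format text
    echo "opennlp TokenNameFinderConverter conll02 -data $input -lang $lang -types per > $output"
}

# Spanish test splits, matching the two commands above.
for split in testa testb; do
    convert_cmd es "esp.$split" "corpus_$split.txt"
done

# Dutch training data (file names are assumptions).
convert_cmd nl ned.train nl_corpus_train_persons.txt
```

Once the printed lines look right, they can be pasted into the shell or piped to `sh`.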

Training with Spanish data

To train the model for the name finder:

\bin\opennlp TokenNameFinderTrainer -lang es -encoding utf8 -iterations 500 -data es_corpus_train_persons.txt -model es_ner_person.bin

UPDATE3: Converting the data (optional)

To convert the data to the OpenNLP format:

$ opennlp TokenNameFinderConverter conll03 -lang en -types per -data eng.train > corpus_train.txt

Optionally, you can convert the test samples as well.

$ opennlp TokenNameFinderConverter conll03 -lang en -types per -data eng.testa > corpus_testa.txt
$ opennlp TokenNameFinderConverter conll03 -lang en -types per -data eng.testb > corpus_testb.txt
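
After converting, it can help to check that the output looks like name finder training data: one sentence per line, with each entity wrapped in `<START:type> ... <END>` tags. The sketch below counts sentences and entity tags. `count_entities` is a hypothetical helper, the sample sentences are made up for illustration, and the `person` type label is an assumption about what the converter emits for "-types per".

```shell
#!/bin/sh
# Hypothetical helper: count lines (sentences) and <START:...> tags
# in a name finder training file.
count_entities() {
    file=$1
    sentences=$(grep -c '' "$file")
    entities=$(grep -o '<START:[^>]*>' "$file" | wc -l | tr -d ' ')
    echo "$sentences sentences, $entities entities"
}

# Made-up sample in the expected layout, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<START:person> Pierre Vinken <END> , 61 years old , will join the board .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. .
The board met on Monday .
EOF

count_entities "$sample"
rm -f "$sample"
```

Running `count_entities corpus_train.txt` on a converted file that reports zero entities would suggest the conversion did not produce the tags the trainer needs.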

Training with English data

You can train the model for the name finder this way:

$ opennlp TokenNameFinderTrainer.conll03 -model en_ner_person.bin -iterations 500 \
                                 -lang en -types per -data eng.train -encoding utf8

If you have already converted the data, you can train the model for the name finder this way:

$ opennlp TokenNameFinderTrainer -model en_ner_person.bin -iterations 500 \
                                 -lang en -data corpus_train.txt -encoding utf8
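
Note that the usage text printed in the question lists "-params paramsFile" but no "-iterations" option, so on that OpenNLP version the iteration count may have to come from a training parameters file instead. The sketch below is one possible way to do that, not a confirmed fix for the question. It assumes the key names `Algorithm`, `Iterations`, and `Cutoff` from the OpenNLP manual; the file name `ner-params.txt` and the values are illustrative.

```shell
#!/bin/sh
# Write a training parameters file (the file name is an arbitrary choice).
params=ner-params.txt
cat > "$params" <<'EOF'
Algorithm=MAXENT
Iterations=500
Cutoff=5
EOF

# Run the trainer only if the opennlp launcher and the converted
# training data are both available; otherwise just report what was done.
if command -v opennlp >/dev/null 2>&1 && [ -f corpus_train.txt ]; then
    opennlp TokenNameFinderTrainer -params "$params" -model en_ner_person.bin \
        -lang en -data corpus_train.txt -encoding utf8
else
    echo "opennlp or corpus_train.txt not found; wrote $params only"
fi
```

Every flag in this command still comes with a value, so the argument count stays even.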
