Training Named Entity in OpenNLP
Question
I want to train a name finder model on a corpus of Indian names:
class NameTraining
{
    public static void TrainNames() throws IOException
    {
        Charset charset = Charset.forName("UTF-8");
        FileReader fileReader = new FileReader("train.txt");
        ObjectStream fileStream = new PlainTextByLineStream(fileReader);
        ObjectStream sampleStream = new NameSampleDataStream(fileStream);
        TokenNameFinderModel model = NameFinderME.train("pt-br", "train", sampleStream, Collections.<String, Object>emptyMap());
        NameFinderME nfm = new NameFinderME(model);
    }

    public static void main(String args[]) throws IOException
    {
        NameTraining det = new NameTraining();
        det.TrainNames();
    }
}
I compile this using the command:
javac -cp $(echo lib/*.jar | tr ' ' ':') NameTraining.java -Xlint:unchecked
However, I get these warnings:
NameTraining.java:35: warning: [unchecked] unchecked conversion
found : opennlp.tools.util.ObjectStream
required: opennlp.tools.util.ObjectStream<java.lang.String>
ObjectStream sampleStream = new NameSampleDataStream(fileStream);
^
NameTraining.java:36: warning: [unchecked] unchecked conversion
found : opennlp.tools.util.ObjectStream
required: opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>
TokenNameFinderModel model = NameFinderME.train("pt-br", "train", sampleStream, Collections.<String, Object>emptyMap());
^
2 warnings
I want to know two things:
- Is the above code correct for training, and if yes, then how do I check the results after training?
- What do the warnings mean?
Solution
Hi, I got training to work on a small data set with the following code:
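// Imports needed at the top of the file. The calls below follow the older OpenNLP API used in the question; newer releases may differ.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Collections;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.Span;

public static void TrainNames() throws IOException
{
    Charset charset = Charset.forName("UTF-8");

    // Read the annotated training file line by line and parse each line into a NameSample.
    ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream("/home/yogi.singh/dev/java/nlp/data/en-ner-person.train"), charset);
    ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
    // Raw-typed version from the question, which produced the unchecked warnings:
    //FileReader fileReader = new FileReader("train.txt");
    //ObjectStream fileStream = new PlainTextByLineStream(fileReader);
    //ObjectStream sampleStream = new NameSampleDataStream(fileStream);

    // Train an English "person" model with no additional resources, then release the training stream.
    TokenNameFinderModel model = NameFinderME.train("en", "person", sampleStream, Collections.<String, Object>emptyMap());
    sampleStream.close();
    NameFinderME nfm = new NameFinderME(model);

    // Load the text that the trained model will be tried on.
    String sentence = "";
    BufferedReader br = new BufferedReader(new FileReader("/home/yogi.singh/dev/java/nlp/train.txt"));
    try
    {
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();
        while (line != null)
        {
            sb.append(line);
            sb.append('\n');
            line = br.readLine();
        }
        sentence = sb.toString();
    }
    finally
    {
        br.close();
    }

    // Tokenize the text with a pre-trained tokenizer model; the name finder works on tokens.
    InputStream is1 = new FileInputStream("/home/yogi.singh/dev/java/nlp/data/en-token.bin");
    TokenizerModel model1 = new TokenizerModel(is1);
    is1.close();
    Tokenizer tokenizer = new TokenizerME(model1);
    String tokens[] = tokenizer.tokenize(sentence);
    for (String a : tokens)
        System.out.println(a);

    // Check the result: print each detected span followed by the tokens it covers.
    // getStart() is the first token index of the span, getEnd() is one past the last.
    Span nameSpans[] = nfm.find(tokens);
    for (Span s : nameSpans)
    {
        System.out.print(s.toString());
        System.out.print(" ");
        for (int index = s.getStart(); index < s.getEnd(); index++)
        {
            System.out.print(tokens[index] + " ");
        }
        System.out.println(" ");
    }
}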
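
On the second question: the two messages are warnings, not errors, so the original class still compiles. They appear because the question declares its streams as the raw type ObjectStream, while NameSampleDataStream and NameFinderME.train expect ObjectStream<String> and ObjectStream<NameSample>. Declaring the streams with type parameters, as in the code above, should make them go away. The sketch below reproduces the same kind of warning using only java.util.List, so it can be compiled without OpenNLP on the classpath. The class name UncheckedDemo and the two sample names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class UncheckedDemo {

    // Parameterized type: the compiler knows every element is a String.
    static List<String> typedNames() {
        List<String> names = new ArrayList<>();
        names.add("Asha");   // hypothetical sample data
        names.add("Ravi");
        return names;
    }

    // Raw type: the element type is erased from the declaration,
    // just like the raw ObjectStream variables in the question.
    static List rawNames() {
        return typedNames();
    }

    public static void main(String[] args) {
        // found: List, required: List<String>
        // With -Xlint:unchecked, javac reports "[unchecked] unchecked conversion" on this line.
        List<String> fromRaw = rawNames();

        // No warning here: both sides carry the same type parameter.
        List<String> typed = typedNames();

        System.out.println(fromRaw.equals(typed));
    }
}
```

Compiling this with javac -Xlint:unchecked UncheckedDemo.java should flag only the assignment from rawNames(). The program still runs, which mirrors the situation in the question: the warning says the compiler cannot verify the element type, not that the code is broken.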