ConllReader (like RothCONLL04Reader) throws an exception while reading relation training data with custom NER and custom relations
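Problem description
This is a continuation of the following question: How to generate custom training data for Stanford relation extraction.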
Thanks to StanfordNLPHelp, I am able to generate relation data with my custom NER model and, on top of it, RegexNER.
I had to run my custom model last, because otherwise many ORGANIZATION, PERSON, etc. entities are misclassified.
Example custom NER classes.
"DEGREE", "DESG"
Example relation training data.
0 ELECTEDBODY 0 O NNP/IN/NNP BOARD/OF/DIRECTORS O O O
0 ORGANIZATION 1 O NNP Board O O O
0 O 2 O NNS committees O O O
0 O 3 O JJ key O O O
0 ORGANIZATION 4 O NN/NN/NN/NN/NNP/NN N/Nomination/committee/A/Audit/committee O O O
0 O 5 O NN R O O O
0 MISC 6 O NN Remuneration O O O
0 O 7 O NN committee O O O
0 O 8 O NNP EFFECTIVE O O O
0 O 9 O NNP LEADERSHIP O O O
0 O 10 O CC AND O O O
0 O 11 O JJ STRONG O O O
0 O 12 O NN GOVERNANCE O O O
0 O 13 O NNP George O O O
0 O 14 O NNP Weston O O O
0 DESG 15 O NNP/NNP Chief/Executive O O O
0 O 16 O -LRB- -LRB- O O O
0 O 17 O NN age O O O
0 NUMBER 18 O CD 52 O O O
0 O 19 O -RRB- -RRB- O O O
0 PERSON 20 O NNP George O O O
0 O 21 O VBD was O O O
0 O 22 O VBN appointed O O O
0 O 23 O TO to O O O
0 O 24 O DT the O O O
0 ELECTEDBODY 25 O NN board O O O
0 DATE 26 O IN/CD in/1999 O O O
0 O 27 O CC and O O O
0 O 28 O VBD took O O O
0 O 29 O RP up O O O
0 O 30 O PRP$ his O O O
0 O 31 O JJ current O O O
0 O 32 O NN appointment O O O
0 O 33 O IN as O O O
0 DESG 34 O NNP/NNP Chief/Executive O O O
0 O 35 O IN in O O O
0 DATE 36 O NNP/CD April/2005 O O O
0 O 37 O . . O O O
20 34 cur_desg
20 36 cur_desg_from
I am trying to train a custom relation model and have added my custom relation classes.
Example: relation class cur_desg (current designation) between the entities (PERSON, DESG).
Here is the relevant section of my properties file for training the relation classifier:
datasetReaderClass = com.samrat.nlp.ie.re.CustomConllReader
entityClassifier = com.samrat.nlp.ie.re.CustomConllExtractor
relationResultsPrinters = com.samrat.nlp.ie.re.RelationResultPrinter
serializedTrainingSentencesPath = custom_relation_sentences.ser
serializedEntityExtractorPath = custom_relation_model.ser
serializedRelationExtractorPath = custom-relation-model-pipeline.ser
Relevant section of CustomConllReader:
private String getNormalizedNERTag(String ner) {
    ......
    } else if (ner.equalsIgnoreCase("degree")) {
        return "DEGREE";
    } else if (ner.equalsIgnoreCase("electedbody")) {
        return "ELECTEDBODY";
    }
    ...............
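For illustration only, the same normalization can be written as a lookup table instead of an if/else chain. This is a minimal sketch, not the code from the question: the class name `NerTagNormalizer` is hypothetical, the table holds only the custom tags named in this question, and returning unknown tags unchanged is an assumption, since the elided branches of the original method are not shown.

```java
import java.util.Locale;
import java.util.Map;

final class NerTagNormalizer {
    // Only the custom tags named in the question. The elided branches of the
    // original getNormalizedNERTag are deliberately not reproduced here.
    private static final Map<String, String> CUSTOM_TAGS = Map.of(
            "degree", "DEGREE",
            "desg", "DESG",
            "electedbody", "ELECTEDBODY");

    /**
     * Returns the canonical form of a known custom tag (case-insensitive).
     * Assumption: any other tag is returned unchanged.
     */
    static String normalize(String ner) {
        String canonical = CUSTOM_TAGS.get(ner.toLowerCase(Locale.ROOT));
        return canonical != null ? canonical : ner;
    }
}
```

With this sketch, `NerTagNormalizer.normalize("Degree")` yields `DEGREE`, and a tag outside the table such as `PERSON` passes through untouched.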
Problem 1: CustomConllReader throws an exception at the following line while reading the training data:
Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());
Relevant portion of CustomConllReader (it is almost the same as RothCONLL04Reader):
case 3: // relation
    System.out.println(currentLine);
    String type = pieces.get(2);
    List<ExtractionObject> args = new ArrayList<>();
    EntityMention entity1 = indexToEntityMention.get(pieces.get(0));
    EntityMention entity2 = indexToEntityMention.get(pieces.get(1));
    args.add(entity1);
    args.add(entity2);
    Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());
    // identifier = "relation" + sentenceID + "-" + sentence.getAllRelations().size();
    identifier = RelationMention.makeUniqueId();
    RelationMention relationMention = new RelationMention(identifier,
            sentence, span, type, null, args);
    AnnotationUtils.addRelationMention(sentence, relationMention);
    break;
Exception:
INFO: Reading file: tagged-training-relation-data-conll04.corp
20 34 cur_desg
20 36 cur_desg_from
0 2 cur_desg
Exception in thread "main" java.io.IOException
at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:138)
at com.wipro.nlp.ie.re.CustomConllReader.main(CustomConllReader.java:292)
Caused by: java.lang.NullPointerException
at com.wipro.nlp.ie.re.CustomConllReader.readSentence(CustomConllReader.java:144)
at com.wipro.nlp.ie.re.CustomConllReader.read(CustomConllReader.java:55)
at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:136)
... 1 more
The exception is thrown on sentence 3 while parsing the relation (0 2 cur_desg):
3 PERSON 0 O NNP/NNP John/Bason O O O
3 O 1 O NNP Finance O O O
3 ELECTEDBODY 2 O NNP Director O O O
3 O 3 O -LRB- -LRB- O O O
3 O 4 O NN age O O O
3 NUMBER 5 O CD 59 O O O
3 O 6 O -RRB- -RRB- O O O
3 PERSON 7 O NNP John O O O
3 O 8 O VBD was O O O
3 O 9 O VBN appointed O O O
3 O 10 O IN as O O O
3 O 11 O NNP Finance O O O
3 ELECTEDBODY 12 O NNP Director O O O
3 O 13 O IN in O O O
3 DATE 14 O NNP/CD May/1999 O O O
3 O 15 O . . O O O
0 2 cur_desg
0 14 cur_desg_from
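A NullPointerException on the `new Span(...)` line means that `indexToEntityMention.get(...)` returned null for at least one of the two relation arguments, so the reader had no entity recorded under index 0 or 2 at the moment it reached the relation line. A guard of the following shape would turn that into a readable error. This is a minimal sketch using plain Java stand-ins: `RelationArgumentGuard` and `relationSpan` are hypothetical names, and `int[] {start, end}` stands in for the extent of an `EntityMention`.

```java
import java.util.Map;

final class RelationArgumentGuard {
    /**
     * Resolves both arguments of a relation line ("arg1 arg2 type") and fails
     * with a descriptive message instead of a NullPointerException.
     * Each map value is a stand-in {start, end} token extent for one entity.
     */
    static int[] relationSpan(Map<String, int[]> indexToEntity, String relationLine) {
        String[] pieces = relationLine.trim().split("\\s+");
        if (pieces.length != 3) {
            throw new IllegalArgumentException("Not a relation line: " + relationLine);
        }
        int[] arg1 = indexToEntity.get(pieces[0]);
        int[] arg2 = indexToEntity.get(pieces[1]);
        if (arg1 == null || arg2 == null) {
            throw new IllegalStateException("Relation '" + relationLine
                    + "' refers to a token index with no entity in the current sentence");
        }
        // Same shape as new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd()).
        return new int[] {arg1[0], arg2[1]};
    }
}
```

In the real reader the equivalent check would sit just before the `Span` is constructed, and the message could also include the current line so the offending record is easy to find in the training file.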
This problem is solved: my training data had an extra line break in between. I am now able to build a custom relation classifier. However, when I use that classifier, it does not recognize any custom NER tags or custom relations.
I have posted a separate question about making the custom relation classifier recognize custom NER tags and relations in new sentences: Custom Relation Classifier does not understand any Custom NER tags and does not find any relations
Recommended answer
The exception was thrown because of an extra line break in the training data. Each sentence record in the tagged training input must contain exactly two blank lines, one between the token lines and the relation lines and one after the relation lines, as in the data below.
3 PERSON 0 O NNP/NNP John/Bason O O O
3 O 1 O NNP Finance O O O
3 ELECTEDBODY 2 O NNP Director O O O
3 O 3 O -LRB- -LRB- O O O
3 O 4 O NN age O O O
3 NUMBER 5 O CD 59 O O O
3 O 6 O -RRB- -RRB- O O O
3 PERSON 7 O NNP John O O O
3 O 8 O VBD was O O O
3 O 9 O VBN appointed O O O
3 O 10 O IN as O O O
3 O 11 O NNP Finance O O O
3 ELECTEDBODY 12 O NNP Director O O O
3 O 13 O IN in O O O
3 DATE 14 O NNP/CD May/1999 O O O
3 O 15 O . . O O O

0 2 cur_desg
0 14 cur_desg_from

5 O 0 O PRP He O O O
5 O 1 O VBD was O O O
5 O 2 O RB previously O O O
5 O 3 O DT the O O O
5 O 4 O NN finance O O O
5 DESG 5 O NN director O O O
5 O 6 O IN of O O O
5 ORGANIZATION 7 O NNP Bunzl O O O
5 O 8 O NN plc O O O
5 O 9 O CC and O O O
5 O 10 O VBZ is O O O
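A stray blank line is hard to spot by eye, so it can help to check the file before training. The following is a minimal sketch of such a check, assuming the layout described in this answer: 9-field token lines, 3-field relation lines, and a record that ends at its second blank line. `RothFormatChecker` is a hypothetical standalone helper, not part of CoreNLP, and it only approximates how the reader consumes the file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RothFormatChecker {
    /**
     * Returns one message per relation line whose arguments are not entity
     * tokens (NER tag other than "O") of the record currently being read,
     * and one per line with an unexpected number of fields.
     */
    static List<String> check(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> entityIndices = new HashSet<>();
        int blankLinesSeen = 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                // The second blank line closes the record, so start a new one.
                if (++blankLinesSeen == 2) {
                    blankLinesSeen = 0;
                    entityIndices.clear();
                }
                continue;
            }
            String[] pieces = line.split("\\s+");
            if (pieces.length == 9) {            // token line
                if (!pieces[1].equals("O")) {
                    entityIndices.add(pieces[2]);
                }
            } else if (pieces.length == 3) {     // relation line
                if (!entityIndices.contains(pieces[0]) || !entityIndices.contains(pieces[1])) {
                    problems.add("line " + (i + 1) + ": unresolved argument in '" + line + "'");
                }
            } else {
                problems.add("line " + (i + 1) + ": unexpected field count " + pieces.length);
            }
        }
        return problems;
    }
}
```

Feeding it the lines of the training file (for example via `java.nio.file.Files.readAllLines`) returns an empty list for well-formed records. An extra blank line between a token block and its relations shows up as an unresolved-argument message at the relation line, which is the same situation that produced the NullPointerException in the question.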