ConllReader (like RothCONLL04Reader) throws an exception while reading relation training data with custom NER and custom relations

Problem description

This is a continuation of the following question: How to generate custom training data for Stanford relation extraction

Thanks to StanfordNLPHelp, I am able to generate relation data with a custom NER model and, on top of it, RegexNER.

I had to run my custom model last, because otherwise it misclassifies many ORGANIZATION, PERSON, etc. entities.

Example custom NER classes:

"DEGREE", "DESG"

Example relation training data:

0   ELECTEDBODY 0   O   NNP/IN/NNP  BOARD/OF/DIRECTORS  O   O   O
0   ORGANIZATION    1   O   NNP Board   O   O   O
0   O   2   O   NNS committees  O   O   O
0   O   3   O   JJ  key O   O   O
0   ORGANIZATION    4   O   NN/NN/NN/NN/NNP/NN  N/Nomination/committee/A/Audit/committee    O   O   O
0   O   5   O   NN  R   O   O   O
0   MISC    6   O   NN  Remuneration    O   O   O
0   O   7   O   NN  committee   O   O   O
0   O   8   O   NNP EFFECTIVE   O   O   O
0   O   9   O   NNP LEADERSHIP  O   O   O
0   O   10  O   CC  AND O   O   O
0   O   11  O   JJ  STRONG  O   O   O
0   O   12  O   NN  GOVERNANCE  O   O   O
0   O   13  O   NNP George  O   O   O
0   O   14  O   NNP Weston  O   O   O
0   DESG    15  O   NNP/NNP Chief/Executive O   O   O
0   O   16  O   -LRB-   -LRB-   O   O   O
0   O   17  O   NN  age O   O   O
0   NUMBER  18  O   CD  52  O   O   O
0   O   19  O   -RRB-   -RRB-   O   O   O
0   PERSON  20  O   NNP George  O   O   O
0   O   21  O   VBD was O   O   O
0   O   22  O   VBN appointed   O   O   O
0   O   23  O   TO  to  O   O   O
0   O   24  O   DT  the O   O   O
0   ELECTEDBODY 25  O   NN  board   O   O   O
0   DATE    26  O   IN/CD   in/1999 O   O   O
0   O   27  O   CC  and O   O   O
0   O   28  O   VBD took    O   O   O
0   O   29  O   RP  up  O   O   O
0   O   30  O   PRP$    his O   O   O
0   O   31  O   JJ  current O   O   O
0   O   32  O   NN  appointment O   O   O
0   O   33  O   IN  as  O   O   O
0   DESG    34  O   NNP/NNP Chief/Executive O   O   O
0   O   35  O   IN  in  O   O   O
0   DATE    36  O   NNP/CD  April/2005  O   O   O
0   O   37  O   .   .   O   O   O

20  34  cur_desg 
20  36  cur_desg_from

I am trying to train a custom relation model and have added my custom relation classes.

For example, the relation class cur_desg (current designation) holds between the entities (PERSON, DESG).

Here is the relevant section of my properties file for training the relation classifier:

datasetReaderClass = com.samrat.nlp.ie.re.CustomConllReader
entityClassifier = com.samrat.nlp.ie.re.CustomConllExtractor
relationResultsPrinters = com.samrat.nlp.ie.re.RelationResultPrinter

serializedTrainingSentencesPath = custom_relation_sentences.ser
serializedEntityExtractorPath = custom_relation_model.ser
serializedRelationExtractorPath = custom-relation-model-pipeline.ser

Relevant section of CustomConllReader:

private String getNormalizedNERTag(String ner) {
        ......
        }  else if(ner.equalsIgnoreCase("degree")) {
            return "DEGREE";
        }
        else if(ner.equalsIgnoreCase("electedbody")) {
            return "ELECTEDBODY";
        }
...............

Problem 1 (CustomConllReader throws an exception at the following line while reading the training data)

Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());

Relevant portion of CustomConllReader (it is almost the same as RothCONLL04Reader):

case 3: // relation
    System.out.println(currentLine);
    String type = pieces.get(2);
    List<ExtractionObject> args = new ArrayList<>();
    EntityMention entity1 = indexToEntityMention.get(pieces.get(0));
    EntityMention entity2 = indexToEntityMention.get(pieces.get(1));
    args.add(entity1);
    args.add(entity2);
    Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());
    // identifier = "relation" + sentenceID + "-" + sentence.getAllRelations().size();
    identifier = RelationMention.makeUniqueId();
    RelationMention relationMention = new RelationMention(identifier,
            sentence, span, type, null, args);
    AnnotationUtils.addRelationMention(sentence, relationMention);
    break;

Exception:

    INFO: Reading file: tagged-training-relation-data-conll04.corp
20  34  cur_desg 
20  36  cur_desg_from
0   2   cur_desg
Exception in thread "main" java.io.IOException
    at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:138)
    at com.wipro.nlp.ie.re.CustomConllReader.main(CustomConllReader.java:292)
Caused by: java.lang.NullPointerException
    at com.wipro.nlp.ie.re.CustomConllReader.readSentence(CustomConllReader.java:144)
    at com.wipro.nlp.ie.re.CustomConllReader.read(CustomConllReader.java:55)
    at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:136)
    ... 1 more

The exception is thrown on sentence 3 while parsing the relation (0 2 cur_desg):

3   PERSON  0   O   NNP/NNP John/Bason  O   O   O
3   O   1   O   NNP Finance O   O   O
3   ELECTEDBODY 2   O   NNP Director    O   O   O
3   O   3   O   -LRB-   -LRB-   O   O   O
3   O   4   O   NN  age O   O   O
3   NUMBER  5   O   CD  59  O   O   O
3   O   6   O   -RRB-   -RRB-   O   O   O
3   PERSON  7   O   NNP John    O   O   O
3   O   8   O   VBD was O   O   O
3   O   9   O   VBN appointed   O   O   O
3   O   10  O   IN  as  O   O   O
3   O   11  O   NNP Finance O   O   O
3   ELECTEDBODY 12  O   NNP Director    O   O   O
3   O   13  O   IN  in  O   O   O
3   DATE    14  O   NNP/CD  May/1999    O   O   O
3   O   15  O   .   .   O   O   O

0   2   cur_desg
0   14  cur_desg_from
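A NullPointerException at the Span line is consistent with one of the indexToEntityMention.get(...) calls returning null, meaning that the relation line (here 0 2 cur_desg) names a token index for which no entity mention is registered in the sentence currently being read. The following is a minimal sketch of a guarded lookup that fails with a descriptive message instead. It is not CoreNLP code: MentionLookup and require are hypothetical names, and the map is assumed to be keyed by the index string taken from the relation line, as in the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of CoreNLP.
final class MentionLookup {

    /**
     * Returns the mention stored under the given token index, or throws an
     * exception that names the relation line and the indices that are known.
     */
    static <T> T require(Map<String, T> indexToMention, String index, String relationLine) {
        T mention = indexToMention.get(index);
        if (mention == null) {
            throw new IllegalStateException("No entity mention at token index " + index
                    + " for relation line \"" + relationLine + "\"; known indices: "
                    + indexToMention.keySet());
        }
        return mention;
    }

    public static void main(String[] args) {
        // Stand-in for indexToEntityMention; String values replace EntityMention here.
        Map<String, String> mentions = new HashMap<>();
        mentions.put("0", "PERSON");

        System.out.println(require(mentions, "0", "0 2 cur_desg"));
        try {
            require(mentions, "2", "0 2 cur_desg");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the reader, the two get(...) calls could be routed through such a helper so that a malformed block reports the offending relation line rather than a bare NullPointerException.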

This problem is solved: my training data had an extra line break in between. I am now able to build a custom relation classifier. However, when I use that classifier, it does not recognize any custom NER tags or custom relations.

I have posted that as a separate question (on making the custom relation classifier recognize custom NER tags and relations in new sentences): Custom Relation Classifier does not understand any Custom NER tags and does not find any relations

Recommended answer

The exception was thrown because of an extra line break in between. Each sentence in the tagged training data must have exactly two line breaks, as in the sample below: one blank line between the token lines and the relation lines, and one blank line after the relation lines.

3   PERSON  0   O   NNP/NNP John/Bason  O   O   O
3   O   1   O   NNP Finance O   O   O
3   ELECTEDBODY 2   O   NNP Director    O   O   O
3   O   3   O   -LRB-   -LRB-   O   O   O
3   O   4   O   NN  age O   O   O
3   NUMBER  5   O   CD  59  O   O   O
3   O   6   O   -RRB-   -RRB-   O   O   O
3   PERSON  7   O   NNP John    O   O   O
3   O   8   O   VBD was O   O   O
3   O   9   O   VBN appointed   O   O   O
3   O   10  O   IN  as  O   O   O
3   O   11  O   NNP Finance O   O   O
3   ELECTEDBODY 12  O   NNP Director    O   O   O
3   O   13  O   IN  in  O   O   O
3   DATE    14  O   NNP/CD  May/1999    O   O   O
3   O   15  O   .   .   O   O   O

0   2   cur_desg
0   14  cur_desg_from

5   O   0   O   PRP He  O   O   O
5   O   1   O   VBD was O   O   O
5   O   2   O   RB  previously  O   O   O
5   O   3   O   DT  the O   O   O
5   O   4   O   NN  finance O   O   O
5   DESG    5   O   NN  director    O   O   O
5   O   6   O   IN  of  O   O   O
5   ORGANIZATION    7   O   NNP Bunzl   O   O   O
5   O   8   O   NN  plc O   O   O
5   O   9   O   CC  and O   O   O
5   O   10  O   VBZ is  O   O   O
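An extra blank line is hard to spot by eye in a large corpus, so a small pre-check can help. The sketch below assumes the layout shown in the sample: token lines have 9 whitespace-separated columns with no whitespace inside a column, relation lines have 3, and each sentence is followed by one blank line, its relation lines (possibly none), and another blank line. It also assumes that only tokens whose entity type is not O can be relation arguments. ConllBlockChecker is a hypothetical name and does not reproduce the logic of RothCONLL04Reader.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-check for the training file; not part of CoreNLP.
final class ConllBlockChecker {
    private enum State { EXPECT_TOKENS, IN_TOKENS, IN_RELATIONS }

    /** Returns one message per structural problem; an empty list means none were found. */
    static List<String> check(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> entityIndices = new HashSet<>(); // token indices whose entity type is not "O"
        State state = State.EXPECT_TOKENS;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            String where = "line " + (i + 1) + ": ";
            if (line.isEmpty()) {
                if (state == State.EXPECT_TOKENS) {
                    problems.add(where + "extra blank line before a sentence");
                } else if (state == State.IN_TOKENS) {
                    state = State.IN_RELATIONS;   // first blank line: relations may follow
                } else {
                    state = State.EXPECT_TOKENS;  // second blank line: sentence is complete
                    entityIndices.clear();
                }
                continue;
            }
            String[] cols = line.split("\\s+");
            if (cols.length == 9) {               // token line
                if (state == State.IN_RELATIONS) {
                    problems.add(where + "token line where relation lines or a blank line were expected");
                }
                state = State.IN_TOKENS;
                if (!cols[1].equals("O")) {
                    entityIndices.add(cols[2]);   // column 3 holds the token index
                }
            } else if (cols.length == 3) {        // relation line: arg1 arg2 type
                if (state != State.IN_RELATIONS) {
                    problems.add(where + "relation line outside the relation part of a sentence"
                            + " (check the blank lines above)");
                } else {
                    for (int a = 0; a < 2; a++) {
                        if (!entityIndices.contains(cols[a])) {
                            problems.add(where + "argument " + cols[a] + " is not an entity-tagged token");
                        }
                    }
                }
            } else {
                problems.add(where + "expected 9 or 3 columns, found " + cols.length);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ConllBlockChecker <training-file>");
            return;
        }
        check(Files.readAllLines(Paths.get(args[0]))).forEach(System.out::println);
    }
}
```

Run against the training file before invoking the reader, this prints the line numbers where the blank-line structure departs from the assumed layout; no output means the check found nothing.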
