Train a model using named entities

Question

I am looking at Stanford CoreNLP and using its Named Entity Recognizer. I have different kinds of input text and I need to tag it with my own entities. So I started training my own model, but it doesn't seem to be working.

For example, my input text string is "Book of 49 Magazine Articles on Toyota Land Cruiser 1956-1987 Gold Portfolio http://t.co/EqxmY1VmLg http://t.co/F0Vefuoj9Q"

I went through the examples for training my own model, and I only look for the few words that I am interested in.

My jane-austen-emma-ch1.tsv looks like this:

Toyota  PERS
Land Cruiser    PERS

From the above input text I am only interested in two entities: one is Toyota and the other is Land Cruiser.

The austen.prop file looks like this:

trainFile = jane-austen-emma-ch1.tsv
serializeTo = ner-model.ser.gz
map = word=0,answer=1
useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
useDisjunctive=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC

I run the following command to generate the ner-model.ser.gz file:

java -cp stanford-corenlp-3.4.1.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop

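import java.util.List;

import edu.stanford.nlp.ie.NERClassifierCombiner;
import edu.stanford.nlp.ling.CoreAnnotations.AnswerAnnotation;
import edu.stanford.nlp.ling.CoreLabel;

// NerDemo is a placeholder class name so the snippet compiles on its own.
public class NerDemo {
    public static void main(String[] args) {
        // Stock 7-class MUC model, referenced by its resource path.
        String serializedClassifier = "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz";
        // Custom model written by the training command.
        String serializedClassifier2 = "C:/standford-ner/ner-model.ser.gz";
        try {
            // Combine both models; numeric classifiers and SUTime are switched off.
            NERClassifierCombiner classifier = new NERClassifierCombiner(false, false,
                    serializedClassifier2, serializedClassifier);
            String ss = "Book of 49 Magazine Articles on Toyota Land Cruiser 1956-1987 Gold Portfolio http://t.co/EqxmY1VmLg http://t.co/F0Vefuoj9Q";
            System.out.println("---");
            List<List<CoreLabel>> out = classifier.classify(ss);
            for (List<CoreLabel> sentence : out) {
                for (CoreLabel word : sentence) {
                    // Print each token together with the label assigned to it.
                    System.out.print(word.word() + '/' + word.get(AnswerAnnotation.class) + ' ');
                }
                System.out.println();
            }
        } catch (Exception e) {
            // Covers model-loading failures as well as runtime errors.
            e.printStackTrace();
        }
    }
}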

This is the output I get:

Book/PERS of/PERS 49/O Magazine/PERS Articles/PERS on/O Toyota/PERS Land/PERS Cruiser/PERS 1956-1987/PERS Gold/O Portfolio/PERS http://t.co/EqxmY1VmLg/PERS http://t.co/F0Vefuoj9Q/PERS

I think this is wrong. I am looking for Toyota/PERS and Land Cruiser/PERS (which is a multi-word entity).

Thanks for the help. Any help is really appreciated.

Recommended answer

The NERClassifier* works at the word level: it labels individual words, not phrases. Given that, the classifier seems to be performing fine. If you want a phrase to be treated as a single unit, you can join its words with an underscore. So in both your labeled examples and your test examples, you would change "Land Cruiser" to "Land_Cruiser".
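The underscore approach could be applied with a small preprocessing step like the sketch below. It is only an illustration: the class PhraseJoiner and its methods joinPhrases and toTrainingRows are hypothetical names, not part of Stanford CoreNLP, and the phrase list is assumed to be maintained by hand. The second helper writes one token per line with a tab-separated label and uses "O" for tokens outside any entity, matching the O tags in the output above. Splitting on whitespace is a simplification and may not match the tokenizer the classifier applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Stanford CoreNLP.
public class PhraseJoiner {

    // Replace each known multi-word phrase with an underscore-joined single token.
    static String joinPhrases(String text, List<String> phrases) {
        String result = text;
        for (String phrase : phrases) {
            String joined = phrase.trim().replaceAll("\\s+", "_");
            // String.replace treats both arguments literally (no regex).
            result = result.replace(phrase, joined);
        }
        return result;
    }

    // Build tab-separated training rows: one token per line,
    // the given label for entity tokens and "O" for everything else.
    static List<String> toTrainingRows(String joinedText, Set<String> entityTokens, String label) {
        List<String> rows = new ArrayList<>();
        for (String token : joinedText.trim().split("\\s+")) {
            rows.add(token + "\t" + (entityTokens.contains(token) ? label : "O"));
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "Book of 49 Magazine Articles on Toyota Land Cruiser 1956-1987 Gold Portfolio";
        // Apply the same joining to training text and to text being classified.
        String joined = joinPhrases(text, Arrays.asList("Land Cruiser"));
        System.out.println(joined);

        Set<String> entities = new HashSet<>(Arrays.asList("Toyota", "Land_Cruiser"));
        for (String row : toTrainingRows(joined, entities, "PERS")) {
            System.out.println(row);
        }
    }
}
```

The rows printed by main could be saved as the .tsv training file, and joinPhrases would then be applied to each input string before it is passed to the classifier, so that training and test text are joined the same way.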
