Stanford NER: Can I use two classifiers at once in my code?

Question

In my code, the first classifier recognizes Person entities. For the second classifier, which I trained myself, I added some words to be recognized (annotated) as Organization, but it does not annotate Person.

I need the benefit of both classifiers. How can I do that?

I'm using NetBeans, and this is the code:

String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
String serializedClassifier2 = "/Users/ha/stanford-ner-2014-10-26/classifiers/dept-model.ser.gz";

if (args.length > 0) {
  serializedClassifier = args[0];
}

AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(serializedClassifier);
AbstractSequenceClassifier<CoreLabel> classifier2 = CRFClassifier.getClassifier(serializedClassifier2);

  String fileContents = IOUtils.slurpFile("/Users/ha/NetBeansProjects/NERtry/src/nertry/input.txt");
  List<List<CoreLabel>> out = classifier.classify(fileContents);
  List<List<CoreLabel>> out2 = classifier2.classify(fileContents);

  for (List<CoreLabel> sentence : out) {
      System.out.print("\nenglish.all.3class.distsim.crf.ser.gz: ");
    for (CoreLabel word : sentence) {
      System.out.print(word.word() + '/' + word.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
    }

  for (List<CoreLabel> sentence2 : out2) {
      System.out.print("\ndept-model.ser.gz");
    for (CoreLabel word2 : sentence2) {
      System.out.print(word2.word() + '/' + word2.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
    }

    System.out.println();
  }
}

The problem is in the result I get:

english.all.3class.distsim.crf.ser.gz: What/O date/O did/O James/PERSON started/O his/O job/O in/O Human/O and/O Finance/O ?/O 
dept-model.ser.gzWhat/O date/O did/O James/ORGANIZATION started/O his/O job/O in/O Human/ORGANIZATION and/O Finance/ORGANIZATION ?/O 

The second classifier tags the name as ORGANIZATION, but I need it to be annotated as PERSON. Any help?

Recommended answer

The class you should use to make this easy is NERClassifierCombiner. Its semantics are that it runs the classifiers in order from left to right as you specify them (any number can be given to it in the constructor), and that a later classifier cannot annotate an entity that overlaps with an entity tagged by an earlier classifier, but is otherwise free to add annotations. So earlier classifiers are preferred, in a simple preference ranking. I give a complete code example below.

(If you are training all your own classifiers, it is generally best to train all the entities together, so they can influence each other in the categories assigned. But this simple preference ordering usually works pretty well, and we use it ourselves.)

import edu.stanford.nlp.ie.NERClassifierCombiner;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreLabel;

import java.io.IOException;
import java.util.List;

public class MultipleNERs {

  public static void main(String[] args) throws IOException {
    String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
    String serializedClassifier2 = "classifiers/english.muc.7class.distsim.crf.ser.gz";

    if (args.length > 0) {
      serializedClassifier = args[0];
    }

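    // The two booleans switch off the numeric classifiers and SUTime;
    // the remaining arguments are classifier paths, in order of preference.
    NERClassifierCombiner classifier = new NERClassifierCombiner(false, false, 
            serializedClassifier, serializedClassifier2);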

    String fileContents = IOUtils.slurpFile("input.txt");
    List<List<CoreLabel>> out = classifier.classify(fileContents);

    int i = 0;
    for (List<CoreLabel> lcl : out) {
      i++;
      int j = 0;
      for (CoreLabel cl : lcl) {
        j++;
        System.out.printf("%d:%d: %s%n", i, j,
                cl.toShorterString("Text", "CharacterOffsetBegin", "CharacterOffsetEnd", "NamedEntityTag"));
      }
    }
  }

}
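
To make the preference ranking described above more concrete, here is a small toy model of the merge rule applied to the sentence from the question. It is only a sketch of the stated semantics, not the library's code: the class name PreferenceMergeSketch, the merge method, and the simplification that a run of adjacent tokens with the same label counts as one entity are all assumptions made for illustration. It needs no Stanford jars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; it does not use or reproduce Stanford NER code.
public class PreferenceMergeSketch {

  static final String O = "O"; // background (no entity) label

  // Merges per-token labels from two taggers. The earlier tagger wins:
  // an entity from the later tagger is kept only if none of its tokens
  // already carries an entity label from the earlier tagger.
  // Assumption: adjacent tokens with the same label form one entity.
  static List<String> merge(List<String> earlier, List<String> later) {
    List<String> merged = new ArrayList<>(earlier);
    int i = 0;
    while (i < later.size()) {
      String label = later.get(i);
      if (label.equals(O)) {
        i++;
        continue;
      }
      // Find the end (exclusive) of this entity in the later tagger's output.
      int end = i;
      while (end < later.size() && later.get(end).equals(label)) {
        end++;
      }
      // Check whether the entity overlaps anything the earlier tagger labeled.
      boolean overlaps = false;
      for (int k = i; k < end; k++) {
        if (!earlier.get(k).equals(O)) {
          overlaps = true;
          break;
        }
      }
      if (!overlaps) {
        for (int k = i; k < end; k++) {
          merged.set(k, label);
        }
      }
      i = end;
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("What", "date", "did", "James", "started",
        "his", "job", "in", "Human", "and", "Finance", "?");
    // Labels copied from the two outputs shown in the question.
    List<String> first = Arrays.asList("O", "O", "O", "PERSON", "O",
        "O", "O", "O", "O", "O", "O", "O");
    List<String> second = Arrays.asList("O", "O", "O", "ORGANIZATION", "O",
        "O", "O", "O", "ORGANIZATION", "O", "ORGANIZATION", "O");

    List<String> merged = merge(first, second);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < tokens.size(); i++) {
      sb.append(tokens.get(i)).append('/').append(merged.get(i)).append(' ');
    }
    System.out.println(sb.toString().trim());
  }
}
```

Under these assumptions, James stays PERSON because the first tagger claimed it, while Human and Finance pick up ORGANIZATION from the second tagger. That is the behaviour the question asks for, provided the stock English model is listed before dept-model.ser.gz.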
