How do I use IOB tags with Stanford NER?

Question

There seem to be a few different settings:

iobtags
iobTags
entitySubclassification (IOB1 or IOB2?)
evaluateIOB

Which setting do I use, and how do I use it correctly?

I tried labels like this:

1997    B-DATE
volvo   B-BRAND
wia64t  B-MODEL
highway B-TYPE
tractor I-TYPE

But on the training output, it seemed to think that B-TYPE and I-TYPE were different classes.

I am using the 2013-11-12 release.

Recommended answer

How this can be done is currently (2013 releases) a bit of a mess, since there are two different sets of flags for two different DocumentReaderAndWriter implementations. Sorry.

The most flexible support for different IOB styles is found in CoNLLDocumentReaderAndWriter. While reading files, it can map any IOB/IOE/... annotation that uses hyphenated prefixes, like those in your example (B-BRAND), to any other scheme, using the flag:

-entitySubclassification IOB2

The resulting label set is then used for training and classification. The options are documented in the entitySubclassify() method of CoNLLDocumentReaderAndWriter: IOB1, IOB2, IOE1, IOE2, SBIEO, IO. You can find a discussion of IOB1 vs. IOB2 in Tjong Kim Sang and Veenstra 1999. By default the representation is mapped back to IOB1 on output, since that is the default used in the CoNLL conlleval program, but you can keep it as what you mapped it to with the flag:

-retainEntitySubclassification
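
To make the difference between the two encodings concrete, here is a minimal Python sketch of the IOB2-to-IOB1 mapping described above. It is not Stanford NER's code; the function name iob2_to_iob1 is made up for illustration, and it assumes every non-O label has a B- or I- prefix. In IOB1, B- is only used when an entity directly follows another entity of the same type; in IOB2, every entity starts with B-.

```python
def iob2_to_iob1(labels):
    """Map IOB2 labels to IOB1 (illustrative helper, not part of Stanford NER)."""
    out = []
    prev_type = None  # entity type of the previous token, None if it was "O"
    for label in labels:
        if label == "O":
            out.append("O")
            prev_type = None
            continue
        prefix, _, etype = label.partition("-")
        # IOB1 keeps B- only where it is needed to separate two adjacent
        # entities of the same type; every other entity token becomes I-.
        if prefix == "B" and etype == prev_type:
            out.append("B-" + etype)
        else:
            out.append("I-" + etype)
        prev_type = etype
    return out


iob2 = ["B-DATE", "B-BRAND", "B-MODEL", "B-TYPE", "I-TYPE"]
print(iob2_to_iob1(iob2))
# ['I-DATE', 'I-BRAND', 'I-MODEL', 'I-TYPE', 'I-TYPE']
```

With the labels from the question, no two entities of the same type are adjacent, so the IOB1 form contains no B- labels at all.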

To use this DocumentReaderAndWriter, you can give a training command like:

java8 -mx6g edu.stanford.nlp.ie.crf.CRFClassifier -prop conll.crf.chris2009.prop -readerAndWriter edu.stanford.nlp.sequences.CoNLLDocumentReaderAndWriter -entitySubclassification iob2

Alternatively, ColumnDocumentReaderAndWriter is the default DocumentReaderAndWriter which we use in the distributed models. The options you get with it are different and slightly more limited. You have these two flags:

  • -mergeTags will take either plain ("BRAND") or CoNLL-like ("I-BRAND") labels and map them down to a prefix-less IO label ("BRAND") and use that for training and classifying.
  • -iobTags can take either plain ("BRAND") or CoNLL-like ("I-BRAND") labels and maps them to IOB2.
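
The effect of these two flags on a label sequence can be sketched in Python as follows. This only illustrates the mappings as described above; it is not the ColumnDocumentReaderAndWriter implementation. The helper names are hypothetical, and the sketch assumes that "O" is the background label and that consecutive plain labels of the same type belong to one entity.

```python
def strip_prefix(label):
    """Drop a leading B-/I- prefix; plain labels and "O" pass through unchanged."""
    if label[:2] in ("B-", "I-") and len(label) > 2:
        return label[2:]
    return label


def merge_to_io(labels):
    """Roughly what -mergeTags is described as doing: prefix-less IO labels."""
    return [strip_prefix(label) for label in labels]


def to_iob2(labels, background="O"):
    """Roughly what -iobTags is described as doing: plain or prefixed labels to IOB2."""
    out = []
    prev_type = background
    for label in labels:
        etype = strip_prefix(label)
        if etype == background:
            out.append(background)
        elif label.startswith("B-") or etype != prev_type:
            # An explicit B-, or a change of entity type, starts a new entity.
            out.append("B-" + etype)
        else:
            out.append("I-" + etype)
        prev_type = etype
    return out


print(merge_to_io(["B-TYPE", "I-TYPE", "O", "BRAND"]))
# ['TYPE', 'TYPE', 'O', 'BRAND']
print(to_iob2(["BRAND", "TYPE", "TYPE", "O"]))
# ['B-BRAND', 'B-TYPE', 'I-TYPE', 'O']
```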

In a sequence model, for any of the labeling schemes like IOB2, the labels are different classes. That is how these labeling schemes work. The special interpretation of "I-", "B-", etc. is left to the human observer and entity-level evaluation software. The included evaluation software will work with IOB1, IOB2, or prefixless IO encoding only.
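
To illustrate the last point: the sequence model sees B-TYPE and I-TYPE as two unrelated classes, and they are only combined into one TYPE entity at the entity level. A rough Python sketch of that interpretation step for IOB2 output might look like this. The function iob2_entities is a hypothetical helper, not the evaluation code shipped with Stanford NER.

```python
def iob2_entities(tokens, labels):
    """Group IOB2-labeled tokens into (entity_type, text) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        prefix, _, etype = label.partition("-")
        if prefix == "I" and etype == current_type:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
            continue
        # Anything else closes the open entity, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            # B- starts a new entity; a stray I- is leniently treated the same way.
            current_tokens, current_type = [token], etype
        else:
            current_tokens, current_type = [], None
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


tokens = ["1997", "volvo", "wia64t", "highway", "tractor"]
labels = ["B-DATE", "B-BRAND", "B-MODEL", "B-TYPE", "I-TYPE"]
print(iob2_entities(tokens, labels))
# [('DATE', '1997'), ('BRAND', 'volvo'), ('MODEL', 'wia64t'), ('TYPE', 'highway tractor')]
```

So seeing B-TYPE and I-TYPE listed as separate classes in the training output is expected; the two labels jointly describe a single TYPE entity.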
