Difference between IOB Accuracy and Precision


Question


I'm doing some work on NLTK with named entity recognition and chunkers. I retrained a classifier using nltk/chunk/named_entity.py for that, and I got the following measures:

ChunkParse score:
    IOB Accuracy:  96.5%
    Precision:     78.0%
    Recall:        91.9%
    F-Measure:     84.4%

But I don't understand what the exact difference between IOB Accuracy and Precision is in this case. Actually, I found the following on the docs (here) for a specific example:

The IOB tag accuracy indicates that more than a third of the words are tagged with O, i.e. not in an NP chunk. However, since our tagger did not find any chunks, its precision, recall, and f-measure are all zero.

So, if IOB accuracy is just about the number of O labels, how come, in that example, the tagger found no chunks and yet IOB accuracy is not 100%?

Thank you in advance

Solution

There is a very detailed explanation of the difference between precision and accuracy on Wikipedia (see https://en.wikipedia.org/wiki/Accuracy_and_precision); in brief:

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
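
A quick numeric illustration (the counts below are hypothetical, chosen only to mimic the question's situation of high accuracy alongside lower precision):

# Hypothetical confusion counts over 100 decisions.
tp, tn, fp, fn = 2, 90, 8, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 92/100 = 0.92
precision = tp / (tp + fp)                  # 2/10  = 0.20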

Back to NLTK: there is a class called ChunkScore that computes the accuracy, precision and recall of your system. And here's the interesting part: the way NLTK calculates tp, fp, tn, fn for accuracy and for precision works at different granularities.

For accuracy, NLTK calculates the total number of tokens (NOT CHUNKS!!) that are guessed correctly with the POS tags and IOB tags, then divides by the total number of tokens in the gold sentence.

accuracy = num_tokens_correct / total_num_tokens_from_gold
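
To make that concrete, here is a minimal sketch of the same division with toy IOB sequences (the tags are made up for illustration, not taken from the NLTK example):

# Toy gold vs. guessed IOB tags for six tokens (hypothetical values).
gold_iob    = ['B-NP', 'I-NP', 'O', 'O', 'B-NP', 'I-NP']
guessed_iob = ['B-NP', 'I-NP', 'O', 'O', 'O',    'O']

num_tokens_correct = sum(1 for g, s in zip(gold_iob, guessed_iob) if g == s)
accuracy = num_tokens_correct / len(gold_iob)  # 4/6, roughly 0.67

Note that the guessed sequence contains no chunks at all, yet accuracy is neither 0% nor 100%: the matching O tokens count as correct, while the two tokens that should have been inside a chunk do not. That is exactly the situation in the example quoted in the question.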

For precision and recall, NLTK calculates:

  • True Positives by counting the number of chunks (NOT TOKENS!!!) that are guessed correctly
  • False Positives by counting the number of chunks (NOT TOKENS!!!) that are guessed but are wrong
  • False Negatives by counting the number of chunks (NOT TOKENS!!!) in the gold standard that are not guessed by the system

And then it calculates precision and recall like this:

precision = tp / (fp + tp)
recall = tp / (fn + tp)
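
To make the chunk-level counting concrete, here is a tiny worked example with placeholder chunk sets (the letters stand for chunks and are made up); the set arithmetic is the same as what the script below does with ChunkScore's correct() and guessed():

gold_chunks = {'A', 'B', 'C'}     # chunks in the gold standard (placeholders)
guessed_chunks = {'A', 'B', 'D'}  # chunks the system proposed (placeholders)

tp = len(gold_chunks & guessed_chunks)  # 2 chunks guessed correctly: A, B
fp = len(guessed_chunks - gold_chunks)  # 1 chunk guessed wrongly: D
fn = len(gold_chunks - guessed_chunks)  # 1 gold chunk missed: C

precision = tp / (fp + tp)  # 2/3
recall = tp / (fn + tp)     # 2/3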

To prove the above points, try this script:

from nltk.chunk.regexp import ChunkRule, RegexpChunkParser
from nltk.chunk.util import ChunkScore, tagstr2tree
from nltk.tag import pos_tag

# A rule saying that an optional DT followed by any noun tag is an NP chunk.
chunk_rule = ChunkRule("<DT>?<NN.*>", "DT+NN* or NN* chunk")
chunk_parser = RegexpChunkParser([chunk_rule], chunk_label='NP')  # chunk_node= in older NLTK

# Our test sentence is:
# "The cat sat on the mat the big dog chewed."
gold = tagstr2tree("[ The/DT cat/NN ] sat/VBD on/IN [ the/DT mat/NN ] [ the/DT big/JJ dog/NN ] chewed/VBD ./.")

# POS tag the sentence, then chunk it with our rule-based chunker.
test = pos_tag('The cat sat on the mat the big dog chewed .'.split())
chunked = chunk_parser.parse(test)

# Calculate the score.
chunkscore = ChunkScore()
chunkscore.score(gold, chunked)
chunkscore._updateMeasures()  # force the lazy update so the private counters below are populated

# The chunks our rule-based chunker proposed.
print(chunkscore.guessed())

# Accuracy is token-level: the number of tokens whose POS + IOB tags are
# guessed correctly, divided by the total number of tokens in the gold
# sentence. (In the original run, 7 of the 11 tokens were correct; the exact
# split depends on what pos_tag returns in your NLTK version.)
total = chunkscore._tags_total
correct = chunkscore._tags_correct
print("Is correct/total == accuracy ?", chunkscore.accuracy() == correct / total)
print(correct, '/', total, '=', chunkscore.accuracy())
print("##############")

print("Correct chunk(s):")  # i.e. True Positives.
correct_chunks = set(chunkscore.correct()).intersection(set(chunkscore.guessed()))
print("Number of correct chunks = tp =", len(correct_chunks))
assert len(correct_chunks) == chunkscore._tp_num
print()

print("Missed chunk(s):")  # i.e. False Negatives.
print("Number of missed chunks = fn =", len(chunkscore.missed()))
assert len(chunkscore.missed()) == chunkscore._fn_num
print()

print("Wrongly guessed chunk(s):")  # i.e. False Positives.
wrong_chunks = set(chunkscore.guessed()).difference(set(chunkscore.correct()))
print("Number of wrong chunks = fp =", len(wrong_chunks))
assert len(wrong_chunks) == chunkscore._fp_num
print()

print("Recall =", "tp/(fn+tp) =", len(correct_chunks), '/', len(correct_chunks) + len(chunkscore.missed()), '=', chunkscore.recall())
print("Precision =", "tp/(fp+tp) =", len(correct_chunks), '/', len(correct_chunks) + len(wrong_chunks), '=', chunkscore.precision())
