Efficiently check for adjacent accuracy (group membership?)
Problem description
I am evaluating a machine-learning classification task with 6 levels: A1, A2, B1, B2, C1 and C2. These categories can be assumed to be ordinal, i.e. they can be ranked. As part of my evaluation, I want to measure how often my classifier places a text within 1 level of its 'actual' level. I refer to this as 'adjacent accuracy'. For example, if a text is actually ranked B2, then adjacently accurate results would be B1, B2 and C1.
I have lots of data to go through, so I want a very efficient way to check for adjacent accuracy. I have included my best approach below (python3), but I am looking for any suggestions to squeeze more time out of it.
adjDict = {'A1': {'A1', 'A2'}, 'A2': {'A1', 'A2', 'B1'}, 'B1': {'A2', 'B1', 'B2'},
           'B2': {'B1', 'B2', 'C1'}, 'C1': {'B2', 'C1', 'C2'}, 'C2': {'C1', 'C2'}}

def isAdjacent(actual, classifierOutput):
    return classifierOutput in adjDict[actual]
If necessary, the levels could be redefined as numbers (1-6), if that would improve performance.
Any ideas?
Recommended answer
in is not very fast, especially with str; you could use simple int values and compare them instead:
A1, A2, B1, B2, C1, C2 = range(6)

def isAdjacent(actual, classifierOutput):
    # True when the two levels differ by at most 1
    return actual - 2 < classifierOutput < actual + 2
For instance, if you have an A1 text, the actual value is 0, so isAdjacent must return True if classifierOutput is strictly between 0 - 2 = -2 and 0 + 2 = 2. Since valid levels start at 0, that means either 0 (A1) or 1 (A2).
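To tie the integer comparison back to the original goal, here is a minimal sketch of computing adjacent accuracy over a whole set of predictions. The names LEVELS, RANK, is_adjacent and adjacent_accuracy are hypothetical and chosen for illustration; only the six level names and the "within 1 level" rule come from the question. It assumes the labels arrive as strings and are converted to integers once, up front.

```python
# Hypothetical helper names; only the level names and the
# "within 1 level" rule come from the question.
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
RANK = {name: i for i, name in enumerate(LEVELS)}  # A1 -> 0, ..., C2 -> 5


def is_adjacent(actual, predicted):
    """Return True when two integer levels differ by at most 1."""
    return abs(actual - predicted) <= 1


def adjacent_accuracy(actual_labels, predicted_labels):
    """Fraction of predictions that fall within 1 level of the actual label."""
    if len(actual_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not actual_labels:
        return 0.0
    # Convert the string labels to integers once, then compare integers.
    actual = [RANK[a] for a in actual_labels]
    predicted = [RANK[p] for p in predicted_labels]
    hits = sum(is_adjacent(a, p) for a, p in zip(actual, predicted))
    return hits / len(actual)


actual = ["B2", "A1", "C2", "B1"]
predicted = ["C1", "B1", "C2", "A2"]
print(adjacent_accuracy(actual, predicted))  # 0.75 (A1 vs B1 is two levels apart)
```

Whether this is actually faster than the set lookup depends on the data and the interpreter, so it is worth timing both versions (for example with the standard timeit module) on a representative sample before committing to either.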