Low alpha for NLTK agreement using MASI distance


Question

I'm getting a very low value for Krippendorff's alpha when I calculate agreement in NLTK using MASI as the distance function.

Three coders (Inky, Blinky, and Sue) are instructed to assign topic labels (love, gifts, slime, or gaming) to two texts (text01 and text02), based on what the texts are about. Each text can be about more than one topic, so coders may assign each text more than one label. The data and the code used to make the calculations are shown below:

import nltk
from nltk.metrics import agreement
from nltk.metrics.distance import masi_distance
from nltk.metrics.distance import jaccard_distance

#(coder, item, label)
data = [('inky','text01',frozenset(['love','gifts'])), 
      ('blinky','text01',frozenset(['love','gifts'])), 
      ('sue','text01',frozenset(['love','gifts'])), 
      ('inky','text02',frozenset(['slime','gaming'])), 
      ('blinky','text02',frozenset(['slime'])), 
      ('sue','text02',frozenset(['slime','gaming']))]

jaccard_task = nltk.AnnotationTask(distance=jaccard_distance)
masi_task = nltk.AnnotationTask(distance=masi_distance)
tasks = [jaccard_task, masi_task]
for task in tasks:
    task.load_array(data)
    print("Statistics for dataset using {}".format(task.distance))
    print("C: {}\nI: {}\nK: {}".format(task.C, task.I, task.K))
    print("Pi: {}".format(task.pi()))
    print("Kappa: {}".format(task.kappa()))
    print("Multi-Kappa: {}".format(task.multi_kappa()))
    print("Alpha: {}".format(task.alpha()))
    print()

When I run the code, I get the following results:

Statistics for dataset using <function jaccard_distance at 0x09D26DB0>
C: {'inky', 'sue', 'blinky'}
I: {'text01', 'text02'}
K: {frozenset({'slime'}), frozenset({'love', 'gifts'}), frozenset({'gaming', 'slime'})}
Pi: 0.7272727272727273
Kappa: 0.7777777777777777
Multi-Kappa: 0.7499999999999999
Alpha: 0.75

Statistics for dataset using <function masi_distance at 0x09D26DF8>
C: {'inky', 'sue', 'blinky'}
I: {'text01', 'text02'}
K: {frozenset({'slime'}), frozenset({'love', 'gifts'}), frozenset({'gaming', 'slime'})}
Pi: 0.8172727272727272
Kappa: 0.8511111111111113
Multi-Kappa: 0.8324999999999998
Alpha: -1.5

My question is, why is the alpha so low when using the MASI distance function compared to Jaccard?
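
For reference, the only disagreement in the data is on text02, where Blinky chose {slime} and the other two coders chose {slime, gaming}. The sketch below computes both distances for that pair by hand. The helpers `jaccard_dist` and `masi_dist` are hypothetical names written for this illustration, not NLTK's functions; the MASI weights (1, 0.67, 0.33, 0) are assumed from the usual definition of the measure, and both label sets are assumed to be non-empty.

```python
def jaccard_dist(a, b):
    """Jaccard distance: 1 - |a & b| / |a | b| (assumes non-empty sets)."""
    return 1 - len(a & b) / len(a | b)


def masi_dist(a, b):
    """MASI distance: Jaccard similarity scaled by a monotonicity weight.

    Assumed weights: 1 for identical sets, 0.67 when one set contains
    the other, 0.33 for partial overlap, 0 for disjoint sets.
    """
    inter = len(a & b)
    if a == b:
        m = 1.0
    elif inter == min(len(a), len(b)):
        m = 0.67
    elif inter > 0:
        m = 0.33
    else:
        m = 0.0
    return 1 - (inter / len(a | b)) * m


blinky = frozenset(['slime'])
others = frozenset(['slime', 'gaming'])

d_jaccard = jaccard_dist(blinky, others)
d_masi = masi_dist(blinky, others)
print("Jaccard distance:", d_jaccard)
print("MASI distance:", d_masi)
```

Under these assumptions MASI treats the pair as somewhat further apart than Jaccard does (roughly 0.665 against 0.5), but both distances stay between 0 and 1, so a distance of this size would not by itself be expected to push alpha far below zero.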

Recommended answer

I was unable to reproduce the problem: running the provided code gave me the correct value of Krippendorff's alpha with MASI distance. I used Python 3.5.2, NumPy 1.18.2, and NLTK 3.4.5. The most probable explanation is therefore that you need to update NLTK.
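
As a sanity check that does not depend on the installed NLTK version, alpha can be computed by hand from the textbook formula, alpha = 1 - D_o / D_e. The sketch below is a minimal pure-Python version, not NLTK's implementation: `krippendorff_alpha`, `jaccard_dist`, and `masi_dist` are hypothetical helpers, the MASI weights (1, 0.67, 0.33, 0) are assumed, and every item is assumed to be labelled by at least two coders with non-empty label sets.

```python
from collections import Counter
from itertools import permutations


def jaccard_dist(a, b):
    return 1 - len(a & b) / len(a | b)


def masi_dist(a, b):
    # Assumed MASI weights: identical 1, subset 0.67, overlap 0.33, disjoint 0.
    inter = len(a & b)
    if a == b:
        m = 1.0
    elif inter == min(len(a), len(b)):
        m = 0.67
    elif inter > 0:
        m = 0.33
    else:
        m = 0.0
    return 1 - (inter / len(a | b)) * m


def krippendorff_alpha(data, distance):
    """Alpha = 1 - D_o / D_e for (coder, item, label) triples."""
    by_item = {}
    for _coder, item, label in data:
        by_item.setdefault(item, []).append(label)
    # Only items with at least two labels are pairable.
    units = [labels for labels in by_item.values() if len(labels) > 1]
    n = sum(len(labels) for labels in units)

    # Observed disagreement: ordered pairs of labels within each item.
    d_o = sum(
        sum(distance(a, b) for a, b in permutations(labels, 2)) / (len(labels) - 1)
        for labels in units
    ) / n

    # Expected disagreement: ordered pairs of labels across all items.
    counts = Counter(label for labels in units for label in labels)
    d_e = sum(
        counts[a] * counts[b] * distance(a, b)
        for a in counts for b in counts if a != b
    ) / (n * (n - 1))
    return 1 - d_o / d_e


data = [('inky', 'text01', frozenset(['love', 'gifts'])),
        ('blinky', 'text01', frozenset(['love', 'gifts'])),
        ('sue', 'text01', frozenset(['love', 'gifts'])),
        ('inky', 'text02', frozenset(['slime', 'gaming'])),
        ('blinky', 'text02', frozenset(['slime'])),
        ('sue', 'text02', frozenset(['slime', 'gaming']))]

alpha_jaccard = krippendorff_alpha(data, jaccard_dist)
alpha_masi = krippendorff_alpha(data, masi_dist)
print("Alpha (Jaccard):", alpha_jaccard)
print("Alpha (MASI):", alpha_masi)
```

With Jaccard this reproduces the 0.75 reported in the question. With the assumed MASI weights it gives roughly 0.68, which suggests that a value of -1.5 comes from the installed NLTK version rather than from the data. You can check which version you have with `nltk.__version__`.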
