FastText recall is 'nan' but precision is a number

Problem Description

I trained a supervised model in FastText using the Python interface and I'm getting weird results for precision and recall.

First, I trained a model:

# pretrained_model: path to a .vec file of pre-trained word vectors
# (the pretrainedVectors parameter expects a file path)
model = fasttext.train_supervised("train.txt", wordNgrams=3, epoch=100, pretrainedVectors=pretrained_model)

Then I get results for the test data:

def print_results(N, p, r):
    # N: number of test examples; p, r: precision@1 and recall@1
    # as returned by model.test
    print("N\t" + str(N))
    print("P@{}\t{:.3f}".format(1, p))
    print("R@{}\t{:.3f}".format(1, r))

print_results(*model.test('test.txt'))

But the results are always odd, because they show precision and recall @1 as identical, even for different datasets, e.g. one output is:

N   46425
P@1 0.917
R@1 0.917
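
As an aside, identical P@1 and R@1 are expected whenever each line of test.txt carries exactly one label: fastText computes precision@1 as the number of correct top-1 predictions divided by the number of predictions made, and recall@1 as the same count divided by the number of gold labels, so with one prediction and one gold label per line the two denominators coincide. A minimal illustration with made-up numbers:

correct = 42572                # hypothetical count of correct top-1 predictions
n_examples = 46425             # number of test examples (the N above)
p_at_1 = correct / n_examples  # denominator: predictions made, one per line
r_at_1 = correct / n_examples  # denominator: gold labels, one per line
print(p_at_1 == r_at_1)        # True by construction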

Then when I look for the precision and recall for each label, I always get recall as 'nan':

print(model.test_label('test.txt'))

The output is:

{'__label__1': {'precision': 0.9202150724134941, 'recall': nan, 'f1score': 1.8404301448269882}, '__label__5': {'precision': 0.9134956983264135, 'recall': nan, 'f1score': 1.826991396652827}}

Does anyone know why this might be happening?

P.S.: To try a reproducible example of this behavior, please refer to https://github.com/facebookresearch/fastText/issues/1072 and run it with FastText 0.9.2

Recommended Answer

It looks like FastText 0.9.2 has a bug in the computation of recall (note that the f1score values above exceed 1, which is impossible for a real F1 score), and it should be fixed by a later commit, the one pinned in the pip command below.

Installing a "bleeding edge" version of FastText, e.g. with

pip install git+https://github.com/facebookresearch/fastText.git@b64e359d5485dda4b4b5074494155d18e25c8d13 --quiet

and rerunning your code should get rid of the nan values in the recall computation.
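
If upgrading is not an option, per-label precision and recall can also be computed directly from the raw predictions. Below is a minimal sketch, not part of the original answer: the helper per_label_metrics is hypothetical, and it assumes test.txt is in the standard fastText format, one example per line starting with a single __label__... token.

from collections import Counter
import fasttext

def per_label_metrics(model, path):
    # Count true positives, predictions and gold labels per label,
    # then derive precision and recall directly.
    tp, pred, gold = Counter(), Counter(), Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            label, _, text = line.strip().partition(" ")
            top1 = model.predict(text)[0][0]  # top-1 predicted label
            pred[top1] += 1
            gold[label] += 1
            if top1 == label:
                tp[label] += 1
    return {
        lab: {
            "precision": tp[lab] / pred[lab] if pred[lab] else float("nan"),
            "recall": tp[lab] / gold[lab],  # gold[lab] >= 1 for every label seen
        }
        for lab in gold
    }

model = fasttext.load_model("model.bin")  # hypothetical path to a saved model
print(per_label_metrics(model, "test.txt"))

The precision values should agree with what model.test_label reports per label, while the recall values stay well-defined even on FastText 0.9.2.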
