f-score: ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets


Problem description

I am trying to compute the micro F-measure for my model's predictions. I trained the model on word2vec vectors with Keras and TensorFlow, and I use scikit-learn to compute the micro F-measure.

But the function throws this message:

ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets

Also, am I doing the prediction correctly? I trained the model on x_train (wordVectors) and y_train (resultVectors) and validated it with x_test and y_test.

Now I have run a prediction on x_test and want to evaluate it against y_test. Am I doing it right so far?

The prediction array looks like this:

[[ 1.7533608e-02  5.8055294e+01  2.2185498e-03 ... -1.2394511e-03
   1.0454212e+00 -1.6698670e-03]
 [ 1.7539740e-02  5.8173992e+01  2.1747553e-03 ... -1.2764656e-03
   1.0475068e+00 -1.6941782e-03]
 [ 1.7591618e-02  5.8222389e+01  2.2053251e-03 ... -1.2856000e-03
   1.0484750e+00 -1.6668942e-03] ...

and the true values look like this:

[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]...

I already tried converting both arrays to single class indices (with np.argmax(..., axis=1)). Then there is no error, and I get a micro F-measure of around 0.59, which is far too high, so I think I made a mistake. Is there another way of converting the data? Can I convert the prediction to multilabel-indicator values?

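import numpy as np
from keras.models import load_model
from sklearn.metrics import f1_score

model = load_model('model.h5')
prediction = model.predict(x_test)

# argmax keeps a single class index per row, discarding any other labels
prediction_binary = np.argmax(prediction, axis=1)
y_test_binary = np.argmax(y_test, axis=1)

print(f1_score(y_test_binary, prediction_binary, average='micro'))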

I expect a score below 0.20, but instead I get 0.59, which is far too good.

Solution

The problem is that you compute the metric only on the one label with the highest value in each output vector, compared against only one label of each test vector.

Indeed, np.argmax returns only one index, even if the vector has several maximal values. For example, np.argmax([0,0,1,0,1,1]) returns only 2.
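To make this concrete, here is a minimal sketch using the same example vector. The variable names are illustrative only and do not come from the question.

```python
import numpy as np

# One multilabel row with three active labels (indices 2, 4 and 5).
row = np.array([0, 0, 1, 0, 1, 1])

# argmax reports only the first position of the maximum.
first_max = int(np.argmax(row))

# Collecting every position that ties for the maximum keeps all labels.
active = np.flatnonzero(row == row.max()).tolist()

print(first_max)  # 2
print(active)     # [2, 4, 5]
```

Two of the three labels are lost once the row is reduced with argmax, so the score is computed on a different, single-label problem.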

Since this is a multilabel classification problem, each input may belong to several categories at once. You therefore have to convert the output vectors of your classifier into the same form as your test vectors: one 0/1 value per label.
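The original ValueError comes from this mismatch in form. As a rough illustration, scikit-learn's type_of_target helper shows how each array is classified; the small arrays below are made up for the example and only stand in for y_test and the raw model output.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up stand-ins: 0/1 targets and raw floating-point model outputs.
y_true = np.array([[0, 1, 1], [1, 0, 0]])
y_score = np.array([[0.1, 0.8, 0.6], [0.7, 0.2, 0.4]])

true_kind = type_of_target(y_true)
score_kind = type_of_target(y_score)

# After thresholding (0.5 is an assumed cut-off), both arrays share a type.
thresholded_kind = type_of_target((y_score > 0.5).astype(int))

print(true_kind)         # multilabel-indicator
print(score_kind)        # continuous-multioutput
print(thresholded_kind)  # multilabel-indicator
```

Classification metrics such as f1_score need both arguments to be of a compatible type, which is why the raw floating-point predictions are rejected.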

You can do that as follows:

prediction_int = np.zeros_like(prediction)
prediction_int[prediction > 0.5] = 1
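Put together, the evaluation might look like the sketch below. The arrays are small made-up stand-ins for y_test and model.predict(x_test), and the 0.5 threshold assumes the model outputs per-label scores in [0, 1] (for example from a sigmoid output layer). The prediction sample shown in the question contains values such as 5.8e+01 and negative numbers, so that assumption may not hold for the original model, and the threshold would then need to be chosen differently.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up stand-ins for y_test and model.predict(x_test).
y_test = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
prediction = np.array([[0.9, 0.2, 0.4],
                       [0.1, 0.7, 0.6],
                       [0.8, 0.3, 0.1]])

threshold = 0.5  # assumed cut-off; only sensible for scores in [0, 1]

# Turn every score into a 0/1 label, keeping the (samples, labels) shape.
prediction_int = (prediction > threshold).astype(int)

# Both arguments are now multilabel-indicator arrays, so f1_score accepts them.
micro_f1 = f1_score(y_test, prediction_int, average='micro')
print(round(micro_f1, 4))  # 0.6667
```

Here there are 3 true positives, 1 false positive and 2 false negatives across all labels, so the micro-averaged F1 is 2*3 / (2*3 + 1 + 2) = 2/3.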
