f-score: ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets
Problem description
I am trying to compute the micro F measure for the predictions made by my model. I trained the model on word2vec vectors with Keras and TensorFlow. I use the scikit-learn library to compute the micro F measure.
But the function throws this message:
ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets
Also, am I doing the prediction right? I trained the model on x_train (wordVectors) and y_train (resultVectors) and validated with x_test and y_test.
Now I ran a prediction on x_test and want to evaluate it using y_test. Am I doing it right so far?
The prediction array looks like this:
[[ 1.7533608e-02 5.8055294e+01 2.2185498e-03 ... -1.2394511e-03
1.0454212e+00 -1.6698670e-03]
[ 1.7539740e-02 5.8173992e+01 2.1747553e-03 ... -1.2764656e-03
1.0475068e+00 -1.6941782e-03]
[ 1.7591618e-02 5.8222389e+01 2.2053251e-03 ... -1.2856000e-03
1.0484750e+00 -1.6668942e-03] ...
and the true values look like this:
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]...
I already tried to convert both arrays into binary values (with np.argmax(..., axis=1)). Then there is no error and I get a micro F measure of around 0.59, which is far too high, so I think I made a mistake.
My question is whether there is another way of converting the data. Can I convert the prediction to multilabel-indicator values?
import numpy as np
from keras.models import load_model
from sklearn.metrics import f1_score

model = load_model('model.h5')
prediction = model.predict(x_test)
prediction_binary = np.argmax(prediction, axis=1)
y_test_binary = np.argmax(y_test, axis=1)
print(f1_score(y_test_binary, prediction_binary, average='micro'))
I expect an output below 0.20, but instead I get 0.59, which is far too good a value.
The problem is that you compute your metric on only one label per sample: the label with the highest value in each output vector, compared against a single label taken from the test vector.
Indeed, np.argmax returns only one value, even if the vector has several maximal values. For example, np.argmax([0,0,1,0,1,1]) returns only 2.
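To make this concrete, here is a small sketch on made-up label vectors (the arrays are illustrative, not the asker's data). It shows that np.argmax reports only the first position of the maximum, that np.flatnonzero recovers every active label, and that a row with no active label still collapses to index 0.

```python
import numpy as np

# Hypothetical multilabel row: labels 2, 4 and 5 are all active.
row = np.array([0, 0, 1, 0, 1, 1])

# argmax reports only the first index at which the maximum occurs.
first_max = int(np.argmax(row))

# flatnonzero lists every index whose entry is non-zero.
active = np.flatnonzero(row).tolist()

# Applied row-wise, argmax collapses each multilabel row to one class index.
# The last row has no active label at all, yet it still maps to index 0.
y = np.array([[0, 0, 1, 0, 1, 1],
              [1, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 0]])
collapsed = np.argmax(y, axis=1).tolist()

print(first_max)   # 2
print(active)      # [2, 4, 5]
print(collapsed)   # [2, 0, 0]
```

Because every row is reduced to a single index, the score computed afterwards measures a different, single-label task, which may be part of why it looks better than expected.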
As your problem is a multilabel classification problem, each input may belong to several categories. For that, you have to convert the output vectors of your classifier to the same binary indicator format as your test vectors.
You can do that as follows:
# Set every entry above the 0.5 threshold to 1, everything else to 0
prediction_int = np.zeros_like(prediction)
prediction_int[prediction > 0.5] = 1
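As a rough end-to-end sketch of this thresholding approach, the example below uses small made-up arrays in place of y_test and model.predict(x_test). The names y_true, y_score and y_pred are hypothetical, and the 0.5 cutoff assumes the model outputs probabilities in [0, 1] (for example from a sigmoid output layer); with unbounded outputs a different cutoff would probably be needed. It also uses scikit-learn's type_of_target to show which target types the metric sees before and after thresholding.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical ground truth in multilabel-indicator format (3 samples, 4 labels).
y_true = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0],
                   [0, 0, 1, 1]])

# Hypothetical raw model outputs, standing in for model.predict(x_test).
y_score = np.array([[0.9, 0.2, 0.1, 0.4],
                    [0.1, 0.8, 0.7, 0.0],
                    [0.3, 0.1, 0.6, 0.9]])

# Threshold the scores into 0/1 indicators (assumed cutoff: 0.5).
y_pred = (y_score > 0.5).astype(int)

# Mixing the first two types is what triggers the ValueError.
print(type_of_target(y_true))   # multilabel-indicator
print(type_of_target(y_score))  # continuous-multioutput
print(type_of_target(y_pred))   # multilabel-indicator

# Micro-averaged F1 over all (sample, label) pairs: TP=4, FP=1, FN=1.
micro_f1 = f1_score(y_true, y_pred, average='micro')
print(round(micro_f1, 3))       # 0.8
```

Once both arrays are in multilabel-indicator format, f1_score accepts them directly and no argmax step is needed.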