AxisError: axis 1 is out of bounds for array of dimension 1 when calculating AUC


Problem description

I have a classification problem: for each 8x8 image I have the pixel values and the digit the image represents, and my task is to predict the digit (the 'Number' attribute) from the pixel values using RandomForestClassifier. The digit can be any value from 0 to 9.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(train_df[input_var], train_df[target])
test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1]
roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovr")

Here it throws an AxisError.

Traceback (most recent call last):
  File "dap_hazi_4.py", line 44, in <module>
    roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovo")
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 383, in roc_auc_score
    multi_class, average, sample_weight)
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 440, in _multiclass_roc_auc_score
    if not np.allclose(1, y_score.sum(axis=1)):
  File "/home/balint/.local/lib/python3.6/site-packages/numpy/core/_methods.py", line 38, in _sum
    return umr_sum(a, axis, dtype, out, keepdims, initial, where)

AxisError: axis 1 is out of bounds for array of dimension 1

Solution

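Because your problem is multi-class, roc_auc_score needs a 2-D array of scores with one column per class for the 'multi_class' argument to work. The traceback fails at y_score.sum(axis=1), which means the scores you pass are 1-D: the [:,1] slice keeps only a single column of predict_proba. Passing the full 2-D output of predict_proba resolves the error. If you also one-hot encode the labels, they must have the same 2-D shape as the scores.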
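
The failing line in the traceback, y_score.sum(axis=1), can be reproduced with NumPy alone. The following is a minimal sketch with made-up probabilities (the values and variable names are illustrative, not taken from the question's data); it shows that summing over axis 1 works on a 2-D array but not on a single column sliced out of it.

```python
import numpy as np

# Hypothetical scores for 3 samples and 4 classes; each row sums to 1.
proba = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])

# Same kind of slice as in the question: keeps one column, shape (3,).
one_column = proba[:, 1]

try:
    one_column.sum(axis=1)    # a 1-D array has no axis 1
    message = None
except ValueError as exc:     # NumPy's AxisError is a subclass of ValueError
    message = str(exc)

row_sums = proba.sum(axis=1)  # fine on the 2-D array

print(one_column.shape, message)
print(np.allclose(1, row_sums))
```

Catching ValueError rather than AxisError directly is a deliberate choice here, since the location of the AxisError class differs between NumPy versions.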

Suppose you have 100 test samples with 5 unique classes: the array passed as y_score must then have shape (100, 5), not (100,).
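
As a sketch of what a working call might look like, the example below uses scikit-learn's bundled load_digits dataset as a stand-in for the 8x8 pixel data (an assumption; the question's train_df and test_df are not available) and passes the full predict_proba output to roc_auc_score. The split parameters are also illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: load_digits (8x8 images, digits 0-9) stands in for the asker's data.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(X_train, y_train)

# Keep every column: one probability per class, shape (n_samples, 10).
proba = forest_model.predict_proba(X_test)

# y_test stays a 1-D array of class labels; only the scores are 2-D.
auc = roc_auc_score(y_test, proba, average="macro", multi_class="ovr")

print(proba.shape)
print(auc)
```

With the question's DataFrames, the equivalent change would be to drop the [:,1] slice and pass the resulting 2-D array directly instead of storing it in a single column.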
