Scikit Learn - ValueError: operands could not be broadcast together

Problem description

I'm trying to apply a Gaussian Naive Bayes model to a dataset to predict disease. It runs correctly when I predict on the training data, but when I predict on the testing data it raises a ValueError.

runfile('D:/ROFI/ML/Heart Disease/prediction.py', wdir='D:/ROFI/ML/Heart Disease')
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('D:/ROFI/ML/Heart Disease/prediction.py', wdir='D:/ROFI/ML/Heart Disease')

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/ROFI/ML/Heart Disease/prediction.py", line 85, in <module>
    predict(x_train, y_train, x_test, y_test)

  File "D:/ROFI/ML/Heart Disease/prediction.py", line 73, in predict
    predicted_data = model.predict(x_test)

  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\naive_bayes.py", line 65, in predict
    jll = self._joint_log_likelihood(X)

  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\naive_bayes.py", line 429, in _joint_log_likelihood
    n_ij -= 0.5 * np.sum(((X - self.theta_[i, :]) ** 2) /

ValueError: operands could not be broadcast together with shapes (294,14) (15,)

What is going wrong here?

import pandas
from sklearn import metrics
from sklearn.preprocessing import Imputer
from sklearn.naive_bayes import GaussianNB    

def load_data(feature_columns, predicted_column):

    train_data_frame = pandas.read_excel("training_data.xlsx")
    test_data_frame = pandas.read_excel("testing_data.xlsx")
    data_frame = pandas.read_excel("data_set.xlsx")

    x_train = train_data_frame[feature_columns].values
    y_train = train_data_frame[predicted_column].values

    x_test = test_data_frame[feature_columns].values
    y_test = test_data_frame[predicted_column].values

    x_train, x_test = impute(x_train, x_test)

    return x_train, y_train, x_test, y_test


def impute(x_train, x_test):

    fill_missing = Imputer(missing_values=-9, strategy="mean", axis=0)

    x_train = fill_missing.fit_transform(x_train)
    x_test = fill_missing.fit_transform(x_test)

    return x_train, x_test


def predict(x_train, y_train, x_test, y_test):

    model = GaussianNB()
    model.fit(x_train, y_train.ravel())

    predicted_data = model.predict(x_test)
    accuracy = metrics.accuracy_score(y_test, predicted_data)
    print("Accuracy of our naive bayes model is : %.2f"%(accuracy * 100))

    return predicted_data


feature_columns = ["age", "sex", "chol", "cigs", "years", "fbs", "trestbps", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]
predicted_column = ["cp"]

x_train, y_train, x_test, y_test = load_data(feature_columns, predicted_column)

predict(x_train, y_train, x_test, y_test)

N.B.: Both files have the same number of columns.

Recommended answer

I found the bug. The error comes from Imputer. Imputer replaces the missing values in a data set, but if a column consists entirely of missing values, it deletes that column. My testing data set had one column with no data at all, so Imputer removed it, and the test data ended up with 14 columns while the model had been trained on 15. That shape mismatch is the cause of the error. I removed the name of that all-missing column from the feature_columns list, and it worked.
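As a complement to the fix above, here is a minimal sketch of the column-dropping behavior and of one alternative way to keep the shapes aligned: fit the imputer on the training data only, then call transform on the test data. It is not the answerer's method, and it rests on some assumptions. It uses sklearn.impute.SimpleImputer, which replaced Imputer in later scikit-learn releases, and it relies on that class's default behavior of dropping columns with no observed values at fit time. The small arrays are made-up toy data, with -9 as the missing-value marker as in the question.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data; -9 marks a missing value, as in the question.
x_train = np.array([[63.0, 1.0, 233.0],
                    [41.0, -9.0, 250.0],
                    [57.0, 0.0, -9.0]])
# The last column of the test set is missing in every row.
x_test = np.array([[50.0, 1.0, -9.0],
                   [-9.0, 0.0, -9.0]])

# Refitting on the test set (what the question's impute() does):
# the all-missing column has no mean, so it is dropped from the output.
refit = SimpleImputer(missing_values=-9, strategy="mean")
x_test_refit = refit.fit_transform(x_test)

# Fitting once on the training set and only transforming the test set:
# missing test values are filled with the training means, and the
# number of columns stays the same as in training.
imputer = SimpleImputer(missing_values=-9, strategy="mean")
x_train_imputed = imputer.fit_transform(x_train)
x_test_imputed = imputer.transform(x_test)

print("refit on test:", x_test_refit.shape)
print("fit on train :", x_train_imputed.shape, x_test_imputed.shape)
```

Refitting on the test set also means the test data is filled with its own column means rather than the training means, so fitting once on the training data is usually the safer pattern. If a column is entirely missing in the training data as well, it would still be dropped from both sets, and removing it from feature_columns, as described above, remains the simplest fix.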
