Simple model error on fit: Found input variables with inconsistent numbers of samples


Question

I know this question exists in various forms, but after searching the web for hours over several days, I still haven't found anything that solves my problem.

This is my notebook:

import numpy as np
import pandas as pd

X = pd.read_csv('../input/web-traffic-time-series-forecasting/train_1.csv.zip')
X = X.drop('Page', axis=1)
X.fillna(0, inplace=True, axis=0)

X_sliced = X.iloc[:, 0:367]
y_sliced = X.iloc[:, 367:-1]

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

linreg = LinearRegression()

X_sliced.drop(X_sliced.iloc[:, 182:367], inplace=True, axis=1) #Here, I make sure that my X_sliced has the same shape as y_sliced

X_sliced.shape

OUT: (145063, 182)

y_sliced.shape

OUT: (145063, 182)

X_train, y_train, X_test, y_test = train_test_split(X_sliced, y_sliced)
linreg.fit(X_train, y_train)

ValueError: Found input variables with inconsistent numbers of samples: [108797, 36266]

Why do I receive this error when the shapes of my dataframes are exactly the same?

Link to the original assignment on Kaggle: https://www.kaggle.com/c/web-traffic-time-series-forecasting/overview

Recommended answer

You've assigned the outputs of train_test_split in the wrong order. The function returns the train and test parts of the first array, then the train and test parts of the second array, so it should be:

X_train, X_test, y_train, y_test = train_test_split(X_sliced, y_sliced) # x, x, y, y not x, y, x, y
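This also accounts for the two numbers in the error message. With the default test_size of 0.25, the 145063 rows are split into 108797 training rows and 36266 test rows, and the original unpacking put the 36266-row test part of X into the variable named y_train. The following is a minimal sketch of the same situation on a small synthetic array; the data, the random_state value, and the variable names a, b, c, d are made up for illustration and are not from the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data (hypothetical): 20 samples, 2 features.
X = np.arange(40, dtype=float).reshape(20, 2)
y = X.sum(axis=1)

# Correct order: both parts of X first, then both parts of y.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, y_train.shape)  # row counts match
print(X_test.shape, y_test.shape)    # row counts match

# Fitting works because X_train and y_train have the same number of rows.
model = LinearRegression().fit(X_train, y_train)

# Order used in the question: the second value is actually the test part of X.
a, b, c, d = train_test_split(X, y, random_state=0)
try:
    LinearRegression().fit(a, b)  # a is the train part of X, b is the test part of X
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

Printing the shapes right after the split, as above, is a quick way to catch this kind of unpacking mistake before calling fit.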
