Using scikit-learn (sklearn), how do I handle missing data for linear regression?

Published: 2021/5/29 20:59:22. Tags: python, pandas, machine-learning, scikit-learn, linear-regression

Question

I tried this but couldn't get it to work for my data: Use Scikit Learn to do linear regression on a time series pandas data frame

My data consists of two DataFrames: DataFrame_1.shape = (40, 5000) and DataFrame_2.shape = (40, 74). I'm trying to do some type of linear regression, but DataFrame_2 contains missing (NaN) values. When I call DataFrame_2.dropna(how="any"), the shape drops to (2, 74).
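To see why dropna(how="any") leaves only a couple of rows, here is a small sketch on synthetic data. The frame df2, the 5% missing rate, and the random seed are hypothetical, chosen only for illustration. With 74 columns, a row survives only if all 74 of its values are present, so even a low per-cell missing rate removes almost every row (at 5%, the survival chance per row is about 0.95^74, roughly 2%):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for DataFrame_2: 40 rows x 74 columns of random values
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.normal(size=(40, 74)))

# Blank out ~5% of the cells at random to simulate missing data
mask = rng.random(df2.shape) < 0.05
df2 = df2.mask(mask)

# how="any" drops a row if even one of its 74 values is NaN
print(df2.dropna(how="any").shape)

# Per-column null counts show that the gaps are spread thinly across columns
print(df2.isna().sum().describe())
```

The per-column counts are typically small, which is why dropping whole rows is so much more destructive than the amount of missing data suggests.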

Is there any linear regression algorithm in sklearn that can handle NaN values?

I'm modeling it after load_boston from sklearn.datasets, where X, y = boston.data, boston.target have shapes (506, 13) and (506,).

Here is my simplified code:

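from sklearn.linear_model import LinearRegression

# DataFrame_1: features, shape (40, 5000); DataFrame_2: targets, shape (40, 74)
X = DataFrame_1
for col in DataFrame_2.columns:
    y = DataFrame_2[col]  # one target column, shape (40,); may contain NaN
    model = LinearRegression()
    model.fit(X, y)  # raises as soon as y contains NaN

# ValueError: Input contains NaN, infinity or a value too large for dtype('float64').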

I structured it this way so that the shapes of the matrices match up.

If posting the DataFrame_2 would help, please comment below and I'll add it.

Recommended Answer

You can fill in the null values in y with imputation. In scikit-learn this is done with the following code snippet:

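# sklearn.preprocessing.Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy="mean")
# SimpleImputer expects 2-D input, so pass the Series y as a one-column DataFrame
y_imputed = imputer.fit_transform(y.to_frame()).ravel()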
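Plugged into the loop from the question, the imputation step might look like the sketch below. It uses sklearn.impute.SimpleImputer (the successor of the removed Imputer class) and assumes hypothetical synthetic data: 50 feature columns instead of 5000 and 3 targets instead of 74, with about 20% of target values missing, purely to keep the example small.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic stand-ins for DataFrame_1 (features) and DataFrame_2 (targets)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(40, 50)))
targets = pd.DataFrame(rng.normal(size=(40, 3)))
targets = targets.mask(rng.random(targets.shape) < 0.2)  # ~20% of targets missing

models = {}
for col in targets.columns:
    y = targets[col]
    # Fill this target's NaNs with the column mean; SimpleImputer needs 2-D input
    imputer = SimpleImputer(strategy="mean")
    y_imputed = imputer.fit_transform(y.to_frame()).ravel()
    # Fitting no longer raises, because y_imputed contains no NaN
    models[col] = LinearRegression().fit(X, y_imputed)
```

Keep in mind that mean-imputing the target adds artificial points sitting at the column mean, which can flatten the fitted relationship; whether that is acceptable depends on how much of each column is missing.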

Otherwise, you might want to build your model using only a subset of the 74 columns as predictors; perhaps some of your columns contain fewer null values?
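One way to find such a subset is to count the nulls in each column and keep only the sparsest ones before dropping rows. The sketch below assumes hypothetical synthetic data in which each column has its own missing rate, and the cutoff of at most 2 missing values per column is an arbitrary choice you would tune for your own data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 40 rows x 74 columns, some columns much sparser than others
rng = np.random.default_rng(1)
df2 = pd.DataFrame(rng.normal(size=(40, 74)))
missing_rate = rng.uniform(0.0, 0.3, size=74)         # per-column NaN probability
df2 = df2.mask(rng.random(df2.shape) < missing_rate)  # broadcasts across rows

# Rank columns by how many values they are missing
null_counts = df2.isna().sum().sort_values()

# Keep only columns with at most 2 missing values (arbitrary threshold)
keep = null_counts[null_counts <= 2].index
subset = df2[keep]

# Rows are now dropped only for NaNs in the kept columns
print(len(keep), subset.dropna(how="any").shape)
```

Because dropna(how="any") now looks only at the kept columns, it can never remove more rows than it would on the full frame, and usually removes far fewer.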
