Turning a Pandas DataFrame into an array and evaluating a multiple linear regression model


Question

I am trying to evaluate a multiple linear regression model. I have a data set like this:

This data set has 157 rows and 54 columns.

I need to predict the ground_truth value from the articles. As predictors in my multiple linear model I will use the 7 article columns from en_Amantadine to en_Common.

I have code for multiple linear regression:

from sklearn.linear_model import LinearRegression
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]  # need to modify for my problem
y = [[7], [9], [13], [17.5], [18]]  # need to modify
model = LinearRegression()
model.fit(X, y)

My problem is that I cannot extract the data for the X and y variables from my DataFrame. In my code, X and y should be:

X = [[4984, 94, 2837, 857, 356, 1678, 29901],
     [4428, 101, 4245, 906, 477, 2313, 34176],
      ....
     ]
y = [[3.135999], [2.53356] ....]

I cannot convert the DataFrame to this type of structure. How can I do this?

Thanks for your help.

Recommended answer

You can turn the DataFrame into a matrix by calling the as_matrix method directly on the DataFrame object. You might need to select the columns you are interested in first, for example X = df[['x1','x2','x3']].as_matrix(), where the different x's are the column names.

For the y variable, you can use y = df['ground_truth'].values to get an array.
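Applied to the data set in the question, the selection might look like the sketch below. Only en_Amantadine, en_Common and ground_truth are named in the question; the other column names and all the values are made up for illustration, and the sketch assumes the 7 article columns sit next to each other in the DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame: only en_Amantadine,
# en_Common and ground_truth come from the question; the remaining column
# names and all values are invented for this example.
cols = ['other', 'en_Amantadine', 'a2', 'a3', 'a4', 'a5', 'a6',
        'en_Common', 'ground_truth']
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, len(cols))), columns=cols)

# Label-based slicing with .loc includes both endpoints,
# so this picks the 7 columns from en_Amantadine through en_Common.
X = df.loc[:, 'en_Amantadine':'en_Common'].values

# Double brackets keep a one-column DataFrame, so .values is 2-D,
# matching the [[3.135999], [2.53356], ...] layout in the question.
y = df[['ground_truth']].values

print(X.shape, y.shape)
```

LinearRegression also accepts a 1-D y such as df['ground_truth'].values, so the 2-D form is only needed if you want the exact structure shown in the question.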

Here is an example with some randomly generated data:

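import numpy as np
import pandas as pd

# create a 5x5 DataFrame of random integers in [0, 100]
# (randint excludes its upper bound; it replaces the deprecated random_integers)
df = pd.DataFrame(np.random.randint(0, 101, (5, 5)), columns=['X1','X2','X3','X4','y'])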

Calling as_matrix() on df returns a numpy.ndarray object:

X = df[['X1','X2','X3','X4']].as_matrix()

Calling values returns a numpy.ndarray from a pandas Series:

y = df['y'].values

Note: on older pandas versions you might get a warning saying: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead. In pandas 1.0 and later the method has been removed, so the call fails with an AttributeError.

To fix it, use values instead of as_matrix, as shown below:

X = df[['X1','X2','X3','X4']].values
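To connect the extracted arrays back to the evaluation part of the question, here is a minimal sketch that fits LinearRegression on a training split and scores it on held-out rows. The data is synthetic, and the coefficients, noise level, split size and seeds are arbitrary choices for illustration, not anything taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y is a linear function of X1..X4 plus small Gaussian noise.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.integers(0, 101, (100, 4)),
                  columns=['X1', 'X2', 'X3', 'X4'])
df['y'] = (2.0 * df['X1'] - 1.0 * df['X2'] + 0.5 * df['X3']
           + 3.0 * df['X4'] + rng.normal(0, 1, 100))

X = df[['X1', 'X2', 'X3', 'X4']].values
y = df['y'].values

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# score() returns the coefficient of determination R^2 on the given data.
r2 = model.score(X_test, y_test)
print(model.coef_, r2)
```

On pandas 0.24 and later, .to_numpy() can be used in place of .values and gives the same arrays here.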
