Turning a Pandas DataFrame into an array and evaluating a multiple linear regression model
Question
I am trying to evaluate a multiple linear regression model. I have a data set like this:
This data set has 157 rows and 54 columns.
I need to predict the ground_truth value from the articles. My multiple linear model will use the 7 article columns from en_Amantadine through en_Common as inputs.
I have this code for multiple linear regression:
from sklearn.linear_model import LinearRegression
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]  # need to modify for my problem
y = [[7], [9], [13], [17.5], [18]]  # need to modify
model = LinearRegression()
model.fit(X, y)
My problem is that I cannot extract the data for the X and y variables from my DataFrame. In my code, X should be:
X = [[4984, 94, 2837, 857, 356, 1678, 29901],
[4428, 101, 4245, 906, 477, 2313, 34176],
....
]
y = [[3.135999], [2.53356] ....]
I cannot convert the DataFrame to this type of structure. How can I do this?
Thanks for your help.
Recommended answer
You can turn the DataFrame into a matrix by calling the as_matrix method directly on the DataFrame object. You may need to select the columns you are interested in first, for example X=df[['x1','x2','X3']].as_matrix(), where the different x's are the column names.
For the y variable, you can use y = df['ground_truth'].values to get an array.
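As a sketch of how this could apply to the data in the question: if the 7 article columns sit next to each other in the DataFrame, a label-based slice with .loc selects them without listing every name. The DataFrame below is a made-up stand-in. Only en_Amantadine, en_Common and ground_truth come from the question; the other column names and all of the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for the real 157 x 54 data set. The middle column names
# (art_2 ... art_6) and every value are hypothetical.
feature_cols = ['en_Amantadine', 'art_2', 'art_3', 'art_4',
                'art_5', 'art_6', 'en_Common']
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5000, size=(157, 7)), columns=feature_cols)
df['ground_truth'] = rng.random(157) * 5

# .loc slices by label and includes both endpoints
X = df.loc[:, 'en_Amantadine':'en_Common'].values  # 2-D array, shape (157, 7)
y = df['ground_truth'].values                       # 1-D array, shape (157,)

print(X.shape, y.shape)
```

If the columns are not contiguous, passing an explicit list of names, as in the answer above, works the same way.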
Here is an example with some randomly generated data:
import numpy as np
import pandas as pd
# create a 5x5 DataFrame of random integers from 0 to 100 (randint excludes the upper bound)
df = pd.DataFrame(np.random.randint(0, 101, (5, 5)), columns=['X1', 'X2', 'X3', 'X4', 'y'])
Calling as_matrix() on df returns a numpy.ndarray object:
X = df[['X1','X2','X3','X4']].as_matrix()
Calling values returns a numpy.ndarray from a pandas Series:
y = df['y'].values
Note: You might get a warning saying: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead. In pandas 1.0 and later the method has been removed, so the call fails with an AttributeError instead.
To fix it, use values instead of as_matrix, as shown below:
X = df[['X1','X2','X3','X4']].values
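With X and y extracted, the model from the question can be fitted and evaluated. The sketch below is one possible approach and is not part of the original answer. It uses synthetic data with an assumed linear relationship, holds out a test set with train_test_split, and reports R^2 through model.score. The coefficients, split ratio and random seeds are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): y depends linearly on X1..X4 plus small noise
rng = np.random.default_rng(0)
cols = ['X1', 'X2', 'X3', 'X4']
df = pd.DataFrame(rng.integers(0, 101, size=(100, 4)), columns=cols)
true_coef = np.array([1.0, 2.0, 0.5, -1.0])
df['y'] = df[cols].values @ true_coef + rng.normal(0, 1, 100)

X = df[cols].values
y = df['y'].values

# Hold out 25% of the rows so the score is measured on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# score() returns the coefficient of determination R^2 on the given data
r2 = model.score(X_test, y_test)
print(model.coef_, model.intercept_, r2)
```

A 1-D y works with LinearRegression as well as the nested [[...]] form from the question; with a 1-D target, model.coef_ is also 1-D.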