Using a list of values from train_test_split() as training data
Question
I am trying to run a linear regression on some data. This is how I select the features:
X = df['vectors']
X looks like this:
0 [-1.86135, 1.3202, 0.023501, -2.9511, 1.62135,...
1 [0.5487195, 0.27389452, 0.49712706, 0.6853927,...
2 [-1.3525691, -0.8444542, 2.8269022, -1.4456564...
3 [1.0730275, -0.14970247, -1.1424525, -1.953272...
4 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
When I run the linear regression model on it:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
lm = LinearRegression()
lm.fit(X_train, y_train)
I get this error:
TypeError Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars
The above exception was the direct cause of the following exception:
How can I turn the values in X into scalars? I was thinking of taking the average of each vector, but I am not sure exactly how to go about it.
Answer
From the looks of it, X is a pandas.Series object whose entries are lists, which is why the model cannot convert each entry to a single number.
Since all the lists inside each row of X are the same length, you can reshape X into an ndarray with the same number of rows as X and as many columns as there are elements in each list.
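# Imports
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# Reshape: explode() flattens every list into one long Series, and reshape()
# folds it back into one row per original entry; dtype=float avoids an object array
X = np.array(X.explode(), dtype=float).reshape(len(X), -1)
# Do the same as before
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
lm = LinearRegression()
lm.fit(X_train, y_train)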
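For anyone who wants to try this without the original DataFrame, here is a minimal self-contained sketch. The data is synthetic: the names vectors, X_matrix and X_mean, the sizes (10 rows of 4 values) and the random target are all assumptions made for illustration, not taken from the question. It shows an equivalent way to build the 2-D matrix from the lists, plus the per-row average the question asks about, in case a single feature per row is really what is wanted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df['vectors']: 10 rows, each a list of 4 floats (assumed sizes).
rng = np.random.default_rng(0)
vectors = pd.Series([rng.normal(size=4).tolist() for _ in range(10)])
y = pd.Series(rng.normal(size=10))  # made-up target, for illustration only

# Option 1: stack the lists into an (n_rows, n_features) float matrix.
# This gives the same result as the explode()/reshape() approach.
X_matrix = np.array(vectors.tolist(), dtype=float)

# Option 2: collapse each vector to its mean, as suggested in the question.
# keepdims=True keeps the result 2-D (one column), which scikit-learn expects.
X_mean = X_matrix.mean(axis=1, keepdims=True)

# Fit on the full matrix exactly as in the answer.
X_train, X_test, y_train, y_test = train_test_split(
    X_matrix, y, test_size=0.4, random_state=101
)
lm = LinearRegression()
lm.fit(X_train, y_train)
```

Averaging throws away most of the information in each vector, so the full matrix is usually the better input unless there is a specific reason to want one feature per row.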