Using a list of values from train_test_split() as training data


Problem description

I am trying to run a linear regression on some data. This is what the data looks like.

X = df['vectors'] looks like this:

0      [-1.86135, 1.3202, 0.023501, -2.9511, 1.62135,...
1      [0.5487195, 0.27389452, 0.49712706, 0.6853927,...
2      [-1.3525691, -0.8444542, 2.8269022, -1.4456564...
3      [1.0730275, -0.14970247, -1.1424525, -1.953272...
4      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...

When I run the linear regression model on it:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
lm = LinearRegression()
lm.fit(X_train, y_train)

I get this error:

TypeError                                 Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

How can I turn the values in X into scalars? I was thinking of taking the average of each vector, but I'm not sure exactly how to go about it.

Recommended answer

From the looks of it, X is a pandas.Series object.
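
One way to confirm this is to inspect the object directly. The sketch below uses a small made-up Series in place of df['vectors'] (the values are hypothetical, not the asker's data). It shows that such a Series is one-dimensional with dtype object, where every entry is a Python list, which is probably why the estimator cannot treat each row as a set of numeric features.

```python
import pandas as pd

# Hypothetical stand-in for df['vectors']: each row holds a list of floats
X = pd.Series([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

print(type(X))          # the container is a pandas Series
print(X.dtype)          # object: pandas stores the lists as generic Python objects
print(X.shape)          # one-dimensional: one entry per row, no feature columns
print(type(X.iloc[0]))  # each entry is itself a list
```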

Since all the lists inside each row of X are the same length, you can reshape X into an ndarray with the same number of rows as X and as many columns as there are elements in each list.

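# Import numpy and the scikit-learn helpers used below
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Reshape: explode() flattens every list into one long column of values,
# then reshape() folds it back into one row per original entry
X = np.array(X.explode(), dtype=float).reshape(len(X), -1)

# Do the same as before
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
lm = LinearRegression()
lm.fit(X_train, y_train)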
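
For reference, here is a self-contained sketch of the same idea that can be run end to end. It is not the original answer's code: the DataFrame, the column name "target", and the random values are assumptions made up for illustration, and it builds the 2-D array with np.array(X.tolist()) as an alternative to explode() plus reshape(). Like the approach above, it only works when every list has the same length.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 rows, each holding a list of 4 floats
rng = np.random.default_rng(0)
n_rows, n_features = 20, 4
vectors = [rng.normal(size=n_features).tolist() for _ in range(n_rows)]
df = pd.DataFrame({"vectors": vectors, "target": rng.normal(size=n_rows)})

# Turn the Series of equal-length lists into a 2-D float array:
# one row per sample, one column per list element
X = np.array(df["vectors"].tolist(), dtype=float)
y = df["target"].to_numpy()

# Same split and model as in the question
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)
lm = LinearRegression()
lm.fit(X_train, y_train)

print(X.shape)         # (rows, features)
print(lm.coef_.shape)  # one coefficient per feature
```

With this layout the model learns one coefficient per vector element, rather than collapsing each vector to a single average as the question first considered.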
