How to split a sparse matrix into train and test sets?

Problem description

I want to understand how to work with sparse matrices. I have this code to generate a multi-label classification data set as a sparse matrix:

from sklearn.datasets import make_multilabel_classification

X, y = make_multilabel_classification(sparse=True, n_labels=20, return_indicator='sparse', allow_unlabeled=False)

This code gives me X in the following format:

<100x20 sparse matrix of type '<class 'numpy.float64'>' 
with 1797 stored elements in Compressed Sparse Row format>

And y in this format:

<100x5 sparse matrix of type '<class 'numpy.int64'>'
with 471 stored elements in Compressed Sparse Row format>
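The two reprs above can also be checked directly through the matrices' attributes. This is a minimal sketch assuming scikit-learn and SciPy are installed; random_state=0 is an addition (not in the question) so the run is reproducible, which means the stored-element counts may differ from the ones shown above.

```python
import scipy.sparse as sp
from sklearn.datasets import make_multilabel_classification

# Same call as in the question; random_state=0 is added only for reproducibility
X, y = make_multilabel_classification(
    sparse=True,
    n_labels=20,
    return_indicator="sparse",
    allow_unlabeled=False,
    random_state=0,
)

# shape: (rows, columns); format: storage scheme ("csr" = Compressed Sparse Row);
# nnz: number of stored elements, i.e. the count printed in the repr
for name, m in (("X", X), ("y", y)):
    print(name, sp.issparse(m), m.shape, m.format, m.dtype, m.nnz)
```

With the default n_samples, n_features and n_classes, X has 100 rows and 20 feature columns, and y has one column per label.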

Now I need to split X and y into X_train, X_test, y_train and y_test so that the training set constitutes 70%. How can I do it?

Here is what I tried:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X.toarray(), y, stratify=y, test_size=0.3)

And got this error message:

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

Recommended answer

The error message itself points to the solution: y is still a sparse matrix, and it is passed both as a target and as the stratify argument. Convert both X and y to dense NumPy arrays.

Do the following:

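from sklearn.model_selection import train_test_split

# toarray() converts each SciPy sparse matrix into a dense NumPy array
X = X.toarray()
y = y.toarray()

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3)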
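Densifying is only strictly needed for the stratify argument: train_test_split itself accepts SciPy sparse matrices and returns sparse splits. If an unstratified 70/30 split is acceptable, X and y can stay sparse. A minimal sketch under that assumption; random_state is added here only for reproducibility.

```python
import scipy.sparse as sp
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split

X, y = make_multilabel_classification(
    sparse=True,
    n_labels=20,
    return_indicator="sparse",
    allow_unlabeled=False,
    random_state=0,  # added for reproducibility
)

# No stratify argument: train_test_split indexes the sparse rows directly,
# so neither X nor y is ever converted to a dense array.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)  # (70, 20) (30, 20)
print(y_train.shape, y_test.shape)  # (70, 5) (30, 5)
print(sp.issparse(X_train), sp.issparse(y_train))  # True True
```

If stratification is still wanted, one option is to keep X and y sparse and give only stratify a dense copy, e.g. stratify=y.toarray(). Be aware that with a 2-D label matrix, stratify treats each distinct row of y (each label combination) as a separate class and raises a ValueError when any combination occurs only once, which can easily happen with only 100 samples; this applies to the dense version in the answer above as well.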
