How to split a sparse matrix into train and test sets?
Question
I want to understand how to work with sparse matrices. I have this code to generate a multi-label classification data set as a sparse matrix:
from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(sparse = True, n_labels = 20, return_indicator = 'sparse', allow_unlabeled = False)
This code gives me X in the following format:
<100x20 sparse matrix of type '<class 'numpy.float64'>'
with 1797 stored elements in Compressed Sparse Row format>
and y in this format:
<100x5 sparse matrix of type '<class 'numpy.int64'>'
with 471 stored elements in Compressed Sparse Row format>
Now I need to split X and y into X_train, X_test, y_train and y_test, so that the train set constitutes 70% of the data. How can I do it?
This is what I tried:
X_train, X_test, y_train, y_test = train_test_split(X.toarray(), y, stratify=y, test_size=0.3)
and I got this error message:
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
Recommended answer
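The error message itself seems to suggest the solution: both X and y need to be converted to dense arrays. In the attempt above X is already dense, so the error most likely comes from y, which is still sparse when it is passed as the stratify argument.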
Do the following:
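from sklearn.model_selection import train_test_split
# Convert both sparse matrices to dense NumPy arrays before splitting
X = X.toarray()
y = y.toarray()
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3)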
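If stratification is not required, the data may not need to be densified at all, since train_test_split accepts SciPy sparse matrices as input. The following is a minimal sketch under that assumption, not part of the original answer. It reuses the generator call from the question, and the random_state values are added here only to make the run reproducible.

```python
from scipy.sparse import issparse
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split

# Same generator call as in the question; random_state is an addition
# made here only so that the example is reproducible.
X, y = make_multilabel_classification(
    sparse=True, n_labels=20, return_indicator='sparse',
    allow_unlabeled=False, random_state=0)

# Without the stratify argument, X and y are split row-wise
# and both remain sparse matrices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

print(X_train.shape, X_test.shape)
print(issparse(X_train), issparse(y_train))
```

This keeps memory usage low for large data sets, at the cost of giving up the stratified split used in the answer above.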