`np.concatenate` a numpy array with a sparse matrix


Question

A dataset contains numerical and categorical variables, and I split them into two parts:

cont_data = data[cont_variables].values
disc_data = data[disc_variables].values

Then I used sklearn.preprocessing.OneHotEncoder to encode the categorical data, and tried to merge the encoded categorical data with the numerical data:

np.concatenate((cont_data, disc_data_coded), axis=1)

But the following error occurs:

ValueError: all the input arrays must have same number of dimensions

I checked that both inputs have the same number of dimensions:

print(cont_data.shape)        # (24000, 35)
print(disc_data_coded.shape)  # (24000, 26)

Finally, I found that cont_data is a numpy array, while disc_data_coded is a sparse matrix:

>>> disc_data_coded
<24000x26 sparse matrix of type '<class 'numpy.float64'>'
with 312000 stored elements in Compressed Sparse Row format>

I changed the parameter sparse in OneHotEncoder to False, and everything works. But the question is: how can I merge a numpy array with a sparse matrix directly, without setting sparse=False?

Recommended answer

Sparse matrices are not subclasses of numpy arrays, so numpy functions often don't work on them. Use the sparse functions instead, such as sparse.vstack and sparse.hstack. But all inputs then have to be sparse.

Or make the sparse matrix dense first, with .toarray(), and then use np.concatenate.

Do you want the result to be sparse or dense?

In [32]: sparse.vstack((sparse.csr_matrix(np.arange(10)),sparse.csr_matrix(np.ones((3,10)))))
Out[32]: 
<4x10 sparse matrix of type '<class 'numpy.float64'>'
    with 39 stored elements in Compressed Sparse Row format>
In [33]: np.concatenate((sparse.csr_matrix(np.arange(10)).toarray(),np.ones((3,10))))
Out[33]: 
array([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
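
The session above stacks rows. The question merges columns (axis=1), so the same two options can be sketched with sparse.hstack and np.concatenate. This is a minimal sketch, not the asker's actual code: the small arrays are made-up stand-ins for cont_data and disc_data_coded, and the dense block is converted to a sparse matrix explicitly so that every input to sparse.hstack is sparse.

```python
import numpy as np
from scipy import sparse

# Made-up stand-ins for the question's arrays (assumption: illustrative values only).
cont_data = np.arange(6, dtype=float).reshape(3, 2)   # dense, shape (3, 2)
disc_data_coded = sparse.csr_matrix(np.eye(3))        # sparse, shape (3, 3)

# Option 1: keep the result sparse by converting the dense block first.
merged_sparse = sparse.hstack(
    (sparse.csr_matrix(cont_data), disc_data_coded), format="csr"
)

# Option 2: make the sparse block dense, then use np.concatenate as usual.
merged_dense = np.concatenate((cont_data, disc_data_coded.toarray()), axis=1)

print(merged_sparse.shape)  # (3, 5)
print(merged_dense.shape)   # (3, 5)
```

Which option fits better depends on size: with many one-hot columns the sparse result usually uses far less memory, while the dense result is more convenient for code that expects a plain numpy array.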
