Use a list of csr_matrix to train SGDClassifier


Question

I have a list X_train (>20000 elements), where each element is a sparse scipy csr_matrix created by HashingVectorizer.transform().

I call HashingVectorizer.transform() on the input file line by line and append each result to the list X_train.

I'm trying to train an SGDClassifier using X_train, but I get the error:

ValueError: setting an array element with a sequence.

How can I train the SGDClassifier without a CPU- or memory-intensive operation?

Recommended answer

Here is a list of sparse matrices, and several ways of turning it into an array or a sparse matrix (or failing to):

In [916]: alist=[sparse.random(1,10,.2, format='csr') for _ in range(3)]
In [917]: alist
Out[917]: 
[<1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
 <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
 <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>]

Making a proper 2-D sparse matrix:

In [918]: sparse.vstack(alist)
Out[918]: 
<3x10 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in Compressed Sparse Row format>
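
Applied to the question, the same stacking step can feed the classifier directly. The following is a minimal sketch, not the asker's actual code: the sample lines, the labels, and the `n_features` value are hypothetical stand-ins, and only the use of `sparse.vstack` comes from the answer above.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy data standing in for the lines of the input file.
lines = ["spam spam offer", "meeting at noon", "cheap offer now", "lunch at noon"]
y_train = np.array([1, 0, 1, 0])  # hypothetical labels, one per line

# n_features is an arbitrary choice for this sketch.
vectorizer = HashingVectorizer(n_features=2**10)

# Reproduce the setup from the question: one 1 x n_features csr_matrix per line.
X_list = [vectorizer.transform([line]) for line in lines]

# Stack the rows into a single 2-D sparse matrix; nothing is densified.
X_train = sparse.vstack(X_list, format="csr")

clf = SGDClassifier(random_state=0)
clf.fit(X_train, y_train)  # accepts the stacked sparse matrix as-is
```

Because the stacked matrix stays in CSR format, memory use grows with the number of stored nonzero values rather than with rows times columns.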

Making an object array of matrices, which is the wrong result:

In [919]: np.array(alist)
Out[919]: 
array([ <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
       <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
       <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>], dtype=object)

Trying to make a float array, which reproduces your error:

In [920]: np.array(alist, float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-920-52d4689fa7b3> in <module>()
----> 1 np.array(alist, float)

ValueError: setting an array element with a sequence.
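
If the list is only an intermediate step, one possible alternative is to skip it: `HashingVectorizer.transform()` accepts an iterable of strings and returns a single 2-D sparse matrix. The sketch below uses hypothetical sample lines and an arbitrary `n_features` value, and checks that this route gives the same matrix as stacking per-line results.

```python
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer

# Hypothetical sample lines standing in for the input file.
lines = ["first line of text", "second line of text", "something else"]

# n_features is an arbitrary choice for this sketch.
vectorizer = HashingVectorizer(n_features=2**10)

# One call over all lines yields a single 3 x n_features sparse matrix.
X_direct = vectorizer.transform(lines)

# The per-line approach from the question, stacked afterwards.
X_stacked = sparse.vstack([vectorizer.transform([line]) for line in lines])

# Largest absolute difference between the two results.
max_diff = abs(X_direct - X_stacked).max()
```

Since HashingVectorizer is stateless, transforming the lines together or one at a time should produce the same rows.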
