Sparse least squares regression
Question
I am trying to fit a linear regression Ax = b, where A is a sparse matrix and b is a sparse vector. I tried scipy.sparse.linalg.lsqr, but apparently b needs to be a dense NumPy array. Indeed, if I run
import scipy.sparse
import scipy.sparse.linalg

A = [list(range(0,10)) for i in range(0,15)]
A = scipy.sparse.coo_matrix(A)
b = list(range(0,15))
b = scipy.sparse.coo_matrix(b)
scipy.sparse.linalg.lsqr(A,b)
I end up with:
AttributeError: squeeze not found
whereas
scipy.sparse.linalg.lsqr(A,b.toarray())
seems to work fine.
Unfortunately, in my case b is a 1.5 billion x 1 vector and I simply can't use a dense array. Does anybody know a workaround, or another library, for running linear regression with a sparse matrix and a sparse vector?
Recommended answer
It seems that the documentation specifically asks for a NumPy array. However, given the scale of your problem, maybe it's easier to use the closed-form solution of linear least squares?
Given that you want to solve Ax = b in the least-squares sense, i.e. min ||Ax-b||, you can form the normal equations A.T*A*x = A.T*b and solve those instead.
The closed-form solution would be x = (A.T*A)^{-1} * A.T * b.
Of course, this closed-form solution comes with its own requirements (specifically, on the rank of the matrix A: it needs full column rank so that A.T*A is invertible).
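For reference, here is a short sketch of where the normal equations come from, written with the squared 2-norm (which has the same minimizer as the norm itself). The symbol f is introduced here only for illustration.

```latex
\begin{aligned}
f(x) &= \|Ax - b\|_2^2 = x^\top A^\top A\,x - 2\,b^\top A\,x + b^\top b \\
\nabla f(x) &= 2\,A^\top A\,x - 2\,A^\top b = 0
\;\Longrightarrow\; A^\top A\,x = A^\top b \\
x &= (A^\top A)^{-1} A^\top b \quad \text{(when $A$ has full column rank)}
\end{aligned}
```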
You can solve for x using spsolve or, if that's too expensive, using an iterative solver (like Conjugate Gradients) to get an inexact solution.
The code would be:
import scipy.linalg
import scipy.sparse
import scipy.sparse.linalg

A = scipy.sparse.rand(1500, 1000, 0.5)  # create a random instance
b = scipy.sparse.rand(1500, 1, 0.5)
x = scipy.sparse.linalg.spsolve(A.T * A, A.T * b)  # solve the normal equations
x_lsqr = scipy.sparse.linalg.lsqr(A, b.toarray())  # just for comparison
print(scipy.linalg.norm(x_lsqr[0] - x))
which, on a few random instances, consistently gave me values less than 1E-7.
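The iterative route mentioned above (Conjugate Gradients on the normal equations) could look roughly like the sketch below. It is only an illustration under some assumptions: the helper name `solve_normal_equations_cg`, the problem sizes, and the random seeds are made up for this example, and A is assumed to have full column rank. The point is that b stays sparse; only A.T*b, which has one entry per column of A, is made dense, and A.T*A is never formed explicitly.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg


def solve_normal_equations_cg(A, b, maxiter=1000):
    """Approximately solve min ||Ax - b|| via CG on A.T*A*x = A.T*b.

    Hypothetical helper: A is a sparse (m x n) matrix and b a sparse
    (m x 1) column. Assumes A has full column rank.
    """
    A = sp.csr_matrix(A)
    At = A.T.tocsr()
    n = A.shape[1]

    # Sparse-times-sparse product; the result has only n entries,
    # so converting it to a dense 1-D array is cheap even for huge m.
    rhs = (At @ sp.csc_matrix(b)).toarray().ravel()

    # Apply A.T*A as two sparse matrix-vector products instead of
    # building the (possibly much denser) n x n matrix.
    normal_op = LinearOperator(
        (n, n), matvec=lambda v: At @ (A @ v), dtype=float
    )

    x, info = cg(normal_op, rhs, maxiter=maxiter)
    return x, info  # info == 0 means CG reported convergence


# Small illustrative instance (sizes and seeds are arbitrary).
A = sp.random(200, 10, density=0.5, random_state=0, format="csr")
b = sp.random(200, 1, density=0.5, random_state=1, format="csc")
x, info = solve_normal_equations_cg(A, b)
print(info, x)
```

Keep in mind that working with the normal equations squares the condition number of A, so on ill-conditioned problems CG may converge slowly or lose accuracy compared with lsqr.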