Sparse least squares regression

Problem description

I am trying to fit a linear regression Ax = b, where A is a sparse matrix and b is a sparse vector. I tried scipy.sparse.linalg.lsqr, but apparently b needs to be a NumPy (dense) array. Indeed, if I run

import scipy.sparse
import scipy.sparse.linalg

A = [list(range(0, 10)) for i in range(0, 15)]
A = scipy.sparse.coo_matrix(A)
b = list(range(0, 15))
b = scipy.sparse.coo_matrix(b)
scipy.sparse.linalg.lsqr(A, b)

I end up with:

AttributeError: squeeze not found

By contrast,

scipy.sparse.linalg.lsqr(A, b.toarray())

seems to work fine.

Unfortunately, in my case b is a 1.5 billion x 1 vector, and I simply can't use a dense array. Does anybody know a workaround, or another library, for running linear regression with a sparse matrix and a sparse vector?

Recommended answer

It seems that the documentation specifically asks for a NumPy array. However, given the scale of your problem, it may be easier to use the closed-form solution of linear least squares.

Since you want to solve Ax = b in the least-squares sense, you can form the normal equations A^T A x = A^T b and solve those instead. Their solution minimizes ||Ax - b||.
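For reference, the normal equations come from setting the gradient of the squared residual to zero (the standard derivation, with A of size m x n):

```latex
\begin{aligned}
f(x) &= \|Ax - b\|_2^2 = x^\top A^\top A\,x - 2\,b^\top A\,x + b^\top b \\
\nabla f(x) &= 2A^\top A\,x - 2A^\top b = 0
\;\Longrightarrow\; A^\top A\,x = A^\top b
\end{aligned}
```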

The closed-form solution is x = (A^T A)^{-1} A^T b. Of course, this closed form comes with its own requirements: A must have full column rank, so that A^T A is invertible.

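You can solve for x with spsolve or, if that is too expensive, with an iterative solver such as Conjugate Gradients (applicable because A^T A is symmetric positive definite when A has full column rank) to get an approximate solution. Either way, the right-hand side A^T b has only one entry per column of A, so the huge vector b never has to be converted to a dense array.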
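A minimal sketch of the iterative route, assuming SciPy: Conjugate Gradients applied to the normal equations, with A^T A wrapped in a `LinearOperator` so it is never formed explicitly. The helper name `solve_normal_cg` is hypothetical, and the sketch uses `cg`'s default tolerance.

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg as spla


def solve_normal_cg(A, b):
    """Approximately minimize ||Ax - b|| with CG on (A^T A) x = A^T b.

    Hypothetical helper: A is a sparse m x n matrix, b a sparse m x 1 column.
    Only the n-vector A^T b is made dense; b and A^T A are never densified.
    """
    A = scipy.sparse.csr_matrix(A, dtype=np.float64)
    n = A.shape[1]
    # Right-hand side: sparse A^T times sparse b is a sparse n x 1 result.
    rhs = (A.T @ b).toarray().ravel()
    # Matrix-free operator v -> A^T (A v); A^T A is never built.
    normal_op = spla.LinearOperator(
        (n, n), matvec=lambda v: A.T @ (A @ v), dtype=np.float64
    )
    x, info = spla.cg(normal_op, rhs)  # default convergence tolerance
    if info != 0:
        raise RuntimeError(f"CG did not converge (info={info})")
    return x
```

SciPy's `lsqr` is mathematically equivalent to CG on the normal equations but numerically more stable; the catch, as the question notes, is that it wants a dense b.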

The code would be:

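import scipy.linalg
import scipy.sparse
import scipy.sparse.linalg

A = scipy.sparse.rand(1500, 1000, density=0.5, format="csr")  # create a random instance
b = scipy.sparse.rand(1500, 1, density=0.5, format="csr")
# Normal equations (A^T A) x = A^T b; A.T @ b has only 1000 entries
x = scipy.sparse.linalg.spsolve(A.T @ A, A.T @ b)
x_lsqr = scipy.sparse.linalg.lsqr(A, b.toarray())  # just for comparison
print(scipy.linalg.norm(x_lsqr[0] - x))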

On a few random instances, this consistently gave me values below 1e-7.
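To make the size argument concrete, here is a small sketch (assuming SciPy; the sizes m and n, and the 5 nonzeros per row, are made-up illustration values) with a tall sparse A and a sparse b. It shows that A.T @ b has only n entries, so only that short vector is densified before the direct solve.

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg as spla

# Illustrative sizes (assumptions): many rows, few columns.
m, n = 200_000, 50
rng = np.random.default_rng(0)

# Tall sparse A with 5 random nonzeros per row, built in COO form.
k = 5
rows = np.repeat(np.arange(m), k)
cols = rng.integers(0, n, size=m * k)
vals = rng.standard_normal(m * k)
A = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(m, n)).tocsr()

# Sparse m x 1 right-hand side with only 1,000 nonzeros.
nz = rng.choice(m, size=1_000, replace=False)
b = scipy.sparse.coo_matrix(
    (rng.standard_normal(1_000), (nz, np.zeros(1_000, dtype=np.int64))),
    shape=(m, 1),
).tocsr()

AtA = (A.T @ A).tocsc()  # n x n
Atb = A.T @ b            # n x 1, still a sparse matrix
print(Atb.shape)         # (50, 1): one entry per column of A

# Only the short n-vector A^T b is densified before the direct solve.
x = np.asarray(spla.spsolve(AtA, Atb.toarray())).ravel()
```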
