python中稀疏矩阵的伪逆 [英] pseudo inverse of sparse matrix in python

查看:143
本文介绍了python中稀疏矩阵的伪逆的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在处理来自神经影像的数据,由于数据量很大,我想在我的代码(scipy.sparse.lil_matrix或csr_matrix)中使用稀疏矩阵.

I am working with data from neuroimaging and because of the large amount of data, I would like to use sparse matrices for my code (scipy.sparse.lil_matrix or csr_matrix).

尤其是,我将需要计算矩阵的伪逆来解决最小二乘问题. 我已经找到了sparse.lsqr方法,但是它不是很有效.有没有一种方法可以计算出Moore-Penrose的伪逆(对于正常矩阵,它对应于pinv).

In particular, I will need to compute the pseudo-inverse of my matrix to solve a least-square problem. I have found the method sparse.lsqr, but it is not very efficient. Is there a method to compute the pseudo-inverse of Moore-Penrose (correspondent to pinv for normal matrices).

我的矩阵A的大小约为600'000x2000,并且在矩阵的每一行中,我将具有0到4个非零值.矩阵A的大小由体素x纤维束(白质纤维束)给出,我们期望在体素中最多可以交叉4个束.在大多数白色物质体素中,我们期望至少有1个区域,但我会说大约20%的区域可能为零.

The size of my matrix A is about 600'000x2000 and in every row of the matrix I'll have from 0 up to 4 non zero values. The matrix A size is given by voxel x fiber bundle (white matter fiber tracts) and we are expecting maximum 4 tracts to cross in a voxel. In most of the white matter voxels we expect to have at least 1 tract, but I will say that around 20% of the lines could be zeros.

向量b不应稀疏,实际上b包含每个体素的度量,通常不为零.

The vector b should not be sparse, actually b contains the measure for each voxel, which is in general not zero.

我需要使误差最小化,但是向量x上也有一些条件.当我在较小的矩阵上尝试模型时,我不需要约束系统即可满足这些条件(通常为0

I would need to minimize the error, but there are also some conditions on the vector x. As I tried the model on smaller matrices, I never needed to constrain the system in order to satisfy these conditions (in general 0

有什么帮助吗?有办法避免采用A的伪逆吗?

Is that of any help? Is there a way to avoid taking the pseudo-inverse of A?

谢谢

6月1日更新: 再次感谢您的帮助. 我真的无法向您展示任何有关我的数据的信息,因为python中的代码给我带来了一些问题.但是,为了理解如何选择一个好的k,我尝试在Matlab中创建一个测试函数.

Update 1st June: thanks again for the help. I can't really show you anything about my data, because the code in python give me some problems. However, in order to understand how I could choose a good k I've tried to create a testing function in Matlab.

代码如下:

F=zeros(100000,1000);

for k=1:150000
    p=rand(1);
    a=0;
    b=0;
    while a<=0 || b<=0
    a=random('Binomial',100000,p);
    b=random('Binomial',1000,p);
    end
    F(a,b)=rand(1);
end

solution=repmat([0.5,0.5,0.8,0.7,0.9,0.4,0.7,0.7,0.9,0.6],1,100);
size(solution)
solution=solution';
measure=F*solution;
%check=pinvF*measure;
k=250;
F=sparse(F);
[U,S,V]=svds(F,k);
s=svds(F,k);
plot(s)
max(max(U*S*V'-F))
for s=1:k
    if S(s,s)~=0
        S(s,s)=1/S(s,s);
    end
end

inv=V*S'*U';
inv*measure
max(inv*measure-solution)

您是否知道与F的大小相比k应该是多少?我已经选了250个(超过1000个),结果并不令人满意(等待时间可以接受,但不短). 现在我也可以将结果与已知的解决方案进行比较,但是一般来说,如何选择k? 我还附上了我得到的250个单个值的图,并将它们的平方归一化.我不确切地知道如何更好地在matlab中做一个screeplot.我现在正使用较大的k来查看该值是否突然变小.

Do you have any idea of what should be k compare to the size of F? I've taken 250 (over 1000) and the results are not satisfactory (the waiting time is acceptable, but not short). Also now I can compare the results with the known solution, but how could one choose k in general? I also attached the plot of the 250 single values that I get and their squares normalized. I don't know exactly how to better do a screeplot in matlab. I'm now proceeding with bigger k to see if suddently the value will be much smaller.

再次感谢, 珍妮佛

推荐答案

您可以详细研究无论如何,请注意,稀疏矩阵的伪逆极有可能是(非常)稠密的矩阵,因此,在求解稀疏线性系统时,遵循(通常)并不是一个有效的途径.

Anyway, please note that a pseudo-inverse of a sparse matrix is most likely to be a (very) dense one, so it's not really a fruitful avenue (in general) to follow, when solving sparse linear systems.

您可能希望以更详细的方式描述您的特定问题(dot(A, x)= b+ e).至少指定:

You may like to describe a slight more detailed manner your particular problem (dot(A, x)= b+ e). At least specify:

  • 典型"尺寸为A
  • A 中非零条目的
  • 典型"百分比
  • 最小二乘法表示将norm(e)最小化,但是请指出您的主要兴趣是在x_hat还是b_hat上,其中e= b- b_hatb_hat= dot(A, x_hat)
  • 'typical' size of A
  • 'typical' percentage of nonzero entries in A
  • least-squares implies that norm(e) is minimized, but please indicate whether your main interest is on x_hat or on b_hat, where e= b- b_hat and b_hat= dot(A, x_hat)

更新:如果您对A的排名有所了解(并且排名远小于列数),则可以尝试

Update: If you have some idea of the rank of A (and its much smaller than number of columns), you could try total least squares method. Here is a simple implementation, where k is the number of first singular values and vectors to use (i.e. 'effective' rank).

from scipy.sparse import hstack
from scipy.sparse.linalg import svds

def tls(A, b, k= 6):
    """A tls solution of Ax= b, for sparse A."""
    u, s, v= svds(hstack([A, b]), k)
    return v[-1, :-1]/ -v[-1, -1]

这篇关于python中稀疏矩阵的伪逆的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆