What is the most accurate method in python for computing the minimum norm solution or the solution obtained from the pseudo-inverse?


Question

My goal is to solve:

Kc=y

with the pseudo-inverse (i.e. the minimum norm solution):

c=K^{+}y

such that the model is (hopefully) a high degree polynomial f(x) = sum_i c_i x^i. I am especially interested in the underdetermined case where we have more polynomial features than data points (too few equations, too many variables/unknowns): columns = deg+1 > N = rows. Note that K is the Vandermonde matrix of polynomial features.
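
For concreteness, here is a minimal sketch (my addition, using the same libraries as the script further down) of that underdetermined Vandermonde setup; np.vander with increasing=True and sklearn's PolynomialFeatures build the same matrix K with low degrees first:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-1, 1, 5)   # 5 sample points
deg = 7                     # deg+1 = 8 columns > 5 rows: underdetermined

# Two equivalent ways to build K with low degrees first [1, x, x**2, ...]
K_vander = np.vander(x, deg + 1, increasing=True)
K_sklearn = PolynomialFeatures(degree=deg).fit_transform(x.reshape(-1, 1))
print(np.allclose(K_vander, K_sklearn))  # True: same Vandermonde matrix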

I was initially using the Python function np.linalg.pinv, but then I noticed something funky was going on, as I noted here: Why do different methods for solving Xc=y in python give different solution when they should not?. In that question I use a square matrix to learn a function on the interval [-1,+1] with a high degree polynomial. The answer there suggested that I lower the degree of the polynomial and/or increase the interval size. The main issue is that it's not clear to me how to choose the interval or the max degree before things become unreliable. I think my main issue is that choosing such a numerically stable range depends on the method I may use. In the end what I truly care about is that:

  1. the method I use is exactly (or really close to) the pseudo-inverse for this polynomial fitting problem
  2. that it's numerically stable

Ideally I want to try a large degree polynomial, but that might be limited by my machine precision. Is it possible to increase the numerical precision of the machine by using something more accurate than floats?
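
One way to test that idea is arbitrary precision arithmetic (a sketch, assuming the mpmath package; the 50-digit setting and the dual normal equations c = K^T (K K^T)^{-1} y are my own choices for illustration, not a recommendation):

import numpy as np
from mpmath import mp

mp.dps = 50  # work with ~50 significant digits instead of float64's ~16

def min_norm_mp(K, y):
    # Minimum norm solution c = K^T (K K^T)^{-1} y in arbitrary precision.
    # Assumes K has full row rank (generically true for a Vandermonde
    # matrix with distinct nodes and more columns than rows). Forming
    # K K^T squares the condition number, which the extra digits offset.
    Kmp = mp.matrix(K.tolist())
    ymp = mp.matrix(y.tolist())
    a = mp.lu_solve(Kmp * Kmp.T, ymp)  # solve (K K^T) a = y
    c = Kmp.T * a
    return np.array([float(c[i]) for i in range(c.rows)])

# usage with the script below: c_mp = min_norm_mp(Kern, Y)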

Also, I really care that whatever function from Python I use, it provides the closest answer to the pseudo-inverse (and hopefully that it's numerically stable so I can actually use it). To check the answers against the pseudo-inverse I wrote the following script:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def l2_loss(y,y_):
    N = y.shape[0]
    return (1/N)*np.linalg.norm(y-y_)

## some parameters
lb,ub = -200,200
N=100
D0=1
degree_mdl = 120
## target function
freq_cos = 2
f_target = lambda x: np.cos(freq_cos*2*np.pi*x)
## evaluate target_f on x_points
X = np.linspace(lb,ub,N) # [N,]
Y = f_target(X) # [N,]
# get pinv solution
poly_feat = PolynomialFeatures(degree=degree_mdl)
Kern = poly_feat.fit_transform( X.reshape(N,D0) ) # low degrees first [1,x,x**2,...]
c_pinv = np.dot(np.linalg.pinv( Kern ), Y)
## get polyfit solution
c_polyfit = np.polyfit(X,Y,degree_mdl)[::-1] # need to reverse to get low degrees first [1,x,x**2,...]
##
c_lstsq,_,_,_ = np.linalg.lstsq(Kern, Y, rcond=None) # keep Y 1-D so c_lstsq matches the shape of c_pinv
##
print('lb,ub = {} '.format((lb,ub)))
print('differences with c_pinv')
print( '||c_pinv-c_pinv|| = {}'.format( np.linalg.norm(c_pinv-c_pinv) ))
print( '||c_pinv-c_polyfit|| = {}'.format( np.linalg.norm(c_pinv-c_polyfit) ))
print( '||c_pinv-c_lstsq|| = {}'.format( np.linalg.norm(c_pinv-c_lstsq) ))
##
print('differences with c_polyfit')
print( '||c_polyfit-c_pinv|| = {}'.format( np.linalg.norm(c_polyfit-c_pinv) ))
print( '||c_polyfit-c_polyfit|| = {}'.format( np.linalg.norm(c_polyfit-c_polyfit) ))
print( '||c_polyfit-c_lstsq|| = {}'.format( np.linalg.norm(c_polyfit-c_lstsq) ))
##
print('differences with c_lstsq')
print( '||c_lstsq-c_pinv|| = {}'.format( np.linalg.norm(c_lstsq-c_pinv) ))
print( '||c_lstsq-c_polyfit|| = {}'.format( np.linalg.norm(c_lstsq-c_polyfit) ))
print( '||c_lstsq-c_lstsq|| = {}'.format( np.linalg.norm(c_lstsq-c_lstsq) ))
##
print('Data set errors')
y_polyfit = np.dot(Kern,c_polyfit)
print( 'J_data(c_polyfit) = {}'.format( l2_loss(y_polyfit,Y) ) )
y_pinv = np.dot(Kern,c_pinv)
print( 'J_data(c_pinv) = {}'.format( l2_loss(y_pinv,Y) ) )
y_lstsq = np.dot(Kern,c_lstsq)
print( 'J_data(c_lstsq) = {}'.format( l2_loss(y_lstsq,Y) ) )

Using that script I noticed that polyfit rarely matches the parameters that pinv produces. I know pinv definitely returns the pseudo-inverse, so I guess if my main goal is to "make sure I use the pseudo-inverse" then it's a good idea to use np.pinv. However, I also know mathematically that the pseudo-inverse always minimizes the least squares error J(c) = || Kc - y ||^2 no matter what (proof here, theorem 11.1.2, page 446). Thus, maybe my goal should be to just use the Python function that returns the smallest least squares error J. So I ran (in the underdetermined case) a comparison of the three methods:

  1. polyfit, np.polyfit
  2. pseudo-inverse, np.linalg.pinv
  3. least squares, np.linalg.lstsq

and compared the least squares error each of them gave me on the data.
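
Since the pseudo-inverse solution is, among all least squares minimizers, the one with the smallest coefficient norm, one extra diagnostic worth printing is the norm of each coefficient vector (a sketch reusing c_pinv, c_polyfit, and c_lstsq from the script above):

# Among all minimizers of ||Kc - y||, the pseudo-inverse solution has the
# smallest ||c||, so c_pinv should have the smallest norm of the three.
print('||c_pinv|| = {}'.format(np.linalg.norm(c_pinv)))
print('||c_polyfit|| = {}'.format(np.linalg.norm(c_polyfit)))
print('||c_lstsq|| = {}'.format(np.linalg.norm(c_lstsq)))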

Then I inspected the weird dips the error curve seems to experience (which, by the way, seems like a complete mystery: why are there dips if the algorithms are not stochastic?), and the numbers were usually smaller for polyfit, for example:

lb,ub = (-100, 100)
Data set errors
J_data(c_polyfit) = 5.329753025633029e-12
J_data(c_pinv) = 0.06670557822873546
J_data(c_lstsq) = 0.7479733306782645

Given these results, and the fact that the pseudo-inverse is a minimizer of the least squares error, it seems that the best thing is to ignore np.pinv. Is that the best thing to do? Or am I missing something obvious?

As an extra note, I went into the polyfit code to see what exactly is making it have better least squares errors (which right now I am using as a proxy for being the best approximation to the pseudo-inverse), and it seems it has some weird conditioning/numerical stability code:

# scale lhs to improve condition number and solve
scale = NX.sqrt((lhs*lhs).sum(axis=0))
lhs /= scale
c, resids, rank, s = lstsq(lhs, rhs, rcond)
c = (c.T/scale).T  # broadcast scale coefficients

which, I assume, is what is bringing the extra stability to polyfit that pinv does not have?
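
One way to test that hypothesis is to replicate the scaling trick by hand on the script's own matrix (a sketch; Kern, Y, and l2_loss are the variables from the script above):

# polyfit's trick: normalize each column of the Vandermonde matrix to unit
# Euclidean norm, solve the scaled system, then undo the scaling.
scale = np.sqrt((Kern * Kern).sum(axis=0))
c_scaled, *_ = np.linalg.lstsq(Kern / scale, Y, rcond=None)
c_scaled = c_scaled / scale  # map coefficients back to the unscaled basis

y_scaled = np.dot(Kern, c_scaled)
print('J_data(c_scaled) = {}'.format(l2_loss(y_scaled, Y)))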

Is using polyfit the right decision for my task of approximating a linear system with high degree polynomials?

Also, at this point I'm willing to use other software like MATLAB if it provides me the correct pseudo-inverse AND more numerical stability (for most degrees and any bounds).

Another random idea I just had was that maybe there is a nice way to sample the function such that the stability of the pseudo-inverse is good. My guess is that approximating a cosine with a polynomial requires some minimum number of samples or distance between them (like the Nyquist-Shannon sampling theorem says, if the basis functions are sinusoids...).
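
A minimal sketch of that sampling idea (the Chebyshev nodes here are my own suggestion, motivated by the Runge's phenomenon discussion in the answer below, not something established in the question):

import numpy as np

lb, ub, N, deg = -1.0, 1.0, 100, 50

# Equispaced nodes versus Chebyshev nodes mapped to [lb, ub].
x_equi = np.linspace(lb, ub, N)
k = np.arange(N)
x_cheb = 0.5*(lb + ub) + 0.5*(ub - lb)*np.cos((2*k + 1)*np.pi/(2*N))

# With the monomial basis both matrices become ill-conditioned as the
# degree grows, but the node choice still changes the conditioning.
K_equi = np.vander(x_equi, deg + 1, increasing=True)
K_cheb = np.vander(x_cheb, deg + 1, increasing=True)
print('cond(equispaced) = {:.3e}'.format(np.linalg.cond(K_equi)))
print('cond(chebyshev)  = {:.3e}'.format(np.linalg.cond(K_cheb)))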

It should be pointed out that inverting (or pseudo-inverting) and then multiplying is probably a bad idea. See:

https://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/

That post only talks about the inverse, but I assume it extends to pseudo-inverses as well.

Right now my confusion is that usually we don't want to actually compute the pseudo-inverse explicitly and do A^+ y = x_min_norm to get the minimum norm solution. However, I would have thought that np.lstsq would yield the answer that I wanted, but its error differs greatly from the other two. I find that extremely confusing... it makes me think I'm using the wrong way to get the minimum norm solution in Python.

I am not trying to get a regularized solution. I am trying to get the minimum norm solution and nothing else, as numerically accurately as possible.
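
For reference, the minimum norm least squares solution can also be written down directly from the SVD, without ever forming the pseudo-inverse explicitly (a sketch; the rcond cutoff mirrors the truncation np.linalg.pinv applies by default, but the exact threshold is my choice):

import numpy as np

def min_norm_svd(K, y, rcond=1e-15):
    # Truncated-SVD minimum norm solution: invert only the singular
    # values above rcond * s_max and zero out the rest, which is the
    # same kind of cutoff np.linalg.pinv uses.
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    keep = s > rcond * s[0]            # s is sorted in decreasing order
    d = (U.T @ y)[keep] / s[keep]      # coefficients in the singular basis
    return Vt[keep].T @ d

# usage: c = min_norm_svd(Kern, Y) should essentially match c_pinv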

Answer

My area of research involves a compression algorithm essentially called Fourier extensions. What is the most accurate? It is highly dependent on the vector, I believe, due to smoothness properties. During the summer I used something called Savitzky-Golay. There are fairly numerically stable and accurate ways of filtering with it. However, my adviser has a method that is relatively fast and numerically stable. The area is called Fourier extension or continuation. How? I don't know if I am allowed to post it; here is the article. I believe I may have already partially posted it here in Python over the summer.

This has nothing to do with Python, because Python uses the same underlying libraries as most numerical coding schemes, namely BLAS and LAPACK. Netlib is online.

There are a number of other similar fast and numerically stable ideas I would recommend that may be suitable. There is an entire book devoted to this by Boyd; chapters 6 and 7 are on this. It is about total variation with regularization, due to the underlying noise I imagine you may have in the signal.

Other aspects: you may need to regularize the SVD due to ill-conditioning. There are usually entire books devoted to this. Simply to answer your question about what the best algorithm is: there are multiple dimensions to algorithms, and you haven't really stated the properties of the problem. In case you didn't know about Runge's phenomenon: using high degree polynomials is adverse.

There is a whole class of Hermite polynomials to deal with the Gibbs phenomenon, and other filtering techniques, but the problem isn't posed well. You're using generic functions. I'd recommend getting Trefethen and Bau. Sometimes they do a Chebyshev reprojection.

What is the condition number of K? Additionally, there is something that happens when fitting polynomials called Runge's phenomenon. You should consider this. With generic functions, you need to do a low rank approximation to regularize if the condition number is too high. I actually just read it: you are using a Vandermonde matrix. I am going to demonstrate this pretty easily. Vandermonde matrices are bad. Don't use them. They have knots.

v = (1:.5:6);
V = vander(v);
c1 = cond(V);

v2 = (1:.5:12);
c2 = cond(vander(v2));
display(c1)
display(c2)

c1 =

   6.0469e+12

c2 =

   9.3987e+32

I attempted a low rank approximation, but the Vandermonde matrices are not nice. See:

function B = lowapprox(A)
% Takes a matrix A
% Returns a low rank approx of it
% utilizing the SVD
chi = 1e-10;
[U,S,V]  = svd(A,0);

DS = diag(S);
aa = find(DS > chi);
s= S(aa,aa);
k = length(aa);
Un = U(:,1:k);
Vn = V(:,1:k)';

B = Un*s*Vn;

end


V2 = vander(v2);
r2 = rank(V2);
c2 = cond(V2);
B = lowapprox(V2);
c3 = cond(B);
display(c3)

c2 =

   9.3987e+32

c3 =

   3.7837e+32

It does nothing, really... If you don't know what is happening when you take that inverse: the condition number is equal to the maximum singular value over the minimum, so you have some extremely small singular values at machine precision.
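
That ratio is easy to inspect for the matrix in the question (a sketch in Python rather than MATLAB, reusing Kern from the script in the question):

# Condition number as the ratio of extreme singular values.
s = np.linalg.svd(Kern, compute_uv=False)  # sorted in decreasing order
print('s_max = {:.3e}, s_min = {:.3e}'.format(s[0], s[-1]))
print('cond(Kern) = {:.3e}'.format(s[0] / s[-1]))
n_tiny = int((s < np.finfo(float).eps * s[0]).sum())
print('singular values below eps * s_max: {}'.format(n_tiny))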

Additionally, I think you have some confusion about minimum norm and regularization. You said you want the minimum norm solution in the least squares sense. The SVD gives the least squares solution; it's property nine: A is constructed from a basis by the SVD. This is covered in Trefethen, but the Vandermonde matrix is ill-conditioned.

Even small, ill-constructed Vandermonde matrices are going to lose it. Now, about your solution: don't use Vandermonde matrices. Construct the polynomial otherwise. A better idea is barycentric Lagrange interpolation. A library is here.

Here is an example in MATLAB:

t = (0:.01:pi)';
f = cos(t);
data = [t,f];
f1 = barylag(data,t);
err = norm(f1 - f)  % interpolation error at the nodes

err =

     0

barylag is taken from the MATLAB website. As I can't really comment on your discrepancies, you should realize the actual way lsqr is done: LSQR algorithms are Krylov methods. This is covered in Trefethen, and so is the SVD. I have an example on my Quora page about numerical stability with the QR factorization, which is how you actually construct these algorithms.
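
The same barycentric check can be reproduced in Python with scipy's implementation of barycentric Lagrange interpolation (a sketch; scipy.interpolate.BarycentricInterpolator plays the role of barylag here):

import numpy as np
from scipy.interpolate import BarycentricInterpolator

t = np.arange(0, np.pi, 0.01)
f = np.cos(t)

# Interpolate through the samples and evaluate back at the nodes;
# barycentric Lagrange interpolation reproduces the data essentially exactly.
f1 = BarycentricInterpolator(t, f)(t)
print('err = {}'.format(np.linalg.norm(f1 - f)))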
