numpy/scipy: correlation


Problem description


Is there a ready-made function in numpy/scipy to quickly compute the correlation y=mx+o of an X and a Y:
m, m-err, o, o-err, r-coef, r-coef-err ?

Or a formula to compute the 3 error ranges?

-robert

PS:

numpy.corrcoef computes only the bare coefficient:

>>> numpy.corrcoef((0,1,2,3.0),(2,5,6,7.0),)
array([[ 1.        ,  0.95618289],
       [ 0.95618289,  1.        ]])

With ints it computes a wrong value:

>>> numpy.corrcoef((0,1,2,3),(2,5,6,7),)
array([[ 1.        ,  0.94491118],
       [ 0.94491118,  1.        ]])
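For reference, the float result matches a hand calculation: for these four points r = 8/sqrt(70), which is about 0.95618289. The integer result quoted above is therefore the wrong one. The following is a minimal check, assuming a current NumPy release in which corrcoef promotes its inputs to floating point; under that assumption the two calls should agree.

```python
import numpy as np

# Same data as in the question, once with a float in each tuple, once all ints.
r_float = np.corrcoef((0, 1, 2, 3.0), (2, 5, 6, 7.0))[0, 1]
r_int = np.corrcoef((0, 1, 2, 3), (2, 5, 6, 7))[0, 1]

# Hand calculation: sum(dx*dy) = 8, sum(dx^2) = 5, sum(dy^2) = 14.
r_by_hand = 8 / np.sqrt(5 * 14)

print(r_float, r_int, r_by_hand)
```

If the integer call still differs on your installation, casting the inputs with np.asarray(..., dtype=float) before calling corrcoef sidesteps the problem.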

Solution

robert wrote:

> Is there a ready made function in numpy/scipy to compute the correlation y=mx+o of an X and Y fast:
> m, m-err, o, o-err, r-coef,r-coef-err ?

numpy and scipy questions are best asked on their lists, not here. There are a
number of people who know the capabilities of numpy and scipy through and
through, but most of them don't hang out on comp.lang.python.

http://www.scipy.org/Mailing_Lists

scipy.optimize.leastsq() can be told to return the covariance matrix of the
estimated parameters (m and o in your example; I have no idea what you think
r-coeff is).

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
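The leastsq() suggestion above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: the data are the four points from the question, the helper name residuals is hypothetical, and the noise is assumed to be homoscedastic. With full_output=True, leastsq returns a matrix cov_x that still has to be multiplied by the residual variance to give the covariance of the estimates. scipy.stats.linregress is added only as a cross-check for the slope and its standard error.

```python
import numpy as np
from scipy.optimize import leastsq
from scipy.stats import linregress

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 6.0, 7.0])

def residuals(params, x, y):
    """Residuals of the straight-line model y = m*x + o (hypothetical helper)."""
    m, o = params
    return y - (m * x + o)

# full_output=True makes leastsq return cov_x in addition to the parameters.
params, cov_x, infodict, mesg, ier = leastsq(
    residuals, [1.0, 0.0], args=(x, y), full_output=True
)
m, o = params

# Scale cov_x by the residual variance (sum of squares / degrees of freedom).
dof = len(x) - len(params)
resid_var = np.sum(residuals(params, x, y) ** 2) / dof
param_cov = cov_x * resid_var
m_err, o_err = np.sqrt(np.diag(param_cov))

# Cross-check: linregress reports slope, intercept, r and the slope's std. error.
res = linregress(x, y)
print(m, m_err, o, o_err)
print(res.slope, res.stderr, res.intercept, res.rvalue)
```

With only four points the error estimates are very rough. The off-diagonal entry of param_cov is the covariance between m and o, which the two separate error bars do not convey.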


Robert Kern wrote:

> robert wrote:
>
>> Is there a ready made function in numpy/scipy to compute the correlation y=mx+o of an X and Y fast:
>> m, m-err, o, o-err, r-coef,r-coef-err ?
>
> scipy.optimize.leastsq() can be told to return the covariance matrix of the
> estimated parameters (m and o in your example; I have no idea what you think
> r-coeff is).

Ah, the correlation coefficient itself. Since correlation coefficients are weird
beasts constrained to [-1, 1], standard Gaussian errors like you are expecting
for m-err and o-err don't apply. No, there's currently no function in numpy or
scipy that will do something sophisticated enough to be reliable. Here's an option:

http://www.pubmedcentral.nih.gov/art...i?artid=155684
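To illustrate why a symmetric error bar does not suit a quantity bounded by [-1, 1], here is a sketch of one common textbook approximation, the Fisher z-transform interval. It is not necessarily the method described at the link above, and it assumes that (x, y) is roughly bivariate normal with more than three points. The function name fisher_ci and the synthetic data are hypothetical.

```python
import numpy as np

def fisher_ci(x, y, z_crit=1.96):
    """Approximate interval for Pearson's r via the Fisher z-transform.

    z_crit=1.96 corresponds to a nominal 95% interval under the
    bivariate-normal assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if n <= 3:
        raise ValueError("need more than 3 points")
    r = np.corrcoef(x, y)[0, 1]
    z = np.arctanh(r)            # transform r to an unbounded scale
    se = 1.0 / np.sqrt(n - 3)    # approximate standard error of z
    # Transform the interval ends back; the result is asymmetric around r.
    return r, np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.8 * x + rng.normal(scale=0.5, size=50)

r, r_lo, r_hi = fisher_ci(x, y)
print(r, r_lo, r_hi)
```

The interval always stays inside [-1, 1] and is wider on the side facing zero, which a plain "r plus or minus err" cannot express.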



robert wrote:

> Is there a ready made function in numpy/scipy to compute the correlation y=mx+o of an X and Y fast:
> m, m-err, o, o-err, r-coef,r-coef-err ?

And of course, those three parameters are not particularly meaningful together.
If your model is truly "y is a linear response given x with normal noise" then
"y=m*x+o" is correct, and all of the information that you can get from the data
will be found in the estimates of m and o and the covariance matrix of the
estimates.

On the other hand, if your model is that "(x, y) is distributed as a bivariate
normal distribution" then "y=m*x+o" is not a particularly good representation of
the model. You should instead estimate the mean vector and covariance matrix of
(x, y). Your correlation coefficient will be the off-diagonal term after
dividing out the marginal standard deviations.

The difference between the two models is that the first places no restrictions
on the distribution of x. The second does; both the x and y marginal
distributions need to be normal. Under the first model, the correlation
coefficient has no meaning.
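The bivariate-normal view described above can be sketched in a few lines: estimate the mean vector and the covariance matrix of (x, y), then divide the off-diagonal term by the two marginal standard deviations. The data are the four points from the original question, used here only for illustration.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 6.0, 7.0])

# Mean vector and 2x2 sample covariance matrix of (x, y).
mean_vec = np.array([x.mean(), y.mean()])
cov = np.cov(x, y)

# Correlation coefficient: off-diagonal term over the marginal std. deviations.
r = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

print(mean_vec)
print(cov)
print(r, np.corrcoef(x, y)[0, 1])
```

This is the same quantity that numpy.corrcoef reports; computing it by hand just makes the normalisation step explicit.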


