numpy/scipy: correlation
Is there a ready made function in numpy/scipy to compute the correlation y=mx+o of an X and Y fast:
m, m-err, o, o-err, r-coef, r-coef-err?
Or a formula to compute the 3 error ranges?
-robert
PS:
numpy.corrcoef computes only the bare coefficient:
>>> import numpy
>>> numpy.corrcoef((0, 1, 2, 3.0), (2, 5, 6, 7.0))
array([[ 1.        ,  0.95618289],
       [ 0.95618289,  1.        ]])
With ints it computes a wrong value:
>>> numpy.corrcoef((0, 1, 2, 3), (2, 5, 6, 7))
array([[ 1.        ,  0.94491118],
       [ 0.94491118,  1.        ]])
robert wrote:
> Is there a ready made function in numpy/scipy to compute the correlation y=mx+o of an X and Y fast:
> m, m-err, o, o-err, r-coef, r-coef-err?

numpy and scipy questions are best asked on their lists, not here. There are a
number of people who know the capabilities of numpy and scipy through and
through, but most of them don't hang out on comp.lang.python.
http://www.scipy.org/Mailing_Lists
scipy.optimize.leastsq() can be told to return the covariance matrix of the
estimated parameters (m and o in your example; I have no idea what you think
r-coeff is).
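
As a rough sketch of the leastsq() suggestion above: with full_output=True the function also returns cov_x, which per the SciPy documentation has to be multiplied by the residual variance to give the covariance of the parameter estimates. The data are the four points from the original question; the helper name `residuals` and the starting guess are only illustrative choices, not anything from the thread.

```python
import numpy as np
from scipy.optimize import leastsq

# The four sample points from the original post.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 6.0, 7.0])

def residuals(params, x, y):
    """Residuals of the straight-line model y = m*x + o (hypothetical helper)."""
    m, o = params
    return y - (m * x + o)

# full_output=True makes leastsq return cov_x alongside the solution.
params, cov_x, infodict, mesg, ier = leastsq(
    residuals, np.array([1.0, 0.0]), args=(x, y), full_output=True
)
m, o = params

# cov_x is unscaled; multiply by the residual variance (SSR / degrees of freedom).
# Note: cov_x can be None if the fit is degenerate; that is not handled here.
dof = len(x) - len(params)
s2 = np.sum(residuals(params, x, y) ** 2) / dof
param_cov = cov_x * s2

# One-sigma errors on m and o are the square roots of the diagonal.
m_err, o_err = np.sqrt(np.diag(param_cov))
print("m = %.4f +/- %.4f" % (m, m_err))
print("o = %.4f +/- %.4f" % (o, o_err))
```

The off-diagonal entry of param_cov is the covariance between m and o, which matters if you propagate both errors into a prediction.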
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
Robert Kern wrote:
> robert wrote:
>> Is there a ready made function in numpy/scipy to compute the correlation y=mx+o of an X and Y fast:
>> m, m-err, o, o-err, r-coef, r-coef-err?
> scipy.optimize.leastsq() can be told to return the covariance matrix of the
> estimated parameters (m and o in your example; I have no idea what you think
> r-coeff is).

Ah, the correlation coefficient itself. Since correlation coefficients are weird
beasts constrained to [-1, 1], standard Gaussian errors like the ones you are expecting
for m-err and o-err don't apply. No, there's currently no function in numpy or
scipy that will do something sophisticated enough to be reliable. Here's an option:
http://www.pubmedcentral.nih.gov/art...i?artid=155684
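
To illustrate why a symmetric "r +/- err" does not fit a quantity bounded by [-1, 1], here is a small sketch of the classical Fisher z-transform interval. This is a textbook approximation that assumes bivariate-normal data; it is not necessarily the method described at the link above, and the function name `fisher_z_interval` is made up for this example.

```python
import math

def fisher_z_interval(r, n, z_crit=1.959963984540054):
    """Approximate confidence interval for a Pearson correlation r from n pairs.

    Uses the Fisher transform z = atanh(r), whose standard error is roughly
    1/sqrt(n - 3) under bivariate normality. The default z_crit is the
    two-sided 95% normal quantile.
    """
    if n <= 3:
        raise ValueError("the Fisher approximation needs n > 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    # Transform the symmetric interval in z-space back to r-space.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

r = 0.95618289  # coefficient from the original post (n = 4 points)
lo, hi = fisher_z_interval(r, n=4)
print("r = %.4f, approx. 95%% interval: [%.4f, %.4f]" % (r, lo, hi))
```

With only four points the interval is very wide and strongly asymmetric around r, which is the practical reason a single error bar is misleading here.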
--
Robert Kern
robert wrote:
> Is there a ready made function in numpy/scipy to compute the correlation y=mx+o of an X and Y fast:
> m, m-err, o, o-err, r-coef, r-coef-err?

And of course, those three parameters are not particularly meaningful together.
If your model is truly "y is a linear response given x with normal noise" then
"y=m*x+o" is correct, and all of the information that you can get from the data
will be found in the estimates of m and o and the covariance matrix of the
estimates.
On the other hand, if your model is that "(x, y) is distributed as a bivariate
normal distribution" then "y=m*x+o" is not a particularly good representation of
the model. You should instead estimate the mean vector and covariance matrix of
(x, y). Your correlation coefficient will be the off-diagonal term after
dividing out the marginal standard deviations.
The difference between the two models is that the first places no restrictions
on the distribution of x. The second does; both the x and y marginal
distributions need to be normal. Under the first model, the correlation
coefficient has no meaning.
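
For the second (bivariate normal) view, the estimate described above can be sketched with plain numpy: take the sample mean vector and covariance matrix of (x, y), then divide out the marginal standard deviations. The data are again the four points from the original question, used purely for illustration.

```python
import numpy as np

# The four sample points from the original post.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 6.0, 7.0])

# Sample mean vector of (x, y).
mean_vec = np.array([x.mean(), y.mean()])

# 2x2 sample covariance matrix (np.cov uses the n-1 denominator by default).
cov = np.cov(x, y)

# Dividing out the marginal standard deviations turns the covariance
# matrix into the correlation matrix.
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)

# The off-diagonal term is the correlation coefficient.
r = corr[0, 1]
print("mean vector:", mean_vec)
print("covariance matrix:")
print(cov)
print("correlation coefficient:", r)
```

This is the same normalization that numpy.corrcoef performs, so corr should agree with numpy.corrcoef(x, y) up to rounding.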
--
Robert Kern