In Python, how can I calculate correlation and statistical significance between two arrays of data?
Question
I have sets of data consisting of two equally long arrays, or I can build a single array of two-item entries, and I would like to calculate the correlation and statistical significance represented by the data (which may be tightly correlated, or may have no statistically significant correlation).
I am programming in Python and have scipy and numpy installed. I looked and found "Calculating Pearson correlation and significance in Python", but that seems to want the data to be manipulated so it falls into a specified range.
What is the proper way to ask scipy or numpy (I assume one of them can do this) to give me the correlation and statistical significance of two arrays?
Recommended answer
If you want to calculate the Pearson correlation coefficient, then scipy.stats.pearsonr is the way to go, although the p-value it reports is only meaningful for larger data sets. This function does not require the data to be manipulated to fall into a specified range. The correlation value itself falls in the interval [-1, 1]; perhaps that was the source of the confusion?
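As a minimal sketch of the call, the example below runs scipy.stats.pearsonr on synthetic data. The arrays, the random seed, and the 0.05 threshold are illustrative assumptions, not part of the question; the second half shows the "array of two-item entries" layout mentioned in the question being split back into two columns.

```python
import numpy as np
from scipy import stats

# Synthetic example data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)

# pearsonr takes two equal-length 1-D arrays and returns the
# correlation coefficient together with a two-sided p-value.
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.3g}")

# If the data is stored as an array of (x, y) pairs instead,
# split it into its two columns first.
pairs = np.column_stack((x, y))
r_pairs, p_pairs = stats.pearsonr(pairs[:, 0], pairs[:, 1])

# 0.05 is a conventional cutoff, chosen here only as an example.
if p_value < 0.05:
    print("correlation is statistically significant at the 5% level")
```

No rescaling of x or y is needed beforehand; only the returned coefficient is bounded by [-1, 1].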
If the significance is not terribly important, you can use numpy.corrcoef(), which returns the correlation coefficients but no p-value.
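A short sketch of the numpy route, using made-up sample values: numpy.corrcoef returns a full correlation matrix rather than a single number, so the coefficient between the two arrays is the off-diagonal entry.

```python
import numpy as np

# Made-up sample values, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# corrcoef returns a 2x2 matrix: the diagonal holds each array's
# correlation with itself, the off-diagonal entries hold r(x, y).
corr_matrix = np.corrcoef(x, y)
r = corr_matrix[0, 1]
print(corr_matrix)
print(f"r = {r:.4f}")
```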
The Mahalanobis distance does take into account the correlation between two arrays, but it provides a distance measure, not a correlation. (Strictly speaking, the Mahalanobis distance behaves as a true distance function only when the covariance matrix is positive definite; nevertheless, it can be used as a distance in many contexts to great advantage.)
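To illustrate the difference, here is a rough sketch using scipy.spatial.distance.mahalanobis on synthetic, strongly correlated data (the data and the two probe points are assumptions made for this example). Two points at the same Euclidean distance from the mean get very different Mahalanobis distances depending on whether they lie along or across the direction of correlation, yet neither number is a correlation coefficient.

```python
import numpy as np
from scipy.spatial import distance

# Synthetic, strongly correlated data (illustrative assumption).
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.9 * x + rng.normal(scale=0.3, size=500)
points = np.column_stack((x, y))

# mahalanobis() expects the INVERSE of the covariance matrix.
cov = np.cov(points, rowvar=False)
inv_cov = np.linalg.inv(cov)
mean = points.mean(axis=0)

# Two probe points, equally far from the mean in Euclidean terms.
along = mean + np.array([1.0, 0.9])    # follows the correlation
across = mean + np.array([1.0, -0.9])  # goes against it

d_along = distance.mahalanobis(along, mean, inv_cov)
d_across = distance.mahalanobis(across, mean, inv_cov)
print(f"along: {d_along:.2f}, across: {d_across:.2f}")
```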