Mean of a correlation matrix - pandas DataFrame
Question
I have a large correlation matrix in a pandas DataFrame: df, with shape (342, 342).
How do I take the mean, standard deviation, etc. of all of the numbers in the upper triangle, not including the 1's along the diagonal?
Thanks.
Recommended answer
Another possible one-line answer:
In [1]: corr
Out[1]:
          a         b         c         d         e
a  1.000000  0.022246  0.018614  0.022592  0.008520
b  0.022246  1.000000  0.033029  0.049714 -0.008243
c  0.018614  0.033029  1.000000 -0.016244  0.049010
d  0.022592  0.049714 -0.016244  1.000000 -0.015428
e  0.008520 -0.008243  0.049010 -0.015428  1.000000
In [2]: corr.values[np.triu_indices_from(corr.values,1)].mean()
Out[2]: 0.016381
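The question also asks for the standard deviation, and the same index trick extends to it. Below is a minimal sketch, assuming the 5x5 matrix from the session above is rebuilt by hand; the helper name `upper_triangle_values` is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

labels = list("abcde")
corr = pd.DataFrame(
    [
        [1.000000, 0.022246, 0.018614, 0.022592, 0.008520],
        [0.022246, 1.000000, 0.033029, 0.049714, -0.008243],
        [0.018614, 0.033029, 1.000000, -0.016244, 0.049010],
        [0.022592, 0.049714, -0.016244, 1.000000, -0.015428],
        [0.008520, -0.008243, 0.049010, -0.015428, 1.000000],
    ],
    index=labels,
    columns=labels,
)


def upper_triangle_values(df):
    """Return the strictly-upper-triangle entries of a square DataFrame as a 1-D array.

    Hypothetical helper for illustration only.
    """
    values = df.to_numpy()
    # k=1 starts one step above the main diagonal, so the 1's are excluded
    rows, cols = np.triu_indices_from(values, k=1)
    return values[rows, cols]


upper = upper_triangle_values(corr)
print(upper.size)         # number of pairs, n * (n - 1) / 2
print(upper.mean())
print(upper.std(ddof=1))  # sample standard deviation; pass ddof=0 for the population version
```

Because `upper` is a plain NumPy array, other summaries such as `np.median(upper)` or `upper.min()` work the same way.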
Added performance measurements
Performance of my solution:
In [3]: %timeit corr.values[np.triu_indices_from(corr.values,1)].mean()
10000 loops, best of 3: 48.1 us per loop
Performance of Theodros Zelleke's one-line solution:
In [4]: %timeit corr.unstack().ix[zip(*np.triu_indices_from(corr, 1))].mean()
1000 loops, best of 3: 823 us per loop
Performance of DSM's solution:
In [5]: def method1(df):
   ...:     df2 = df.copy()
   ...:     df2.values[np.tril_indices_from(df2)] = np.nan
   ...:     return df2.unstack().mean()
   ...:
In [5]: %timeit method1(corr)
1000 loops, best of 3: 242 us per loop
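The NaN-masking idea in DSM's solution can also be written without assigning through `.values`, which may not write back to the frame (or may raise an error) in newer pandas versions. The following is a sketch only, using randomly generated data as a stand-in for the real matrix; the names `data`, `mask`, and `upper` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in data: 200 observations of 6 variables (assumed, not from the original post)
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 6)), columns=list("abcdef"))
corr = data.corr()

# Boolean mask that is True strictly above the main diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# where() keeps the cells where mask is True and sets the rest to NaN,
# returning a new frame and leaving corr itself unchanged
upper = corr.where(mask).unstack().dropna()

print(upper.mean())
print(upper.std())       # pandas uses ddof=1 by default
print(upper.describe())  # count, mean, std, min, quartiles, max in one call
```

This variant keeps the row and column labels of each pair in the resulting Series index, at the cost of the extra pandas overhead that the timings above suggest.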