Calculate correlation between all columns of a DataFrame and all columns of another DataFrame?
Question
I have a DataFrame object stocks filled with stock returns. I have another DataFrame object industries filled with industry returns. I want to find each stock's correlation with each industry.
import numpy as np
import pandas as pd

np.random.seed(123)
# df1 stands in for stocks, df2 for industries
df1 = pd.DataFrame({'s1': np.random.randn(10000), 's2': np.random.randn(10000)})
df2 = pd.DataFrame({'i1': np.random.randn(10000), 'i2': np.random.randn(10000)})
The expensive way to do this is to merge the two DataFrame objects, calculate the full correlation matrix, and then throw out all the stock-to-stock and industry-to-industry correlations. Is there a more efficient way to do this?
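For reference, the merge-then-discard approach described above might look roughly like the sketch below. The names combined, full, and cross are illustrative only and do not come from the original post, and the sample size is reduced to keep the example quick.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df1 = pd.DataFrame({'s1': np.random.randn(1000), 's2': np.random.randn(1000)})
df2 = pd.DataFrame({'i1': np.random.randn(1000), 'i2': np.random.randn(1000)})

# Put stocks and industries side by side (rows are aligned on the index).
combined = pd.concat([df1, df2], axis=1)

# Full correlation matrix: every column against every other column.
full = combined.corr()

# Keep only the industry-by-stock block; the rest is computed and thrown away.
cross = full.loc[df2.columns, df1.columns]
print(cross)
```

With m stock columns and n industry columns, this computes an (m + n) by (m + n) matrix to obtain an n by m block, which is the wasted work the question is asking about.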
Recommended answer
Here's a one-liner that uses apply on the columns of df1 and avoids nested for loops. The main benefit is that apply assembles the result into a DataFrame, with one column per column of df1 and one row per column of df2.
df1.apply(lambda s: df2.corrwith(s))
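If the apply call itself becomes a bottleneck with many columns, one possible alternative is to standardize both frames and take a single matrix product. The sketch below is not part of the original answer: cross_corr is a hypothetical helper name, and it assumes both frames have the same number of rows in the same order, no missing values, and no constant columns.

```python
import numpy as np
import pandas as pd

def cross_corr(left, right):
    """Pearson correlation of every column of `right` with every column of `left`.

    Hypothetical helper; assumes equal row counts, aligned rows, no NaNs.
    """
    a = left.to_numpy(dtype=float)
    b = right.to_numpy(dtype=float)
    # Standardize each column using the population standard deviation (ddof=0).
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    # The mean of products of standardized values is the Pearson correlation.
    corr = b.T @ a / len(a)
    return pd.DataFrame(corr, index=right.columns, columns=left.columns)

np.random.seed(123)
df1 = pd.DataFrame({'s1': np.random.randn(1000), 's2': np.random.randn(1000)})
df2 = pd.DataFrame({'i1': np.random.randn(1000), 'i2': np.random.randn(1000)})

one_liner = df1.apply(lambda s: df2.corrwith(s))
fast = cross_corr(df1, df2)

# Both layouts put industries on the rows and stocks on the columns.
print(np.allclose(one_liner.to_numpy(), fast.to_numpy()))
```

Unlike corrwith, this sketch does not align on the index or skip missing values, so the one-liner is the safer choice when the data is not already clean.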