How to perform correlation between two dataframes with different column names
Problem description
I have a set of columns (col1, col2, col3) in dataframe df1 and another set of columns (col4, col5, col6) in dataframe df2. Assume the two dataframes have the same number of rows.
How do I generate a correlation table that holds the pairwise correlations between the columns of df1 and the columns of df2?
The table should look like this:
      col1  col2  col3
col4   ..    ..    ..
col5   ..    ..    ..
col6   ..    ..    ..
I tried df1.corrwith(df2), but it does not seem to generate the table as required.
I have seen the answer at "How to check correlation between matching columns of two data sets?", but the main difference here is that the column names do not match.
Recommended answer
pandas
Quick and dirty
pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df2', 'df1']
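The one-liner above packs three steps into a single expression. The sketch below spells them out on small made-up frames (the data values and the intermediate names combined, full and cross are illustrative assumptions, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Small deterministic frames with different column names (illustrative data).
df1 = pd.DataFrame({"col1": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "col2": [2.0, 1.0, 4.0, 3.0, 6.0],
                    "col3": [5.0, 3.0, 4.0, 1.0, 2.0]})
df2 = pd.DataFrame({"col4": [1.0, 3.0, 2.0, 5.0, 4.0],
                    "col5": [9.0, 7.0, 8.0, 5.0, 6.0],
                    "col6": [1.0, 0.0, 1.0, 0.0, 2.0]})

# Step 1: concatenate side by side; keys= adds an outer column level
# ('df1' / 'df2') so the two groups of columns stay distinguishable.
combined = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])

# Step 2: full correlation matrix over all six columns (6 x 6).
full = combined.corr()

# Step 3: keep only the block whose rows are df2 columns
# and whose columns are df1 columns.
cross = full.loc["df2", "df1"]
print(cross)
```

This is "quick and dirty" because step 2 also computes the df1-with-df1 and df2-with-df2 blocks, which are then thrown away.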
numpy
Clean
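import numpy as np
import pandas as pd

def corr(df1, df2):
    # Pearson correlation of every df2 column (rows) with every df1 column (columns)
    n = len(df1)
    v1, v2 = df1.values, df2.values
    # outer products of the column sums and of the population standard deviations
    sums = np.multiply.outer(v2.sum(0), v1.sum(0))
    stds = np.multiply.outer(v2.std(0), v1.std(0))
    # (sum(x*y) - sum(x)*sum(y)/n) / (n * std_x * std_y)
    return pd.DataFrame((v2.T.dot(v1) - sums / n) / stds / n,
                        df2.columns, df1.columns)

corr(df1, df2)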
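For reference, the expression returned by corr appears to be the usual computational form of Pearson's coefficient, applied to every pair of columns at once. The notation below (x for a column of df2, y for a column of df1, population standard deviations as returned by NumPy's default std) is my own and not taken from the original answer:

```latex
r_{xy} \;=\; \frac{\sum_{i=1}^{n} x_i y_i \;-\; \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\,\sigma_x\,\sigma_y},
\qquad
\sigma_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

Here v2.T.dot(v1) supplies the sums of products for all column pairs, sums holds the products of the column sums, and stds holds the products of the standard deviations.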
Example
df1 = pd.DataFrame(np.random.rand(10, 4), columns=list('abcd'))
df2 = pd.DataFrame(np.random.rand(10, 3), columns=list('xyz'))
pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df2', 'df1']
          a         b         c         d
x  0.235624  0.844665 -0.647962  0.535562
y  0.357994  0.462007  0.205863  0.424568
z  0.688853  0.350318  0.132357  0.687038
corr(df1, df2)
          a         b         c         d
x  0.235624  0.844665 -0.647962  0.535562
y  0.357994  0.462007  0.205863  0.424568
z  0.688853  0.350318  0.132357  0.687038
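As a side note on why df1.corrwith(df2) did not give this table: as far as I understand, corrwith aligns the two frames on their column labels before correlating, so frames with no labels in common have nothing to pair up. A minimal sketch of that behaviour, plus one possible alternative that calls corrwith once per df2 column, is below. The seed and the names aligned and table are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df1 = pd.DataFrame(rng.random((10, 4)), columns=list("abcd"))
df2 = pd.DataFrame(rng.random((10, 3)), columns=list("xyz"))

# No shared column labels, so label alignment leaves nothing to correlate;
# in the pandas versions I am aware of, every entry comes back as NaN.
aligned = df1.corrwith(df2)
print(aligned)

# Passing a single Series instead correlates it with each df1 column.
# Doing that for every df2 column and transposing puts df2's columns in rows.
table = df2.apply(lambda col: df1.corrwith(col)).T
print(table)
```

This variant loops over columns in Python, so it is likely slower than the NumPy version on wide frames, but it reuses corrwith directly.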