How to perform correlation between two dataframes with different column names


Question

I have a set of columns (col1, col2, col3) in dataframe df1 and another set of columns (col4, col5, col6) in dataframe df2. Assume the two dataframes have the same number of rows.

How do I generate a correlation table that holds the pairwise correlation between every column of df1 and every column of df2?

The table should look like this:

    col1 col2 col3
col4 ..   ..   ..
col5 ..   ..   ..
col6 ..   ..   ..

I tried df1.corrwith(df2), but it does not generate the table I need.

I have seen the answer at "How to check correlation between matching columns of two data sets?", but the main difference is that here the column names do not match.
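For context, here is a minimal sketch of why df1.corrwith(df2) falls short in this situation. corrwith pairs columns by label, so when the two frames share no column names there is nothing to pair. The frames below are random stand-ins using the column names from the question (the data and the seed are assumptions made for illustration only).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; only the column names come from the question
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((10, 3)), columns=["col1", "col2", "col3"])
df2 = pd.DataFrame(rng.random((10, 3)), columns=["col4", "col5", "col6"])

# corrwith matches columns by name; no names overlap here,
# so no pair is ever correlated
result = df1.corrwith(df2)
print(result)
print(result.isna().all())
```

The result is a one-dimensional Series rather than a 3 x 3 table, and it carries no usable correlation values because no column labels overlap.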

Recommended answer

pandas, quick and dirty

pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df2', 'df1']
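The one-liner above can be unpacked into three steps, which may make it easier to follow. This is only a sketch: the sample frames, the seed, and the intermediate names (combined, full, cross) are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the column names from the question
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((10, 3)), columns=["col1", "col2", "col3"])
df2 = pd.DataFrame(rng.random((10, 3)), columns=["col4", "col5", "col6"])

# Step 1: put the frames side by side; `keys` adds an outer column level
# ('df1' / 'df2') that records which frame each column came from
combined = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])

# Step 2: correlate every column with every other column (6 x 6 here)
full = combined.corr()

# Step 3: keep only the block whose rows come from df2 and columns from df1
cross = full.loc["df2", "df1"]
print(cross)
```

It is "quick and dirty" in the sense that step 2 also computes the df1-with-df1 and df2-with-df2 blocks, which are then thrown away.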

numpy, clean

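import numpy as np
import pandas as pd

def corr(df1, df2):
    """Pearson correlation of each df2 column (rows) with each df1 column (columns)."""
    n = len(df1)
    v1, v2 = df1.values, df2.values
    # Outer products of the column sums and of the population standard deviations
    sums = np.multiply.outer(v2.sum(0), v1.sum(0))
    stds = np.multiply.outer(v2.std(0), v1.std(0))
    # (sum(x*y) - sum(x) * sum(y) / n) / (n * std_x * std_y)
    return pd.DataFrame((v2.T.dot(v1) - sums / n) / stds / n,
                        df2.columns, df1.columns)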

corr(df1, df2)
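As a reading aid, the expression inside the function appears to be the standard Pearson coefficient written in terms of raw sums. For a column $x$ of df1 and a column $y$ of df2, each with $n$ rows:

$$
r_{xy} = \frac{\sum_{i} x_i y_i - \frac{1}{n}\left(\sum_{i} x_i\right)\left(\sum_{i} y_i\right)}{n\,\sigma_x\,\sigma_y}
$$

Here $\sigma_x$ and $\sigma_y$ are population standard deviations (dividing by $n$), which is what NumPy's `std` computes by default. In the code, `v2.T.dot(v1)` supplies $\sum_i x_i y_i$ for every column pair at once, `sums / n` is the second term of the numerator, and `stds` and the final `/ n` form the denominator. One caveat: this version assumes there are no missing values, whereas `DataFrame.corr` excludes NaN entries pair by pair.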


Example

df1 = pd.DataFrame(np.random.rand(10, 4), columns=list('abcd'))

df2 = pd.DataFrame(np.random.rand(10, 3), columns=list('xyz'))


pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df2', 'df1']

          a         b         c         d
x  0.235624  0.844665 -0.647962  0.535562
y  0.357994  0.462007  0.205863  0.424568
z  0.688853  0.350318  0.132357  0.687038


corr(df1, df2)

          a         b         c         d
x  0.235624  0.844665 -0.647962  0.535562
y  0.357994  0.462007  0.205863  0.424568
z  0.688853  0.350318  0.132357  0.687038
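The numbers above come from unseeded random data, so they will differ on every run. One way to gain confidence in the result regardless of the data is to compare it against a slow but obvious reference that calls Series.corr once per column pair. The sketch below does that for the pandas one-liner; the seed and the names reference and quick are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Seeded so the check is repeatable; the seed itself is arbitrary
rng = np.random.default_rng(42)
df1 = pd.DataFrame(rng.random((10, 4)), columns=list("abcd"))
df2 = pd.DataFrame(rng.random((10, 3)), columns=list("xyz"))

# Reference: one Series.corr call per (df2 column, df1 column) pair
reference = pd.DataFrame(
    {c1: {c2: df1[c1].corr(df2[c2]) for c2 in df2.columns} for c1 in df1.columns}
).loc[df2.columns, df1.columns]

# The quick pandas approach from the answer
quick = pd.concat([df1, df2], axis=1, keys=["df1", "df2"]).corr().loc["df2", "df1"]

# True when both tables agree to floating-point tolerance
print(np.allclose(reference.to_numpy(), quick.to_numpy()))
```

The same comparison can be applied to the numpy-based corr function by swapping it in for quick.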
