How to perform correlation between categorical columns


Question

I have a set of columns (col1, col2, col3) in dataframe df1 and another set of columns (col4, col5, col6) in dataframe df2. Assume the two dataframes have the same number of rows.

How do I generate a correlation table that computes the pairwise correlations between the columns of df1 and df2?

The table should look like this:

    col1 col2 col3
col4 ..   ..   ..
col5 ..   ..   ..
col6 ..   ..   ..

I tried df1.corrwith(df2), but it does not seem to generate the table I need.

I asked a similar question here: How to perform Correlation between two dataframes with different column names. Now, however, I am dealing with categorical columns.

If the columns are not directly comparable, is there a standard way to make them comparable (such as using get_dummies)? And is there a faster way to automatically process all fields (assume all are categorical) and calculate their correlations?

Recommended answer

You are correct that pd.get_dummies is needed to get the correlation. Below, I create some fake data with two categorical columns and then use corrwith:

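import numpy as np
import pandas as pd

# Two fake data frames with the same categorical columns
df = pd.DataFrame({'col1': np.random.choice(list('abcde'), 100),
                   'col2': np.random.choice(list('xyz'), 100)}, dtype='category')
df1 = pd.DataFrame({'col1': np.random.choice(list('abcde'), 100),
                    'col2': np.random.choice(list('xyz'), 100)}, dtype='category')

# One-hot encode each frame; dtype=float keeps the indicator columns numeric
dfa = pd.get_dummies(df, dtype=float)
dfb = pd.get_dummies(df1, dtype=float)

# Correlate the columns that share a name (values vary from run to run)
dfa.corrwith(dfb)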

col1_a   -0.057735
col1_b    0.002513
col1_c    0.137956
col1_d   -0.095050
col1_e   -0.114022
col2_x    0.022568
col2_y   -0.081699
col2_z   -0.128350
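
Note that corrwith aligns columns by label, so the result above holds one value per dummy column that has the same name in both frames. It is not the cross table asked for in the question. One possible way to get every dummy column of one frame against every dummy column of the other is sketched below. The column names col1 to col6 come from the question, while the category labels, the seed and the variable names full and cross are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
n = 100

# Hypothetical data using the column names from the question
df1 = pd.DataFrame({
    "col1": rng.choice(list("abc"), n),
    "col2": rng.choice(list("xy"), n),
    "col3": rng.choice(list("pqr"), n),
})
df2 = pd.DataFrame({
    "col4": rng.choice(list("abc"), n),
    "col5": rng.choice(list("xy"), n),
    "col6": rng.choice(list("pqr"), n),
})

# One-hot encode both frames as float indicator columns
dfa = pd.get_dummies(df1, dtype=float)
dfb = pd.get_dummies(df2, dtype=float)

# Correlate everything with everything, then keep only the df2-vs-df1 block
full = pd.concat([dfa, dfb], axis=1).corr()
cross = full.loc[dfb.columns, dfa.columns]
print(cross.round(3))
```

The rows of cross are the dummy columns of df2 and the columns are the dummy columns of df1, so each cell is the Pearson correlation between two 0/1 indicators.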

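If a single number per pair of original columns is wanted, as in the table sketched in the question, a different measure than the dummy-variable correlation is needed. The sketch below swaps in Cramér's V, which is computed from the chi-squared statistic of the contingency table; this is not the method used in the answer. The helper cramers_v, the fake data and the seed are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def cramers_v(x, y):
    """Cramér's V of two categorical Series (0 = no association, 1 = perfect)."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts if the two variables were independent
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else float("nan")

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
n = 200
df1 = pd.DataFrame({"col1": rng.choice(list("abc"), n),
                    "col2": rng.choice(list("xy"), n),
                    "col3": rng.choice(list("pqr"), n)})
df2 = pd.DataFrame({"col4": df1["col1"].to_numpy(),  # copy of col1, so V = 1
                    "col5": rng.choice(list("xy"), n),
                    "col6": rng.choice(list("pqr"), n)})

# Rows are the columns of df2, columns are the columns of df1
table = pd.DataFrame(
    [[cramers_v(df2[r], df1[c]) for c in df1.columns] for r in df2.columns],
    index=df2.columns, columns=df1.columns)
print(table.round(3))
```

Unlike a Pearson correlation, Cramér's V has no sign: it only says how strongly two categorical columns are associated, not in which direction.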