Program to obtain frequency matrix of categorical data

Question

I am working on data that contains more than 300 categorical features, which I have encoded as 0s and 1s. Now I need to create a feature-by-feature matrix in which each cell holds the frequency of joint occurrence of the two features.

In the end, I want to create a heatmap of this frequency matrix.

My data frame in R looks like this:

id cat1 cat2 cat3 cat4
156   0    0    1    1
465   1    1    1    0
573   0    1    1    0

My desired output is:

     cat1 cat2 cat3 ...
cat1    0    1    1
cat2    1    0    2
cat3    1    2    0
  .
  .

where each cell value is the number of rows in which the two categorical variables both appear (both are 1).

Recommended answer

We can use outer:

# The columns contain only 0s and 1s, so & can be applied to them directly
fun <- function(x, y) sum(df[, x] & df[, y])

# Indices of all the cat columns (everything except id)
n <- seq_along(df)[-1]
# Apply the function to every pair of columns
mat <- outer(n, n, Vectorize(fun))
# Set the diagonal to 0
diag(mat) <- 0
# Assign row and column names
dimnames(mat) <- list(names(df)[n], names(df)[n])

#     cat1 cat2 cat3 cat4
#cat1    0    1    1    0
#cat2    1    0    2    0
#cat3    1    2    0    1
#cat4    0    0    1    0
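
With 300+ columns, calling a function once per pair of columns may be slow. A possible alternative, not part of the original answer, is a matrix cross product: for a 0/1 matrix m, t(m) %*% m counts the rows in which both columns are 1. The sketch below rebuilds the three example rows from the question (so the df here is an assumption about how the data is stored), uses the illustrative name cooc for the result, and ends with a base R heatmap() call, since that was the stated goal.

```r
# Example data copied from the question: id plus four 0/1 indicator columns
df <- data.frame(
  id   = c(156, 465, 573),
  cat1 = c(0, 1, 0),
  cat2 = c(0, 1, 1),
  cat3 = c(1, 1, 1),
  cat4 = c(1, 0, 0)
)

# Drop the id column and convert the indicators to a numeric matrix
m <- as.matrix(df[-1])

# crossprod(m) is t(m) %*% m: entry [i, j] counts rows where columns i and j are both 1
cooc <- crossprod(m)

# The diagonal holds each column's own count; zero it to match the outer() result
diag(cooc) <- 0

print(cooc)

# Draw the heatmap without reordering or rescaling (only in an interactive session)
if (interactive()) {
  heatmap(cooc, Rowv = NA, Colv = NA, scale = "none")
}
```

On the example data this should give the same matrix as the outer() approach. Passing Rowv = NA and Colv = NA keeps the features in their original order; leave them out if clustering the features is wanted.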
