Program to obtain frequency matrix of categorical data
Problem description
I am working with data that contains more than 300 categorical features, which I have encoded as 0s and 1s. I now need to create a feature-by-feature matrix in which each cell holds the frequency of joint occurrence of the two features.
In the end, I want to create a heatmap of this frequency matrix.
My data frame in R looks like this:
id cat1 cat2 cat3 cat4
156 0 0 1 1
465 1 1 1 0
573 0 1 1 0
My desired output is:
cat1 cat2 cat3 ...
cat1 0 1 1
cat2 1 0 2
cat3 1 2 0
.
.
where each cell value is the number of rows in which the two categorical variables appear together (both equal to 1).
Recommended answer
We can use outer to apply a counting function to every pair of columns:
# The columns contain only 0s and 1s, so & can be applied to them directly
fun <- function(x, y) sum(df[, x] & df[, y])
# Indices of all the cat columns (everything except id)
n <- seq_along(df)[-1]
# Apply the function to every pair of columns
mat <- outer(n, n, Vectorize(fun))
# Set the diagonal to 0
diag(mat) <- 0
# Assign row and column names
dimnames(mat) <- list(names(df)[n], names(df)[n])
# cat1 cat2 cat3 cat4
#cat1 0 1 1 0
#cat2 1 0 2 0
#cat3 1 2 0 1
#cat4 0 0 1 0
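As an alternative sketch, not part of the original answer: because the columns are 0/1 indicators, the same counts can be obtained with a single matrix product, since entry [j, k] of t(x) %*% x sums x[i, j] * x[i, k] over the rows. With 300+ columns this avoids the roughly 90,000 separate calls that outer with Vectorize would make. The example below rebuilds the question's sample data (the name `df` follows the answer above; `x` and `co` are names chosen here for illustration) and finishes with a base-graphics heatmap, which is only one of several ways to draw it.

```r
# Example data from the question
df <- data.frame(
  id   = c(156, 465, 573),
  cat1 = c(0, 1, 0),
  cat2 = c(0, 1, 1),
  cat3 = c(1, 1, 1),
  cat4 = c(1, 0, 0)
)

# Drop the id column and convert the 0/1 indicators to a numeric matrix
x <- as.matrix(df[, -1])

# crossprod(x) computes t(x) %*% x: cell [j, k] counts the rows
# where column j and column k are both 1
co <- crossprod(x)

# The diagonal holds each feature's own count; zero it to match the desired output
diag(co) <- 0

print(co)

# Base-graphics heatmap of the raw counts, without clustering or scaling
heatmap(co, Rowv = NA, Colv = NA, scale = "none", symm = TRUE)
```

If the per-feature totals are useful, skip the diag(co) <- 0 step; the diagonal then shows how many rows have each feature set to 1.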