Frequency of each unique combination in data frame
Question
In a dataset (N = 6000) I would like to analyse how often combinations of 15 dummy variables occur.
ID Var1 Var2 Var3 Var15
1 1 0 0 1
2 0 1 1 1
3 1 0 0 0
6000 1 0 0 0
For this example, what I would like to see is that the combination 1000 occurs twice, 1001 occurs once, and 0111 also occurs once.
The only way I can think of is to compute a variable for each possible combination...
Is there an elegant and efficient way to do this?
I have read through "How to summarize all possible combinations of variables?", but that is a slightly different question, and "Aggregating Tally counters" is beyond my current knowledge (though if that is the answer to my question, I will work through it).
Recommended answer
You can use count like this:
df = read.table(text = "
ID Var1 Var2 Var3 Var15
1 1 0 0 1
2 0 1 1 1
3 1 0 0 0
6000 1 0 0 0
", header = TRUE)
library(dplyr)
df %>% count(Var1, Var2, Var3, Var15)
# # A tibble: 3 x 5
# Var1 Var2 Var3 Var15 n
# <int> <int> <int> <int> <int>
# 1 0 1 1 1 1
# 2 1 0 0 0 2
# 3 1 0 0 1 1
Or use count_ if you don't want to type (many) column names:
input_names = names(df)[-1] # select all column names apart from 1st one
df %>% count_(input_names)
# # A tibble: 3 x 5
# Var1 Var2 Var3 Var15 n
# <int> <int> <int> <int> <int>
# 1 0 1 1 1 1
# 2 1 0 0 0 2
# 3 1 0 0 1 1
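A note for readers on current dplyr: the underscore-suffixed verbs such as count_ have been deprecated since dplyr 0.7.0. The sketch below is one possible equivalent, not part of the original answer. It assumes dplyr 1.0.0 or later and selects the columns by name with across(all_of(...)); with dplyr 1.1.0 or later, pick(all_of(input_names)) can be used in the same position. The sort = TRUE argument is an optional addition that lists the most frequent combination first, and the name combo_counts is only illustrative.

```r
library(dplyr)

# Same toy data as in the answer above
df <- read.table(text = "
ID Var1 Var2 Var3 Var15
1 1 0 0 1
2 0 1 1 1
3 1 0 0 0
6000 1 0 0 0
", header = TRUE)

# All column names except the ID column
input_names <- names(df)[-1]

# Count each unique combination of the selected columns;
# sort = TRUE orders the result by descending frequency
combo_counts <- df %>%
  count(across(all_of(input_names)), sort = TRUE)

combo_counts
```

With 15 dummy variables, only the combinations that actually occur in the data appear in the result, so the output has at most as many rows as the dataset.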
If you want to group your variables and create a single (combo) variable you can do this:
library(dplyr)
library(tidyr)
input_names = names(df)[-1]
df %>% count_(input_names) %>% unite_("ComboVar",input_names,sep="")
# # A tibble: 3 x 2
# ComboVar n
# * <chr> <int>
# 1 0111 1
# 2 1000 2
# 3 1001 1
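Like count_, unite_ is deprecated in current tidyr. As a hedged sketch of the same pipeline without the underscore verbs (an adaptation, not the original author's code), the version below assumes dplyr 1.0.0 or later and a tidyr release in which unite() accepts tidyselect helpers such as all_of(). The name combo_freq is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the answer above
df <- read.table(text = "
ID Var1 Var2 Var3 Var15
1 1 0 0 1
2 0 1 1 1
3 1 0 0 0
6000 1 0 0 0
", header = TRUE)

# All column names except the ID column
input_names <- names(df)[-1]

combo_freq <- df %>%
  # one row per unique combination, with its frequency in column n
  count(across(all_of(input_names))) %>%
  # paste the dummy columns together into a single character column
  unite("ComboVar", all_of(input_names), sep = "")

combo_freq
```

Because ComboVar is a character column, leading zeros such as in 0111 are preserved.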