Frequency of each unique combination in data frame

Question

In a dataset (N = 6000) I would like to analyse how often combinations of 15 dummy variables occur.

ID       Var1        Var2       Var3    Var15
1          1          0          0        1
2          0          1          1        1
3          1          0          0        0
6000       1          0          0        0

For this example, I would like to see that the combination 1000 occurs twice, 1001 occurs once, and 0111 also occurs once.

The only way I can think of is to compute a variable for each possible combination...

Is there an elegant and efficient way to do this?

I have read through "How to summarize all possible combinations of variables?", but that is a slightly different question, and "Aggregating Tally counters" is beyond my knowledge (if that is the answer to my question, though, I will work through it).

Recommended answer

You can use count like this:

# Recreate the example data from the question
df = read.table(text = "
ID       Var1        Var2       Var3    Var15
1          1          0          0        1
2          0          1          1        1
3          1          0          0        0
6000       1          0          0        0
", header = TRUE)

library(dplyr)

df %>% count(Var1, Var2, Var3, Var15)

# # A tibble: 3 x 5
#     Var1  Var2  Var3 Var15     n
#    <int> <int> <int> <int> <int>
# 1     0     1     1     1     1
# 2     1     0     0     0     2
# 3     1     0     0     1     1

Or use count_ if you don't want to type (many) column names:

input_names = names(df)[-1]  # select all column names apart from 1st one

df %>% count_(input_names)

# # A tibble: 3 x 5
#    Var1  Var2  Var3 Var15     n
#   <int> <int> <int> <int> <int>
# 1     0     1     1     1     1
# 2     1     0     0     0     2
# 3     1     0     0     1     1
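
Note that count_ has been deprecated since dplyr 0.7.0, so the call above may emit a warning on a current installation. A minimal sketch of the same count using tidyselect helpers, assuming dplyr 1.0.0 or later; the data frame is re-created so the block runs on its own, and the name combo_counts is only illustrative:

```r
library(dplyr)

# Example data from the question
df <- read.table(text = "
ID Var1 Var2 Var3 Var15
1 1 0 0 1
2 0 1 1 1
3 1 0 0 0
6000 1 0 0 0
", header = TRUE)

# All column names apart from the first one (ID)
input_names <- names(df)[-1]

# across(all_of(...)) selects the grouping columns from a character vector
# (assumes dplyr >= 1.0.0)
combo_counts <- df %>% count(across(all_of(input_names)))
combo_counts
```

The result should have one row per distinct combination, with the frequency in column n, just like the count_ output above.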

If you want to group your variables and create a single (combo) variable, you can do this:

library(dplyr)
library(tidyr)

input_names = names(df)[-1]

df %>% count_(input_names) %>% unite_("ComboVar",input_names,sep="")

# # A tibble: 3 x 2
#   ComboVar     n
# * <chr>    <int>
# 1 0111         1
# 2 1000         2
# 3 1001         1
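
Like count_, unite_ is a deprecated underscore variant. The sketch below shows what the same pipeline might look like with the current verbs, assuming dplyr 1.0.0 or later and a tidyr version whose unite() accepts tidyselect helpers. It also includes a base R cross-check that needs no packages. The names combo_counts and combo_table are illustrative only:

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- read.table(text = "
ID Var1 Var2 Var3 Var15
1 1 0 0 1
2 0 1 1 1
3 1 0 0 0
6000 1 0 0 0
", header = TRUE)

input_names <- names(df)[-1]

# Count each combination, then paste the dummy columns into one key
combo_counts <- df %>%
  count(across(all_of(input_names))) %>%
  unite("ComboVar", all_of(input_names), sep = "")

# Base R cross-check: paste the columns row-wise and tabulate the keys
combo_table <- table(do.call(paste0, df[input_names]))

combo_counts
combo_table
```

Both objects should report the same frequencies. The base R variant may be handy when adding dplyr and tidyr as dependencies is not desirable.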
