Frequency of each unique combination in data frame

Views: 57 | Published: 2020/10/17 1:31:15 | Tags: r, dataframe


Question

In a dataset (N = 6000), I would like to analyse how often each combination of 15 dummy variables occurs.

ID       Var1        Var2       Var3    Var15
1          1          0          0        1
2          0          1          1        1
3          1          0          0        0
6000       1          0          0        0

For this example, I would like to see that the combination 1000 occurs twice, while 1001 and 0111 each occur once.

The only approach I can think of is to compute a variable for each possible combination...

Is there an elegant and efficient way to do this?

I have read through "How to summarize all possible combinations of variables?", but that is a slightly different question, and "Aggregating Tally counters" is beyond my knowledge (though if that is the answer to my question, I will work through it).

Recommended answer

You can use count from dplyr like this:

df <- read.table(text = "
ID       Var1        Var2       Var3    Var15
1          1          0          0        1
2          0          1          1        1
3          1          0          0        0
6000       1          0          0        0
", header = TRUE)

library(dplyr)

df %>% count(Var1, Var2, Var3, Var15)

# # A tibble: 3 x 5
#     Var1  Var2  Var3 Var15     n
#    <int> <int> <int> <int> <int>
# 1     0     1     1     1     1
# 2     1     0     0     0     2
# 3     1     0     0     1     1
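When the most frequent combinations matter most, count() also accepts sort = TRUE, which orders the result by descending n. The following is a minimal sketch that rebuilds the four-row example so it runs on its own; the name combo_counts is only illustrative.

```r
library(dplyr)

# Same four example rows as in the question
df <- read.table(text = "
ID   Var1 Var2 Var3 Var15
1    1    0    0    1
2    0    1    1    1
3    1    0    0    0
6000 1    0    0    0
", header = TRUE)

# One row per observed combination, most frequent first
combo_counts <- df %>% count(Var1, Var2, Var3, Var15, sort = TRUE)
combo_counts

# Sanity check: every input row is counted exactly once
sum(combo_counts$n) == nrow(df)
```

Only combinations that actually occur in the data appear in the result, so the output stays small even though 15 dummy variables allow 2^15 possible combinations.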

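Or, if you don't want to type (many) column names, pass them as a character vector. The original answer used count_() for this; the underscore verbs have been deprecated since dplyr 0.7.0, so the version below uses across(all_of(...)) inside count() instead, which requires dplyr 1.0.0 or later:

input_names <- names(df)[-1]  # select all column names apart from the 1st one

df %>% count(across(all_of(input_names)))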

# # A tibble: 3 x 5
#    Var1  Var2  Var3 Var15     n
#   <int> <int> <int> <int> <int>
# 1     0     1     1     1     1
# 2     1     0     0     0     2
# 3     1     0     0     1     1
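To get a feel for how this behaves at the size mentioned in the question, here is a sketch on simulated data. Everything about the data is an assumption made for illustration: 6000 rows, 15 independent 0/1 columns named Var1 to Var15, and an arbitrary seed. It selects columns by name with across(all_of(...)) inside count(), which assumes dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility
n_rows <- 6000
n_vars <- 15

# Hypothetical data: 15 independent dummy columns plus an ID column
sim <- as.data.frame(
  matrix(rbinom(n_rows * n_vars, size = 1, prob = 0.5), nrow = n_rows)
)
names(sim) <- paste0("Var", seq_len(n_vars))
sim <- cbind(ID = seq_len(n_rows), sim)

# Count every observed combination of the 15 dummy columns
input_names <- names(sim)[-1]
sim_counts <- sim %>% count(across(all_of(input_names)))

# Each row falls into exactly one combination, so the counts sum to N
sum(sim_counts$n) == nrow(sim)

# There can never be more combinations than rows or than 2^15
nrow(sim_counts) <= min(nrow(sim), 2^n_vars)
```

Both checks should evaluate to TRUE whatever the data look like, which makes them a cheap way to confirm that no rows were dropped or double counted.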

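If you want to group your variables and create a single (combo) variable, you can do the following. The original answer used count_() and unite_(), both of which are deprecated; the version below uses count() with across(all_of(...)) (dplyr 1.0.0 or later) and unite() with all_of(...) instead:

library(dplyr)
library(tidyr)

input_names <- names(df)[-1]

df %>%
  count(across(all_of(input_names))) %>%
  unite("ComboVar", all_of(input_names), sep = "")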

# # A tibble: 3 x 2
#   ComboVar     n
# * <chr>    <int>
# 1 0111         1
# 2 1000         2
# 3 1001         1
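For comparison, the same combo-string tally can be sketched in base R, without dplyr or tidyr. This is an alternative technique and not part of the original answer: it pastes the dummy columns of each row into one string and tabulates the strings with table(). The names combo and combo_table are only illustrative.

```r
# Same four example rows as in the question
df <- read.table(text = "
ID   Var1 Var2 Var3 Var15
1    1    0    0    1
2    0    1    1    1
3    1    0    0    0
6000 1    0    0    0
", header = TRUE)

input_names <- names(df)[-1]  # all columns except ID

# Paste the dummy columns row by row into one string per row, e.g. "1001"
combo <- do.call(paste0, df[input_names])

# Tabulate the strings; names are combinations, values are frequencies
combo_table <- table(combo)
combo_table

# Optional: convert to a data frame with columns combo and Freq
as.data.frame(combo_table, stringsAsFactors = FALSE)
```

The string approach assumes every column holds single-character values such as 0 and 1; with multi-digit values, pasting without a separator could make distinct combinations collide.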
