How do I summarize data that is broken into many columns?

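Question

I have a dataset containing the answers to a "select as many as apply" question, with each possible answer in its own column. So, say our question is "What color shirts would you accept?" It looks like this: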

id    Q3_Red  Q3_Blue  Q3_Green  Q3_Purple
9
8                      Green     Purple
7                      Green
6     Red
5                                Purple
4             Blue
3             Blue               Purple
2     Red     Blue     Green
1     Red                        Purple
10    Red                        Purple

You can build it as an actual data frame with the following:

tmp <- data.frame(
  id        = c(9, 8, 7, 6, 5, 4, 3, 2, 1, 10),
  Q3_Red    = c("", "", "", "Red", "", "", "", "Red", "Red", "Red"),
  Q3_Blue   = c("", "", "", "", "", "Blue", "Blue", "Blue", "", ""),
  Q3_Green  = c("", "Green", "Green", "", "", "", "", "Green", "", ""),
  Q3_Purple = c("", "Purple", "", "", "Purple", "", "Purple", "", "Purple", "Purple")
)

I would like to summarize it with a count of each answer, e.g.:

Red     4
Blue    3
Green   3
Purple  5

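I can count each column with something like tmp %>% count(Q3_Red) and organize the results into their own data frame, but it seems there must be a way to do this in one fell swoop with a reshaping function. I have looked at gather() and spread(), but I can't figure out how to combine tidyr with count().

Recommended answer

dplyr and tidyr are your friends here: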

library(dplyr)
library(tidyr)
tmp %>% 
  pivot_longer(cols = -id, values_to = "response") %>%   # pivot all columns but id
  filter(response != "") %>%        # remove blanks
  group_by(response) %>%            # group by response
  summarize(count = n())            # summarize and count
# A tibble: 4 x 2
  response count
  <chr>  <int>
1 Blue       3
2 Green      3
3 Purple     5
4 Red        4
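
Since the question asks specifically how to combine tidyr with count(), here is a minimal sketch of the same pipeline with count() in place of group_by() plus summarize(). It assumes a tidyr version that provides pivot_longer() and a dplyr version whose count() accepts the name argument; the column name "question" and the objects counts and base_counts are illustrative names, not part of the original answer. A base R cross-check with colSums() is included as an alternative, not as the answerer's method.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
tmp <- data.frame(
  id        = c(9, 8, 7, 6, 5, 4, 3, 2, 1, 10),
  Q3_Red    = c("", "", "", "Red", "", "", "", "Red", "Red", "Red"),
  Q3_Blue   = c("", "", "", "", "", "Blue", "Blue", "Blue", "", ""),
  Q3_Green  = c("", "Green", "Green", "", "", "", "", "Green", "", ""),
  Q3_Purple = c("", "Purple", "", "", "Purple", "", "Purple", "", "Purple", "Purple")
)

# Reshape to one row per (id, answer column), drop blanks, then count.
# "question" is an illustrative name for the column holding the old column names.
counts <- tmp %>%
  pivot_longer(cols = -id, names_to = "question", values_to = "response") %>%
  filter(response != "") %>%
  count(response, name = "count")

# Base R cross-check: count the non-blank cells in every column except id.
# The result is a named vector in the original column order.
base_counts <- colSums(tmp[-1] != "")

print(counts)
print(base_counts)
```

count(response) groups by response and tallies the rows in one step, so the result should match the tibble above. The colSums() version keeps the original column order (Red, Blue, Green, Purple), but its names are the column names (Q3_Red and so on) rather than the answer text.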
