Summarizing counts of a factor with dplyr

Problem description

I want to group a data frame by a column (owner) and output a new data frame that has counts of each type of a factor at each observation. The real data frame is fairly large, and there are 10 different factors.

Here is some example input:

library(dplyr)
df = tbl_df(data.frame(owner=c(0,0,1,1), obs1=c("quiet", "loud", "quiet", "loud"), obs2=c("loud", "loud", "quiet", "quiet")))

  owner  obs1  obs2
1     0 quiet  loud
2     0  loud  loud
3     1 quiet quiet
4     1  loud quiet

I was looking for output that looks like this:

out = data.frame(owner=c("0", "0", "1", "1"), observation=c("obs1", "obs2", "obs1", "obs2"), quiet=c(1, 0, 1, 2), loud=c(1, 2, 1, 0))

  owner observation quiet loud
1     0        obs1     1    1
2     0        obs2     0    2
3     1        obs1     1    1
4     1        obs2     2    0

Melting gets me partway there:

library(reshape2)  # provides melt()
melted = tbl_df(melt(df, id=c("owner")))

  owner variable value
1     0     obs1 quiet
2     0     obs1  loud
3     1     obs1 quiet
4     1     obs1  loud
5     0     obs2  loud
6     0     obs2  loud
7     1     obs2 quiet
8     1     obs2 quiet

But what's the last step? If 'value' were numeric, I'd just do:

melted %>% group_by(owner, variable) %>% summarise(counts=sum(value))

Thanks so much!

Solution

You could use tidyr together with dplyr:

library(dplyr)
library(tidyr)

df %>%
  gather(observation, Val, obs1:obs2) %>%
  group_by(owner, observation, Val) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  spread(Val, n, fill = 0)

which gives the output

  #    owner observation loud quiet
  #1     0        obs1    1     1
  #2     0        obs2    2     0
  #3     1        obs1    1     1
  #4     1        obs2    0     2
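
For readers on a newer tidyr, the same reshape-count-reshape idea can probably be written with `pivot_longer()`, `count()` and `pivot_wider()`, which supersede `gather()` and `spread()`. This is only a sketch and not part of the original answer: it assumes tidyr 1.0.0 or later and rebuilds the example data with `tibble()`; the name `result` is just an illustrative choice.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question (rebuilt here so the block is self-contained)
df <- tibble(
  owner = c(0, 0, 1, 1),
  obs1  = c("quiet", "loud", "quiet", "loud"),
  obs2  = c("loud", "loud", "quiet", "quiet")
)

result <- df %>%
  # Long format: one row per owner / observation column / observed level
  pivot_longer(c(obs1, obs2), names_to = "observation", values_to = "Val") %>%
  # count() replaces the group_by() + summarise(n = n()) + ungroup() steps
  count(owner, observation, Val) %>%
  # One column per level; combinations that never occur are filled with 0
  pivot_wider(names_from = Val, values_from = n, values_fill = list(n = 0L))

print(result)
```

With ten observation columns, the selection `c(obs1, obs2)` could be replaced by a tidyselect helper such as `starts_with("obs")`, assuming the real columns share that prefix.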
