Summarizing counts of a factor with dplyr
Problem description
I want to group a data frame by a column (owner) and produce a new data frame that holds the count of each factor level for each observation column. The real data frame is fairly large, and there are 10 different factors.
Here is some example input:
library(dplyr)
df = tbl_df(data.frame(owner=c(0,0,1,1), obs1=c("quiet", "loud", "quiet", "loud"), obs2=c("loud", "loud", "quiet", "quiet")))
owner obs1 obs2
1 0 quiet loud
2 0 loud loud
3 1 quiet quiet
4 1 loud quiet
I was looking for output that looks like this:
out = data.frame(owner=c("0", "0", "1", "1"), observation=c("obs1", "obs2", "obs1", "obs2"), quiet=c(1, 0, 1, 2), loud=c(1, 2, 1, 0))
owner observation quiet loud
1 0 obs1 1 1
2 0 obs2 0 2
3 1 obs1 1 1
4 1 obs2 2 0
Melting gets me partway there:
library(reshape2)  # provides melt()
melted = tbl_df(melt(df, id=c("owner")))
owner variable value
1 0 obs1 quiet
2 0 obs1 loud
3 1 obs1 quiet
4 1 obs1 loud
5 0 obs2 loud
6 0 obs2 loud
7 1 obs2 quiet
8 1 obs2 quiet
But what's the last step? If 'value' were numeric, I'd just do:
melted %>% group_by(owner, variable) %>% summarise(counts=sum(value))
Thanks so much!
Solution
You could use tidyr with dplyr:
library(dplyr)
library(tidyr)
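df %>%
  gather(observation, Val, obs1:obs2) %>%   # long format: one row per owner/observation/value
  group_by(owner, observation, Val) %>%
  summarise(n = n()) %>%                    # count rows in each group
  ungroup() %>%
  spread(Val, n, fill = 0)                  # one column per level, 0 where a level is absent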
which gives the output
# owner observation loud quiet
#1 0 obs1 1 1
#2 0 obs2 2 0
#3 1 obs1 1 1
#4 1 obs2 0 2
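In more recent tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below is one possible rewrite of the same pipeline, not part of the original answer. It assumes tidyr 1.1.0 or later (for a scalar values_fill) and uses count() as shorthand for group_by() plus summarise(n = n()). The result name counts is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- tibble(
  owner = c(0, 0, 1, 1),
  obs1  = c("quiet", "loud", "quiet", "loud"),
  obs2  = c("loud", "loud", "quiet", "quiet")
)

counts <- df %>%
  # long format: one row per owner / observation / value
  pivot_longer(obs1:obs2, names_to = "observation", values_to = "Val") %>%
  # count rows for each owner / observation / value combination
  count(owner, observation, Val) %>%
  # one column per level; combinations that never occur become 0
  pivot_wider(names_from = Val, values_from = n, values_fill = 0)

counts
```

This should produce the same four rows as the output above, with one count column per level (loud and quiet).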