Add rows to grouped data with dplyr?
Problem description
My data is in a data.frame format like this sample data:
data <- structure(
  list(
    Article = structure(
      c(1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
        1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L),
      .Label = c("10004", "10006", "10007"),
      class = "factor"
    ),
    Demand = c(26L, 780L, 2L, 181L, 228L, 214L, 219L, 291L, 104L, 72L, 155L,
               237L, 182L, 148L, 52L, 227L, 2L, 355L, 2L, 432L, 1L, 156L),
    Week = c("2013-W01", "2013-W01", "2013-W01", "2013-W01", "2013-W01",
             "2013-W02", "2013-W02", "2013-W02", "2013-W02", "2013-W02",
             "2013-W03", "2013-W03", "2013-W03", "2013-W03", "2013-W03",
             "2013-W04", "2013-W04", "2013-W04", "2013-W04", "2013-W04",
             "2013-W04", "2013-W04")
  ),
  .Names = c("Article", "Demand", "Week"),
  class = "data.frame",
  row.names = c(NA, -22L)
)
I would like to summarize the demand column by week and article. To do this, I use:
library(dplyr)

WeekSums <- data %>%
  group_by(Article, Week) %>%
  summarize(
    WeekDemand = sum(Demand)
  )
But because some articles were not sold in certain weeks, the number of rows per article differs (only weeks with sales are shown in the WeekSums dataframe). How could I adjust my data so that each article has the same number of rows (one for each week), including weeks with 0 demand?
The output should then look like this:
   Article     Week WeekDemand
1    10004 2013-W01       1215
2    10004 2013-W02        900
3    10004 2013-W03        774
4    10004 2013-W04       1170
5    10006 2013-W01          0
6    10006 2013-W02          0
7    10006 2013-W03          0
8    10006 2013-W04          5
9    10007 2013-W01          2
10   10007 2013-W02          0
11   10007 2013-W03          0
12   10007 2013-W04          0
I tried
WeekSums %>%
  group_by(Article) %>%
  if(n() < 4) rep(rbind(c(Article, NA, NA)), 4 - n())
but this doesn't work. In my original approach, I solved this problem by merging a data frame of week numbers 1-4 with my raw data for each article. That way I got 4 weeks (rows) per article, but the implementation with a for loop is very inefficient, so I'm trying to do the same with dplyr (or any other more efficient package/function). Any suggestions would be much appreciated!
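The merge idea described above does not need a for loop. The following is a minimal base R sketch, not the asker's original code: it builds every Article/Week combination once with expand.grid() and left-joins the weekly sums onto it. The object names sums, grid and full are chosen here for illustration, and the data frame is a compact re-creation of the sample data from the question.

```r
# Compact re-creation of the question's sample data (assumed equivalent).
arts <- c("10004", "10006", "10007")
idx  <- c(1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1)
data <- data.frame(
  Article = factor(arts[idx], levels = arts),
  Demand  = c(26L, 780L, 2L, 181L, 228L, 214L, 219L, 291L, 104L, 72L, 155L,
              237L, 182L, 148L, 52L, 227L, 2L, 355L, 2L, 432L, 1L, 156L),
  Week    = rep(c("2013-W01", "2013-W02", "2013-W03", "2013-W04"),
                times = c(5, 5, 5, 7)),
  stringsAsFactors = FALSE
)

# Sum demand for each Article/Week pair that actually occurs.
sums <- aggregate(Demand ~ Article + Week, data = data, FUN = sum)
sums$Article <- as.character(sums$Article)
sums$Week    <- as.character(sums$Week)

# Every Article/Week combination, observed or not.
grid <- expand.grid(
  Article = levels(data$Article),
  Week    = sort(unique(data$Week)),
  stringsAsFactors = FALSE
)

# Left join: combinations without sales come back as NA and are set to 0.
full <- merge(grid, sums, by = c("Article", "Week"), all.x = TRUE)
full$Demand[is.na(full$Demand)] <- 0L
full <- full[order(full$Article, full$Week), ]
```

With the sample data, full has 12 rows (4 weeks for each of the 3 articles), and the unsold weeks carry a Demand of 0.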
Solution
Without dplyr it can be done like this:
> as.data.frame(xtabs(Demand ~ Week + Article, data))
       Week Article Freq
1  2013-W01   10004 1215
2  2013-W02   10004  900
3  2013-W03   10004  774
4  2013-W04   10004 1170
5  2013-W01   10006    0
6  2013-W02   10006    0
7  2013-W03   10006    0
8  2013-W04   10006    5
9  2013-W01   10007    2
10 2013-W02   10007    0
11 2013-W03   10007    0
12 2013-W04   10007    0
and this can be rewritten as a dplyr pipeline like this:
data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()
The as.data.frame() at the end could be omitted if a wide form solution were desired.
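To illustrate the difference between the two forms, here is a small sketch in base R. It assumes a compact re-creation of the sample data from the question. The names wide and long are chosen here for illustration, and responseName is an argument of as.data.frame() for tables that renames the default Freq column.

```r
# Compact re-creation of the question's sample data (assumed equivalent).
arts <- c("10004", "10006", "10007")
idx  <- c(1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1)
data <- data.frame(
  Article = factor(arts[idx], levels = arts),
  Demand  = c(26L, 780L, 2L, 181L, 228L, 214L, 219L, 291L, 104L, 72L, 155L,
              237L, 182L, 148L, 52L, 227L, 2L, 355L, 2L, 432L, 1L, 156L),
  Week    = rep(c("2013-W01", "2013-W02", "2013-W03", "2013-W04"),
                times = c(5, 5, 5, 7)),
  stringsAsFactors = FALSE
)

# Wide form: one row per week, one column per article,
# with 0 in cells where nothing was sold.
wide <- xtabs(Demand ~ Week + Article, data = data)

# Long form: one row per Week/Article pair, with the sum column
# named WeekDemand instead of the default Freq.
long <- as.data.frame(wide, responseName = "WeekDemand")
```

Individual cells of the wide table can be read by name, for example wide["2013-W04", "10006"].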