cumsum using ddply


Question

I need to group by levels using ddply, or aggregate if that's easier. I'm not sure how to do this, because I need to use cumsum as my aggregate function. This is what my data looks like:

level1      level2  hour     product 
A           tea     0          7
A           tea     1          2
A           tea     2          9
A           coffee  17         7
A           coffee  18         2
A           coffee  20         4
B           coffee  0          2
B           coffee  1          3
B           coffee  2          4
B           tea     21         3
B           tea     22         1

Expected output:

A     tea     0   7
A     tea     1   9
A     tea     2   18
A     coffee  17  7
A     coffee  18  9
A     coffee  20  13
B     coffee  0   2
B     coffee  1   5
B     coffee  2   9
B     tea     21  3
B     tea     22  4

I tried using

ddply(dd,c("level1","level2","hour"),summarise,cumsum(product))

but that doesn't produce a running total. I think this is because the hour column is being used as a grouping variable, so the data is being split by it. I'm not sure I completely understand how aggregate works here. Is there any way to get the required output using aggregate or ddply?

Recommended answer

Here's a solution in base R using ave and within:

within(mydf, {
  cumsumProduct <- ave(product, level1, level2, FUN = cumsum)
})
#    level1 level2 hour product cumsumProduct
# 1       A    tea    0       7             7
# 2       A    tea    1       2             9
# 3       A    tea    2       9            18
# 4       A coffee   17       7             7
# 5       A coffee   18       2             9
# 6       A coffee   20       4            13
# 7       B coffee    0       2             2
# 8       B coffee    1       3             5
# 9       B coffee    2       4             9
# 10      B    tea   21       3             3
# 11      B    tea   22       1             4

Of course, if you want to drop the existing product column, you can change the command to the following, which overwrites the current "product" column:

within(mydf, {
  product <- ave(product, level1, level2, FUN = cumsum)
})

Your current approach doesn't work in part because you've included "hour" as one of your grouping variables. In other words, it treats the combination "A + tea + 0" as a different group from "A + tea + 1", so every group contains a single row and cumsum has nothing to accumulate. From your desired output, you seem to want just the combination "A + tea" to be the group.
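For comparison, the same grouping fix can be sketched with ddply itself. This is a minimal sketch, assuming the plyr package is installed and that the data frame is rebuilt here under the name mydf used in this answer; it swaps summarise for transform so that every row is kept. It also assumes the rows are already in hour order within each group, as they are in the question's data.

```r
library(plyr)  # assumption: the plyr package is installed

# Rebuild the question's data (named mydf, as in this answer)
mydf <- data.frame(
  level1  = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  level2  = c("tea", "tea", "tea", "coffee", "coffee", "coffee",
              "coffee", "coffee", "coffee", "tea", "tea"),
  hour    = c(0, 1, 2, 17, 18, 20, 0, 1, 2, 21, 22),
  product = c(7, 2, 9, 7, 2, 4, 2, 3, 4, 3, 1)
)

# Group by level1 and level2 only; transform keeps every row,
# so cumsum accumulates across the hours within each group.
out <- ddply(mydf, c("level1", "level2"), transform,
             product = cumsum(product))
out
```

Note that ddply returns the rows ordered by the grouping variables, so the row order may differ from the input even though the per-group running totals are the same.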

aggregate won't work as you expect, because it condenses everything into a data.frame with one row per unique combination of "level1" and "level2", which in this case is 4 rows. The aggregated column would be a list. The values would be correct, but the result would be less useful.

Here is the aggregate call and its output:

> aggregate(product ~ level1 + level2, mydf, cumsum)
  level1 level2  product
1      A coffee 7, 9, 13
2      B coffee  2, 5, 9
3      A    tea 7, 9, 18
4      B    tea     3, 4
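To see why this result is awkward to work with, the structure of the aggregated column can be inspected directly. The following is a small sketch that rebuilds the question's data under the name mydf (the variable name res is only illustrative) and checks the shape of what aggregate returns.

```r
# Rebuild the question's data (named mydf, as in this answer)
mydf <- data.frame(
  level1  = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  level2  = c("tea", "tea", "tea", "coffee", "coffee", "coffee",
              "coffee", "coffee", "coffee", "tea", "tea"),
  hour    = c(0, 1, 2, 17, 18, 20, 0, 1, 2, 21, 22),
  product = c(7, 2, 9, 7, 2, 4, 2, 3, 4, 3, 1)
)

res <- aggregate(product ~ level1 + level2, mydf, cumsum)

nrow(res)             # one row per level1/level2 combination
is.list(res$product)  # the running totals are held as list elements
res$product[[1]]      # running totals for the first row's group
```

Because the groups have different sizes, the running totals cannot be simplified into a plain column, and the hour values are no longer attached to them.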
