Using ddply to apply a function to a group of rows


Question

I use ddply quite a bit, but I do not consider myself an expert. I have a data frame (df) with a grouping variable, "Group", which takes the values "A", "B" and "C", and a numeric variable to summarize, "Var". If I use

ddply(df, .(Group), summarize, mysum=sum(Var))

then I get the sum for each of A, B and C, which is correct. But what I want is to sum over each consecutive run of the Group variable, in the order the rows are arranged in the data frame. For instance, if the data frame has

Group    Var
A        1.3
A        1.2
A        0.4
B        0.3
B        1.3
C        1.5
C        1.7
C        1.9
A        2.1
A        2.4
B        6.7

Desired result

A        2.9
B        1.6
C        5.1
A        4.5
B        6.7

So the desired output applies a function to each consecutive block of rows that share the same Group value, rather than to all rows with that Group value taken together. Can this be done in ddply?

Data

dat <- structure(list(Group = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "B"),
                      Var = c(1.3, 1.2, 0.4, 0.3, 1.3, 1.5, 1.7, 1.9, 2.1, 2.4, 6.7)),
                 .Names = c("Group", "Var"), class = "data.frame", row.names = c(NA, -11L))

Recommended answer

Here's one way of doing this using the rleid() function, available in data.table from v1.9.6 onward. See #686.

This generates the grouping ids as required:

require(data.table) ## v1.9.6+
DT = as.data.table(dat)
rleid(DT$Group)
# [1] 1 1 1 2 2 3 3 3 4 4 5
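
For readers who cannot use data.table v1.9.6+, the same run ids can be approximated in base R with rle(). This is only a minimal sketch, assuming the `dat` data frame from the Data section; `run_id` is a hypothetical helper name, not a function from any package.

```r
# Same data as in the Data section, rebuilt here so the sketch is self-contained
dat <- data.frame(
  Group = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "B"),
  Var   = c(1.3, 1.2, 0.4, 0.3, 1.3, 1.5, 1.7, 1.9, 2.1, 2.4, 6.7),
  stringsAsFactors = FALSE
)

# run_id() is a hypothetical helper: it numbers each run of consecutive
# equal values, which is what rleid() returns for a single vector
run_id <- function(x) {
  runs <- rle(as.character(x))                        # run lengths and values
  rep(seq_along(runs$lengths), times = runs$lengths)  # one id per run, repeated
}

run_id(dat$Group)
# [1] 1 1 1 2 2 3 3 3 4 4 5
```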

We can use this directly to aggregate as follows:

DT[, .(sum=sum(Var)), by=.(Group, rleid(Group))]
#    Group rleid sum
# 1:     A     1 2.9
# 2:     B     2 1.6
# 3:     C     3 5.1
# 4:     A     4 4.5
# 5:     B     5 6.7
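
Since the question asks about ddply specifically, the same idea can probably be carried over to plyr by adding a run id column first and grouping on it. The following is a sketch under that assumption rather than part of the original answer; the column name `run` and the result name `res` are made up for illustration.

```r
library(plyr)

# Same data as in the Data section, rebuilt here so the sketch is self-contained
dat <- data.frame(
  Group = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "B"),
  Var   = c(1.3, 1.2, 0.4, 0.3, 1.3, 1.5, 1.7, 1.9, 2.1, 2.4, 6.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: the id increases by one whenever Group
# differs from the value in the previous row
dat$run <- cumsum(c(TRUE, dat$Group[-1] != dat$Group[-nrow(dat)]))

# Grouping on run first keeps the blocks in their original row order;
# Group is included only so that it appears in the output
res <- ddply(dat, .(run, Group), summarize, mysum = sum(Var))
res
```

The `mysum` column should then hold one sum per consecutive block (2.9, 1.6, 5.1, 4.5, 6.7), and the `run` column can be dropped afterwards if it is not needed.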

HTH
