Using ddply to apply a function to a group of rows
Question
I use ddply quite a bit, but I do not consider myself an expert. I have a data frame (df) with a grouping variable "Group", which takes the values "A", "B" and "C", and a numeric variable to summarize, "Var". If I use
ddply(df, .(Group), summarize, mysum=sum(Var))
then I get the sum for each of A, B and C, which is correct. But what I want is to sum over each consecutive run of the same Group value, in the order the rows appear in the data frame. For instance, if the data frame has
Group Var
A 1.3
A 1.2
A 0.4
B 0.3
B 1.3
C 1.5
C 1.7
C 1.9
A 2.1
A 2.4
B 6.7
the desired results are
A 2.9
B 1.6
C 5.1
A 4.5
B 6.7
So, the desired output applies a mathematical function to each consecutive run of a Group value, rather than to all instances of that Group value at once. Can this be done in ddply?
Data
dat <- structure(list(Group = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "B"),
Var = c(1.3, 1.2, 0.4, 0.3, 1.3, 1.5, 1.7, 1.9, 2.1, 2.4, 6.7)),
.Names = c("Group", "Var"), class = "data.frame", row.names = c(NA, -11L))
Recommended answer
Here's one way of doing this using the rleid() function from data.table, recently implemented in v1.9.6. See #686.
This generates the grouping ids as required:
require(data.table) ## v1.9.6+
DT = as.data.table(dat)
rleid(DT$Group)
# [1] 1 1 1 2 2 3 3 3 4 4 5
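For intuition, the same run ids can be reproduced in base R with rle(). This is only a minimal sketch: run_id is a hypothetical helper name chosen for illustration, not part of data.table, and it is not claimed to match how rleid() is implemented internally.

```r
# Hypothetical helper: number consecutive runs of equal values (base R only).
run_id <- function(x) {
  r <- rle(as.character(x))             # lengths of consecutive equal values
  rep(seq_along(r$lengths), r$lengths)  # repeat each run's index over its length
}

group <- c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "B")
ids <- run_id(group)
print(ids)
# [1] 1 1 1 2 2 3 3 3 4 4 5
```

The id increases by one every time the value differs from the previous row, so the second block of "A" rows gets its own id (4) instead of being merged with the first.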
We can use this directly to aggregate as follows:
DT[, .(sum=sum(Var)), by=.(Group, rleid(Group))]
#    Group rleid sum
# 1:     A     1 2.9
# 2:     B     2 1.6
# 3:     C     3 5.1
# 4:     A     4 4.5
# 5:     B     5 6.7
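Since the question asks specifically about ddply, the same idea can probably be carried over to plyr by adding a run id column first and grouping on it. This is a sketch under assumptions: it assumes plyr is installed, and the column name run is made up for illustration.

```r
library(plyr)  # assumption: plyr is installed

dat <- data.frame(
  Group = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "B"),
  Var   = c(1.3, 1.2, 0.4, 0.3, 1.3, 1.5, 1.7, 1.9, 2.1, 2.4, 6.7),
  stringsAsFactors = FALSE
)

# Run id (hypothetical column name): increments each time Group changes
# from the previous row.
r <- rle(dat$Group)
dat$run <- rep(seq_along(r$lengths), r$lengths)

# Group by run first so the result follows the original row order.
res <- ddply(dat, .(run, Group), summarize, mysum = sum(Var))
print(res)
```

Listing run before Group in .() matters here, because ddply orders its output by the grouping variables; with run first, the rows come back in the order the runs appear in the data.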
HTH