Build difference between groups with dplyr in R
Question
I am using dplyr and I am wondering whether it is possible to compute differences between groups in one line. In the small example below, the task is to compute the difference between the standardized "cent" variables of groups A and B.
library(dplyr)
# creating a small data.frame
GROUP <- rep(c("A","B"),each=10)
NUMBE <- rnorm(20,50,10)
datf <- data.frame(GROUP,NUMBE)
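# standardize NUMBE within each group (%>% replaces the removed %.% operator)
datf2 <- datf %>% group_by(GROUP) %>% mutate(cent = (NUMBE - mean(NUMBE)) / sd(NUMBE))
# extract the standardized values of each group
gA <- datf2 %>% ungroup() %>% filter(GROUP == "A") %>% select(cent)
gB <- datf2 %>% ungroup() %>% filter(GROUP == "B") %>% select(cent)
# element-wise difference between the two groups
gA - gB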
This is of course no problem when creating separate objects, but is there a more "built-in" way of performing this task? Something like the following fantasy code, which does not work?
datf2 %>% summarize(filter(GROUP == "A",select(cent)) - filter(GROUP == "B",select(cent)))
Thanks!
Recommended answer
Given that we have 10 rows in each group, add an index (1:10 for group A, 1:10 for group B) and summarize over that index with the difference:
> datf2$entry <- c(1:10, 1:10)
> datf2 %>% ungroup() %>% group_by(entry) %>% summarize(d = cent[1] - cent[2])
Source: local data frame [10 x 2]
entry d
1 1 -0.8272879
2 2 -0.9159827
3 3 -0.5064762
4 4 0.4211639
5 5 1.3681720
6 6 3.3430289
7 7 1.0086822
8 8 -0.6163907
9 9 -0.7325220
10 10 -2.5423875
Compare:
> gA - gB
cent
1 -0.8272879
2 -0.9159827
3 -0.5064762
4 0.4211639
5 1.3681720
6 3.3430289
7 1.0086822
8 -0.6163907
9 -0.7325220
10 -2.5423875
Is there a way to inject the entry field into the data or the dplyr call? I'm not sure; it seems to rely on the functions knowing too much about the data...
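One possible way to avoid hard-coding the index is to let dplyr generate it inside the pipeline with row_number(), which counts rows within each group. The following is only a sketch, not the answerer's method. It assumes that both groups have the same number of rows and that rows are paired by their position within the group; the set.seed() call and the name diffs are additions made for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
datf <- data.frame(
  GROUP = rep(c("A", "B"), each = 10),
  NUMBE = rnorm(20, 50, 10)
)

diffs <- datf %>%
  group_by(GROUP) %>%
  mutate(
    cent  = (NUMBE - mean(NUMBE)) / sd(NUMBE),  # standardize within each group
    entry = row_number()                        # position 1..n within each group
  ) %>%
  group_by(entry) %>%                           # regroup by position, pairing A with B
  summarize(d = cent[GROUP == "A"] - cent[GROUP == "B"])

diffs
```

Selecting by GROUP == "A" and GROUP == "B" instead of cent[1] and cent[2] means the result does not depend on the A rows appearing before the B rows. If the groups differ in size, some entry values would have no partner, so this sketch would need a different pairing rule.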