R ddply with multiple variables


Question

Here is a simplified data frame that stands in for my real data set:

df <- data.frame(ID=rep(101:102,each=9),phase=rep(1:3,6),variable=rep(LETTERS[1:3],each=3,times=2),mm1=c(1:18),mm2=c(19:36),mm3=c(37:54))

I would like to group by ID and variable first and then, within each group, subtract the phase 3 values of mm1, mm2 and mm3 from the values in every phase (1 to 3). That would make mm1 to mm3 all -2 in phase 1, all -1 in phase 2, and all 0 in phase 3.

When I tried the following, R threw the error "Error in Ops.data.frame(x, x[3, ]) : - only defined for equally-sized data frames":

df1 <- ddply(df, .(ID, variable), function(x) (x - x[3,]))   

Any advice would be greatly appreciated. The output should look like this:

ID phase variable mm1 mm2 mm3
101  1      A     -2  -2  -2
101  2      A     -1  -1  -1
101  3      A      0   0   0
101  1      B     -2  -2  -2
101  2      B     -1  -1  -1
101  3      B      0   0   0
101  1      C     -2  -2  -2
101  2      C     -1  -1  -1
101  3      C      0   0   0
102  1      A     -2  -2  -2
102  2      A     -1  -1  -1
102  3      A      0   0   0
102  1      B     -2  -2  -2
102  2      B     -1  -1  -1
102  3      B      0   0   0
102  1      C     -2  -2  -2
102  2      C     -1  -1  -1
102  3      C      0   0   0

Recommended answer

Okay, it took me a little while to figure out what you want, but here is a solution:

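library(plyr)  # provides ddply() and .()

cols.to.sub <- paste0("mm", 1:3)  # the value columns to adjust
df1 <- ddply(
  df, .(ID, variable),
  function(x) {
    # Subtract the phase 3 row from every row of the group's mm columns
    x[cols.to.sub] <- t(t(as.matrix(x[cols.to.sub])) - unlist(x[x$phase == 3, cols.to.sub]))
    x
  }
)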

This produces (first 6 rows):

    ID phase variable mm1 mm2 mm3
1  101     1        A  -2  -2  -2
2  101     2        A  -1  -1  -1
3  101     3        A   0   0   0
4  101     1        B  -2  -2  -2
5  101     2        B  -1  -1  -1
6  101     3        B   0   0   0
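
To check all 18 rows rather than only the first 6, the same per-group function can be run without plyr. The sketch below is an assumption-laden stand-in, not the answer's own code: it replaces ddply with base R's split(), lapply() and do.call(rbind, ...), and the names sub.phase3, groups and out are made up for illustration.

```r
# Rebuild the data frame from the question.
df <- data.frame(
  ID = rep(101:102, each = 9),
  phase = rep(1:3, 6),
  variable = rep(LETTERS[1:3], each = 3, times = 2),
  mm1 = 1:18, mm2 = 19:36, mm3 = 37:54
)

cols.to.sub <- paste0("mm", 1:3)

# Same body as the function passed to ddply in the answer
# (sub.phase3 is a hypothetical name chosen for this sketch).
sub.phase3 <- function(x) {
  x[cols.to.sub] <- t(t(as.matrix(x[cols.to.sub])) - unlist(x[x$phase == 3, cols.to.sub]))
  x
}

# Base R stand-in for ddply(df, .(ID, variable), ...): split, apply, recombine.
groups <- split(df, list(df$ID, df$variable), drop = TRUE)
out <- do.call(rbind, lapply(groups, sub.phase3))

# Restore the ID / variable / phase ordering used in the question.
out <- out[order(out$ID, out$variable, out$phase), ]
rownames(out) <- NULL
print(out)
```

Every group should come out as -2, -1, 0 in each mm column, matching the output requested in the question.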

Generally speaking, the best way to debug this type of issue is to put a browser() statement inside the function you are passing to ddply, so that you can examine the objects at your leisure. Doing so would have revealed that:

  1. The data frame passed to your function includes the ID column as well as the phase column, so your mm columns are not the first three (hence the need to define cols.to.sub)
  2. Even if you address that, you can't operate on data frames that have unequal dimensions, so what I do here is convert to a matrix and then take advantage of vector recycling to subtract the one row from every row of the matrix. I need t (transpose) because vector recycling is column-wise.
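
Both points can be reproduced on a single group in base R, without plyr or an interactive browser() session. This is only a sketch: the subset x imitates what ddply would pass for ID 101 and variable "A", and the names err, m, ref, res and res2 are illustrative. The sweep() call at the end is an alternative not used in the answer; it is shown only because it expresses the same subtraction without the double transpose.

```r
# Rebuild the data frame from the question.
df <- data.frame(
  ID = rep(101:102, each = 9),
  phase = rep(1:3, 6),
  variable = rep(LETTERS[1:3], each = 3, times = 2),
  mm1 = 1:18, mm2 = 19:36, mm3 = 37:54
)

# One group, roughly as ddply would hand it to the function.
x <- df[df$ID == 101 & df$variable == "A", ]
print(names(x))  # point 1: all six columns arrive, not only mm1..mm3

# Point 2: a 3-row data frame minus a 1-row data frame is rejected.
# tryCatch() captures the error message instead of stopping the script.
err <- tryCatch(x - x[3, ], error = function(e) conditionMessage(e))
print(err)

# The fix: work on a matrix and let the vector recycle down the columns.
cols.to.sub <- paste0("mm", 1:3)
m <- as.matrix(x[cols.to.sub])               # 3 phases x 3 mm columns
ref <- unlist(x[x$phase == 3, cols.to.sub])  # phase 3 row as a plain vector

# t(m) has one row per mm column, so recycling ref down each column of t(m)
# lines ref[i] up with mm column i; the outer t() restores the orientation.
res <- t(t(m) - ref)
print(res)

# Alternative (not in the answer): sweep() subtracts ref[j] from column j.
res2 <- sweep(m, 2, ref)
```

Without the transposes, m - ref would recycle ref down the phase rows instead, pairing each value of ref with a phase rather than with an mm column.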
