R: Differences by group and adding
I would like to know a simpler way to do this operation.
Imagine I have a data.frame like this one:

set.seed(1)
ID <- rep(1:3, each = 4)
XX <- round(runif(12), 3)
TT <- rep(1:4, 3)
ZZ <- ave(XX * TT, ID, FUN = cumsum)
DF <- data.frame(ID, TT, XX, ZZ)

ID TT    XX    ZZ
 1  1 0.266 0.266
 1  2 0.372 1.010
 1  3 0.573 2.729
 1  4 0.908 6.361
 2  1 0.202 0.202
 2  2 0.898 1.998
 2  3 0.945 4.833
 2  4 0.661 7.477
 3  1 0.629 0.629
 3  2 0.062 0.753
 3  3 0.206 1.371
 3  4 0.177 2.079
I would like to get, for each column, the increments (differences between two consecutive elements) within each ID group, keeping the first element as it is (as if it were preceded by a zero).

ID TT     XX    ZZ
 1  1  0.266 0.266
 1  2  0.106 0.744
 1  3  0.201 1.719
 1  4  0.335 3.632
 2  1  0.202 0.202
 2  2  0.696 1.796
 2  3  0.047 2.835
 2  4 -0.284 2.644
 3  1  0.629 0.629
 3  2 -0.567 0.124
 3  3  0.144 0.618
 3  4 -0.029 0.708
I've tried with
ave(DF[3:4],DF$ID,FUN=function(x) diff(c(0,x)))
but it doesn't work, it produces the error:
Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : non-numeric argument to binary operator
Isn't there an easy way to do it?
I've found that I can get the proper output with:

ave(DF[3:4], DF$ID, FUN = function(x) sapply(x, FUN = function(y) diff(c(0, y))))

but it gets quite long and complex for such a simple operation. I've found that I can also do it with data.table, but I would prefer to be able to do it in base R.

library(data.table)
setDT(DF)
DF[, lapply(.SD, FUN = function(x) diff(c(0, x))), keyby = ID]
I also don't know how to insert a new row (filled with zeros) at the beginning of each group, or wherever some condition holds.

ID    XX    ZZ
 1     0     0
 1 0.266 0.266
 1 0.372 1.010
 1 0.573 2.729
 1 0.908 6.361
 2     0     0
 2 0.202 0.202
 2 0.898 1.998
 2 0.945 4.833
 2 0.661 7.477
 3     0     0
 3 0.629 0.629
 3 0.062 0.753
 3 0.206 1.371
 3 0.177 2.079
I've tried with:
ave(DF[3:4],DF$ID,FUN=function(x) sapply(x, FUN=function(y) (c(0,y))))
warning:
data length [10] is not a sub-multiple or multiple of the number of rows [4]
I guess the general way to do it would be to work with the row indexes.
PS: I've updated the post.
While trying to simplify it I had removed the TT column, but I later noticed that it is important.
My solution assumes that the table is ordered by TT, but sometimes that is not the case. What I really want is:

XX1
XX2-XX1
XX3-XX2
XX4-XX3

where the subindexes come not from the position in the table but from TT. I don't know whether it is more efficient to first sort the table by TT or to build the expressions with paste().
Solution

I think you will need to use lapply() across the relevant columns, as ave() will not take a list in its first argument. Try this:

df[-1] <- lapply(
  df[-1],
  function(x) ave(x, df$ID, FUN = function(x) c(x[1], diff(x)))
)
which gives the updated df:

   ID     XX    ZZ
1   1  0.266 0.266
2   1  0.106 0.744
3   1  0.201 1.719
4   1  0.335 3.632
5   2  0.202 0.202
6   2  0.696 1.796
7   2  0.047 2.835
8   2 -0.284 2.644
9   3  0.629 0.629
10  3 -0.567 0.124
11  3  0.144 0.618
12  3 -0.029 0.708
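The question's update mentions that the rows are not always ordered by TT. One possible way to handle that in base R is to sort by ID and TT first and then apply the same lapply()/ave() pattern. The following is only a sketch under that assumption: the names shuffled, sorted and cols are made up for illustration, and the data is rebuilt from the question's own code with the TT column included.

```r
# Rebuild the question's data (with TT), then shuffle the rows to mimic an unsorted table
set.seed(1)
ID <- rep(1:3, each = 4)
XX <- round(runif(12), 3)
TT <- rep(1:4, 3)
ZZ <- ave(XX * TT, ID, FUN = cumsum)
DF <- data.frame(ID, TT, XX, ZZ)
shuffled <- DF[sample(nrow(DF)), ]

# Sort by group, then by TT within each group
sorted <- shuffled[order(shuffled$ID, shuffled$TT), ]

# Difference only the value columns; ID and TT are left untouched
cols <- c("XX", "ZZ")
sorted[cols] <- lapply(
  sorted[cols],
  function(x) ave(x, sorted$ID, FUN = function(v) c(v[1], diff(v)))
)
rownames(sorted) <- NULL
sorted
```

Selecting the columns by name rather than with df[-1] keeps TT out of the differencing. Whether sorting first is faster than an index-based approach on large tables is not something this sketch tries to answer.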
Data:

df <- structure(list(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  XX = c(0.266, 0.372, 0.573, 0.908, 0.202, 0.898, 0.945, 0.661,
         0.629, 0.062, 0.206, 0.177),
  ZZ = c(0.266, 1.01, 2.729, 6.361, 0.202, 1.998, 4.833, 7.477,
         0.629, 0.753, 1.371, 2.079)
), .Names = c("ID", "XX", "ZZ"), class = "data.frame", row.names = c(NA, -12L))
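The question also asks how to insert a row of zeros at the start of each group, which the answer above does not cover. A minimal base R sketch, assuming the same three-column df as in the data above, is to build one zero row per ID, stack it on top of the data, and reorder by ID. The names zeros and padded are illustrative only.

```r
# Same values as the answer's data, written out as a plain data.frame
df <- data.frame(
  ID = rep(1:3, each = 4),
  XX = c(0.266, 0.372, 0.573, 0.908, 0.202, 0.898, 0.945, 0.661,
         0.629, 0.062, 0.206, 0.177),
  ZZ = c(0.266, 1.010, 2.729, 6.361, 0.202, 1.998, 4.833, 7.477,
         0.629, 0.753, 1.371, 2.079)
)

# One all-zero row per group
zeros <- data.frame(ID = unique(df$ID), XX = 0, ZZ = 0)

# Put the zero rows first, then order by ID; the row position is used as an
# explicit tie-breaker so each zero row lands at the start of its group
padded <- rbind(zeros, df)
padded <- padded[order(padded$ID, seq_len(nrow(padded))), ]
rownames(padded) <- NULL
padded
```

Because the groups now have one more row than the original data, this cannot be done through ave(), which must return as many values as it receives; that is what the "data length" warning in the question points at.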