Function "diff" over various groups in R
I have a data frame with two grouping variables, one time variable, and one dependent variable, e.g.:
name <- c("a", "a", "a", "a", "a", "a","a", "a", "a", "b", "b", "b","b", "b", "b","b", "b", "b")
class <- c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3","c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3")
year <- c("2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008")
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80, 90, 80, 100, 100, 90, 80, 99, 80, 100)
df <- data.frame(name, class, year, value)
df
and would like to apply the "diff" function to each combination of "class" and "name".
My desired output should look something like this:
name class year value.1
1 a c1 2010 -67
2 a c1 2009 47
3 b c1 2010 -10
4 b c1 2009 20
...
I tried
aggregate(value~name + class, data=df, FUN="diff")
which does not yield the result I'm looking for in a large dataset. Thank you very much in advance!
Sebatian
The plyr package is going to be your friend. The function ddply takes a data.frame, applies a function to each defined subset, then returns a data.frame of all the recombined pieces.
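For readers who want to see the split-apply-combine idea spelled out, here is a rough base R sketch that needs no extra package. It is only an illustration of the pattern, not how ddply is implemented; the names pieces, diffs and result are made up for this example, and the rows may come back in a different order than with ddply.

```r
# Rebuild the example data from the question.
df <- data.frame(
  name  = rep(c("a", "b"), each = 9),
  class = rep(rep(c("c1", "c2", "c3"), each = 3), times = 2),
  year  = rep(c("2010", "2009", "2008"), times = 6),
  value = c(100, 33, 80, 90, 80, 100, 100, 90, 80,
            90, 80, 100, 100, 90, 80, 99, 80, 100)
)

# Split: one data frame per (class, name) combination.
pieces <- split(df, list(df$class, df$name), drop = TRUE)

# Apply: successive differences within each piece.
# diff() returns one value fewer than its input, so the last row's
# labels are dropped to keep all columns the same length.
diffs <- lapply(pieces, function(piece) {
  keep <- -nrow(piece)
  data.frame(
    class = piece$class[keep],
    name  = piece$name[keep],
    year  = piece$year[keep],
    value = diff(piece$value)
  )
})

# Combine: stack the per-group results into a single data frame.
result <- do.call(rbind, diffs)
rownames(result) <- NULL
result
```

Each group of three rows contributes two differences, so the result has 12 rows for this data.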
The simplest solution is to use summarize and diff(value) for each combination of .(class, name):
library(plyr)
ddply(df, .(class, name), summarize, diff(value))
class name ..1
1 c1 a -67
2 c1 a 47
3 c1 b -10
4 c1 b 20
5 c2 a -10
6 c2 a 20
7 c2 b -10
8 c2 b -10
9 c3 a -10
10 c3 a -10
11 c3 b -19
12 c3 b 20
To get your years in the results, it's a little bit more involved:
ddply(df, .(class, name), summarize, year=head(year, -1), value=diff(value))
class name year value
1 c1 a 2010 -67
2 c1 a 2009 47
3 c1 b 2010 -10
4 c1 b 2009 20
5 c2 a 2010 -10
6 c2 a 2009 20
7 c2 b 2010 -10
8 c2 b 2009 -10
9 c3 a 2010 -10
10 c3 a 2009 -10
11 c3 b 2010 -19
12 c3 b 2009 20
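The reason head(year, -1) is needed may be easier to see on a single group. diff() returns one element fewer than its input, so the year column has to be shortened by one to line up with it. The small sketch below uses only the rows for name "a" and class "c1"; the variable names d and y are chosen just for this illustration.

```r
value <- c(100, 33, 80)              # values for name "a", class "c1"
year  <- c("2010", "2009", "2008")   # same order as in the data frame

d <- diff(value)      # value[i + 1] - value[i] for each adjacent pair
y <- head(year, -1)   # every year except the last one

# Both vectors now have length 2, so they can share a data frame.
data.frame(year = y, value = d)
```

Because the rows are sorted from newest to oldest, the difference labelled 2010 here is the 2009 value minus the 2010 value (33 - 100). If the opposite sign or labelling is wanted, one option would be to sort each group by year before calling diff().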