Function "diff" over various groups in R


Problem description

I have a data frame with two grouping variables, one time variable, and one dependent variable. For example:

name <- c("a", "a", "a", "a", "a", "a","a", "a", "a", "b", "b", "b","b", "b", "b","b", "b", "b")
class <- c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3","c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3")
year <- c("2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008")
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80, 90, 80, 100, 100, 90, 80, 99, 80, 100)

df <- data.frame(name, class, year, value)
df

and I would like to apply the diff function within each combination of class and name.

My desired output should look something like this:

      name class year value.1
    1    a    c1   2010  -67      
    2    a    c1   2009   47
    3    b    c1   2010  -10
    4    b    c1   2009   20
    ...

I tried

aggregate(value~name + class, data=df, FUN="diff")

which does not yield the result I'm looking for on a large dataset. Thank you very much in advance!

Sebatian

Solution

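The plyr package is going to be your friend. The function ddply takes a data.frame, applies a function to each defined subset, and then returns a data.frame of all the recombined pieces.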
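
To make the split-apply-combine idea concrete, here is a minimal base R sketch of roughly what happens for this problem. It is an illustration only, not plyr's actual implementation; the helper name diff_by_group is hypothetical, and the block rebuilds the question's data so that it runs on its own.

```r
# Rebuild the example data from the question
name  <- rep(c("a", "b"), each = 9)
class <- rep(rep(c("c1", "c2", "c3"), each = 3), times = 2)
year  <- rep(c("2010", "2009", "2008"), times = 6)
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80,
           90, 80, 100, 100, 90, 80, 99, 80, 100)
df <- data.frame(name, class, year, value, stringsAsFactors = FALSE)

# Hypothetical helper: split by group, apply diff, recombine
diff_by_group <- function(d) {
  # split: one piece per (name, class) combination
  pieces <- split(d, list(d$name, d$class), drop = TRUE)
  # apply: diff() returns one value fewer than it receives,
  # so the group labels are shortened by one row as well
  out <- lapply(pieces, function(p) {
    data.frame(class = p$class[-1],
               name  = p$name[-1],
               value = diff(p$value),
               stringsAsFactors = FALSE)
  })
  # combine: stack the per-group results into one data.frame
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

res <- diff_by_group(df)
res
```

With plyr the same three steps collapse into a single ddply call, which is the main reason to prefer it here.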

The simplest solution is to use summarize with diff(value) for each combination of .(class, name):

library(plyr)
ddply(df, .(class, name), summarize, diff(value))

   class name ..1
1     c1    a -67
2     c1    a  47
3     c1    b -10
4     c1    b  20
5     c2    a -10
6     c2    a  20
7     c2    b -10
8     c2    b -10
9     c3    a -10
10    c3    a -10
11    c3    b -19
12    c3    b  20

To get the years into the results, it takes a little more work:

ddply(df, .(class, name), summarize, year=head(year, -1), value=diff(value))
   class name year value
1     c1    a 2010   -67
2     c1    a 2009    47
3     c1    b 2010   -10
4     c1    b 2009    20
5     c2    a 2010   -10
6     c2    a 2009    20
7     c2    b 2010   -10
8     c2    b 2009   -10
9     c3    a 2010   -10
10    c3    a 2009   -10
11    c3    b 2010   -19
12    c3    b 2009    20
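
The reason for head(year, -1) is that diff returns one value fewer than its input, so the last year in each group has no difference to pair with. The sketch below shows this on a single group from the question (name "a", class "c1"). The final step, sorting by year before differencing, is an assumption about what some readers may want and is not part of the original answer; the variable names o and chron are made up for the example.

```r
# One group from the question: name "a", class "c1"
year  <- c("2010", "2009", "2008")
value <- c(100, 33, 80)

# diff() computes value[i + 1] - value[i] in row order
d <- diff(value)
length(d)   # one element shorter than value

# Label each difference with the year of the row it starts from,
# which is what head(year, -1) does in the ddply call
labelled <- data.frame(year = head(year, -1), value = d)
labelled

# Assumption: if chronological differences are wanted instead,
# sort by year first and label with the later year of each pair
o <- order(year)
chron <- data.frame(year = year[o][-1], value = diff(value[o]))
chron
```

Because the example data is sorted from 2010 down to 2008, the differences in the answer run backwards in time; whether that is intended depends on the analysis.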
