How to apply a set of functions to each group of a grouping variable in R data.frame

Problem description

I need to reshape a data.frame in R in one step. In short, the values of the objects (x1 to x6) change row by row (from 1990 to 1995):

> tab1[1:10, ] # raw data see plot for tab1
   id value year
1  x1     7 1990
2  x1    10 1991
3  x1    11 1992
4  x1     7 1993
5  x1     3 1994
6  x1     1 1995
7  x2     6 1990
8  x2     7 1991
9  x2     9 1992
10 x2     5 1993

I can do the reshaping step by step; does anybody know how to do it in one step?

Original data: Table 1 - note that the minimum value across all time series is 0.

Step 1: Table 2 - rescale each time series so that its minimum value equals 0. All time series drop down to the x-axis.

Step 2: Table 3 - apply the diff() function to each time series.

Step 3: Table 4 - apply the sort() function to each time series.

I hope the pictures are clear enough to understand each step.
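
For a single series, the three steps can be sketched in base R as follows. This is only an illustration using the x1 values from the data in this question; the names step1, step2 and step3 are made up for the example.

```r
# One series (x1), used only to illustrate the three steps
x1 <- c(7, 10, 11, 7, 3, 1)

step1 <- x1 - min(x1)   # Step 1: rescale so the minimum is 0
step2 <- diff(step1)    # Step 2: lag-1 differences (one element shorter)
step3 <- sort(step2)    # Step 3: sort in ascending order

step1  # 6 9 10 6 2 0
step2  # 3 1 -4 -4 -2
step3  # -4 -4 -2 1 3
```

Note that diff() drops one element, which is why the later tables have five rows per id instead of six.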

So the final table looks like this:

> tab4[1:10, ]
   id value time
1  x1    -4    1
2  x1    -4    2
3  x1    -2    3
4  x1     1    4
5  x1     3    5
6  x2    -4    1
7  x2    -3    2
8  x2     1    3
9  x2     1    4
10 x2     2    5

# Source data:
tab1 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
                   value = c(7,10,11,7,3,1,6,7,9,5,2,3,11,9,7,9,1,
                             0,1,2,2,4,7,4,2,3,1,6,4,2,3,5,4,3,5,6),
                   year = rep(c(1990:1995), times = 6))

tab2 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
                   value = c(6,9,10,6,2,0,4,5,7,3,0,1,11,9,7,9,1,0,
                             0,1,1,3,6,3,1,2,0,5,3,1,0,2,1,0,2,3),
                   year = rep(c(1990:1995), times = 6))

tab3 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
                   value = c(3,1,-4,-4,-2,1,2,-4,-3,1,-2,-2,2,-8,-1,
                             1,0,2,3,-3,1,-2,5,-2,-2,2,-1,-1,2,1),
                   time = rep(c(1:5), times = 6))

tab4 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
                   value = c(-4,-4,-2,1,3,-4,-3,1,1,2,-8,-2,-2,-1,2,
                             -3,0,1,2,3,-2,-2,-2,1,5,-1,-1,1,2,2),
                   time = rep(c(1:5), times = 6))

Solution

It sounds like you want to apply a set of functions to each group of a grouping variable. There are many ways to do this in R, from base R's by() and tapply() to add-on packages like plyr, data.table, and dplyr. I've been learning how to use the dplyr package and came up with the following solution.

library(dplyr)

tab4 <- tab1 %>%
    group_by(id) %>%                        # group by id
    mutate(value = value - min(value),      # shift each group's minimum to 0
           value = value - lag(value)) %>%  # lag-1 difference within each group
    filter(!is.na(value)) %>%               # drop the NA that differencing leaves in each group's first row
    arrange(id, value) %>%                  # order by value within each id
    mutate(time = row_number()) %>%         # time variable from 1 to 5 based on current order
    ungroup() %>%                           # drop the grouping from the result
    select(-year)                           # remove year column to match final OP output
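
For comparison, the same result can probably be reached in base R alone. The sketch below is one possible approach, not part of the original answer: it assumes the rows of each id are already ordered by year, and reshape_series and tab4_base are hypothetical names introduced only for this example.

```r
# Source data from the question
tab1 <- data.frame(
  id    = rep(c("x1", "x2", "x3", "x4", "x5", "x6"), each = 6),
  value = c(7, 10, 11, 7, 3, 1,  6, 7, 9, 5, 2, 3,  11, 9, 7, 9, 1, 0,
            1, 2, 2, 4, 7, 4,  2, 3, 1, 6, 4, 2,  3, 5, 4, 3, 5, 6),
  year  = rep(1990:1995, times = 6)
)

# Hypothetical helper: rescale to minimum 0, difference, then sort one series
reshape_series <- function(v) sort(diff(v - min(v)))

# Split by id, transform each piece, and build one small data.frame per group
pieces <- lapply(split(tab1, tab1$id), function(d) {
  v <- reshape_series(d$value)  # assumes d is ordered by year
  data.frame(id = d$id[1], value = v, time = seq_along(v))
})

# Stack the pieces back together and reset the row names
tab4_base <- do.call(rbind, pieces)
rownames(tab4_base) <- NULL

head(tab4_base, 10)
```

Because diff() is unaffected by subtracting a constant, the rescaling step does not change the final values; it is kept here only to mirror the steps in the question.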
