Easier way to calculate change in variable using dplyr?

Question

I am trying to find an easier way to calculate the change in a variable (represented by a column) in a data frame using dplyr. My toy data set is something like this:

local_test <- structure(list(CAR = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("a", 
"b", "c", "d", "e", "f"), class = "factor"), TIME = c(0L, 0L, 
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L
), VAR = c(20L, 30L, 40L, 50L, 60L, 70L, 30L, 40L, 50L, 60L, 
70L, 80L, 40L, 50L, 60L, 70L, 80L, 90L)), .Names = c("CAR", "TIME", 
"VAR"), class = "data.frame", row.names = c(NA, -18L))

which looks like

   CAR TIME VAR
1    a    0  20
2    b    0  30
3    c    0  40
4    d    0  50
5    e    0  60
6    f    0  70
7    a    1  30
8    b    1  40
9    c    1  50
10   d    1  60
11   e    1  70
12   f    1  80
13   a    2  40
14   b    2  50
15   c    2  60
16   d    2  70
17   e    2  80
18   f    2  90

I am trying to calculate the change in VAR between TIME equal to 0 and the other times (e.g., 1 and 2) for each CAR.

This is what I do at the moment, which seems very convoluted. First, I get the values of VAR at TIME equal to 0:

library(dplyr)
X <- local_test %>% filter(TIME == 0)  %>% group_by(CAR)  %>% mutate(baseline_VAR = VAR)

X looks like

Source: local data frame [6 x 4]
Groups: CAR

  CAR TIME VAR baseline_VAR
1   a    0  20           20
2   b    0  30           30
3   c    0  40           40
4   d    0  50           50
5   e    0  60           60
6   f    0  70           70

Then I do a left_join with the original data frame local_test:

Y  <- left_join(local_test, X, by = c("CAR"))

Y looks like

   CAR TIME.x VAR.x TIME.y VAR.y baseline_VAR
1    a      0    20      0    20           20
2    b      0    30      0    30           30
3    c      0    40      0    40           40
4    d      0    50      0    50           50
5    e      0    60      0    60           60
6    f      0    70      0    70           70
7    a      1    30      0    20           20
8    b      1    40      0    30           30
9    c      1    50      0    40           40
10   d      1    60      0    50           50
11   e      1    70      0    60           60
12   f      1    80      0    70           70
13   a      2    40      0    20           20
14   b      2    50      0    30           30
15   c      2    60      0    40           40
16   d      2    70      0    50           50
17   e      2    80      0    60           60
18   f      2    90      0    70           70

Finally, I add a column to Y that calculates the change in VAR between each TIME and the TIME 0 baseline for every CAR:

Y %>% group_by(CAR) %>% mutate(change_VAR = VAR.x - baseline_VAR)

The final Y looks like

Source: local data frame [18 x 7]
Groups: CAR

   CAR TIME.x VAR.x TIME.y VAR.y baseline_VAR change_VAR
1    a      0    20      0    20           20          0
2    b      0    30      0    30           30          0
3    c      0    40      0    40           40          0
4    d      0    50      0    50           50          0
5    e      0    60      0    60           60          0
6    f      0    70      0    70           70          0
7    a      1    30      0    20           20         10
8    b      1    40      0    30           30         10
9    c      1    50      0    40           40         10
10   d      1    60      0    50           50         10
11   e      1    70      0    60           60         10
12   f      1    80      0    70           70         10
13   a      2    40      0    20           20         20
14   b      2    50      0    30           30         20
15   c      2    60      0    40           40         20
16   d      2    70      0    50           50         20
17   e      2    80      0    60           60         20
18   f      2    90      0    70           70         20

This seems like a lot of extra work, and it adds extra columns to the original data frame. I need to repeat this operation on a large data frame. Is there an easier (one-step) way to compute change_VAR?

Thanks!

Recommended answer

This could be done by taking the difference between 'VAR' and the min of 'VAR', grouped by 'CAR'.

local_test %>%
  group_by(CAR) %>%
  mutate(change_VAR = VAR - min(VAR))
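As a rough check of the min-based approach, the sketch below rebuilds the toy data with data.frame() (a readability choice, not the asker's dput output) and applies the same group_by/mutate. It also runs the same code on a hypothetical car "g", not in the original data, whose VAR dips below its TIME 0 value, to illustrate that min(VAR) only equals the baseline when VAR never falls below where it started.

```r
library(dplyr)

# Toy data from the question, rebuilt with data.frame() for readability
local_test <- data.frame(
  CAR  = factor(rep(c("a", "b", "c", "d", "e", "f"), times = 3)),
  TIME = rep(0:2, each = 6),
  VAR  = c(seq(20L, 70L, by = 10L),
           seq(30L, 80L, by = 10L),
           seq(40L, 90L, by = 10L))
)

# Change relative to the smallest VAR within each CAR
by_min <- local_test %>%
  group_by(CAR) %>%
  mutate(change_VAR = VAR - min(VAR)) %>%
  ungroup()

# Hypothetical car (assumption, not in the question) whose VAR
# drops below its TIME 0 value at TIME 1
dipping <- data.frame(CAR = "g", TIME = 0:2, VAR = c(50, 40, 60))

dipping_min <- dipping %>%
  group_by(CAR) %>%
  mutate(change_VAR = VAR - min(VAR)) %>%
  ungroup()

# Here change_VAR is 10, 0, 20: measured from the minimum (40),
# not from the TIME 0 value (50)
print(dipping_min)
```

On the toy data the two notions coincide because every car's VAR increases over time, so the minimum is always the TIME 0 value.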

Or, if the base value of 'VAR' is its value when 'TIME' is 0 (assuming there are no duplicate 'TIME' values within each group), we can subset 'VAR' at TIME 0 and take the difference.

local_test %>%
  group_by(CAR) %>%
  mutate(change_VAR = VAR - VAR[TIME == 0])
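The TIME == 0 subset relies on every group having exactly one TIME 0 row. If a group has none, VAR[TIME == 0] is a zero-length vector, which recent dplyr versions reject inside mutate(), as far as I am aware. One possible variant, offered as a sketch rather than part of the original answer, looks the baseline up with match(), which yields NA when a group has no TIME 0 row and takes the first match when there are several. The data frame `partial` and the column `baseline_VAR` below are hypothetical names used for illustration.

```r
library(dplyr)

# Hypothetical data (assumption): car "b" has no TIME 0 measurement
partial <- data.frame(
  CAR  = c("a", "a", "a", "b", "b"),
  TIME = c(0, 1, 2, 1, 2),
  VAR  = c(20, 30, 40, 45, 55)
)

with_baseline <- partial %>%
  group_by(CAR) %>%
  mutate(
    # match(0, TIME) is the position of the first TIME 0 row in the group,
    # or NA if there is none, so the baseline is always length 1
    baseline_VAR = VAR[match(0, TIME)],
    change_VAR   = VAR - baseline_VAR
  ) %>%
  ungroup()

# Car "a" gets changes 0, 10, 20; car "b" gets NA because it has no baseline
print(with_baseline)
```

Whether NA is the right result for a group without a baseline depends on the analysis; filtering such groups out beforehand is an equally reasonable choice.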
