Calculate relative changes in a time series with respect to a baseline by group. NA if no baseline value was measured


Question

I'd like to calculate relative changes of measured variables in a data.frame by group with dplyr. The changes are relative to the baseline value measured at time==0.

I can easily do this in the following example:

 # with this easy example it works
 library(dplyr)  # attaches %>%; the verbs below are still called as dplyr::
 df.easy <- data.frame( id  =c(1,1,1,2,2,2)
                   ,time=c(0,1,2,0,1,2)
                   ,meas=c(5,6,9,4,5,6))

 df.easy %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative =
 meas/meas[time==0])
     # Source: local data frame [6 x 4]
     # Groups: id [2]
     # 
     #      id  time  meas meas.relative
     #   <dbl> <dbl> <dbl>         <dbl>
     # 1     1     0     5          1.00
     # 2     1     1     6          1.20
     # 3     1     2     9          1.80
     # 4     2     0     4          1.00
     # 5     2     1     5          1.25
     # 6     2     2     6          1.50

However, when there are ids with no measurement at time==0, this doesn't work. There is a similar question, but I'd like to get NA as the result instead of simply taking the first occurrence as the baseline.

 # how to output NA in case there are id's with no measurement at time==0?
 df <- data.frame( id  =c(1,1,1,2,2,2,3,3)
                  ,time=c(0,1,2,0,1,2,1,2)
                  ,meas=c(5,6,9,4,5,6,5,6))

 # same approach now gives an error:
     df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative = meas/meas[time==0])
     # Error in mutate_impl(.data, dots) : 
     #   incompatible size (0), expecting 2 (the group size) or 1

If I use ifelse:

 df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative = ifelse(any(time==0), meas/meas[time==0], NA) )
     # Source: local data frame [8 x 4]
     # Groups: id [3]
     # 
     #      id  time  meas meas.relative
     #   <dbl> <dbl> <dbl>         <dbl>
     # 1     1     0     5             1
     # 2     1     1     6             1
     # 3     1     2     9             1
     # 4     2     0     4             1
     # 5     2     1     5             1
     # 6     2     2     6             1
     # 7     3     1     5            NA
     # 8     3     2     6            NA

Wait, why is the relative measurement above always 1?

identical(
    df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative = ifelse(any(time==0), meas, NA) ),
    df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative = ifelse(any(time==0), meas[time==0], NA) )
    )
    # TRUE

It seems that the ifelse prevents meas from picking the current row and instead always selects the subset where time==0.

How can I calculate relative changes when there are ids with no baseline measurement?

Recommended answer

Your issue is in the ifelse(). According to the ifelse documentation, it returns "A vector of the same length...as test". Since any(time==0) has length 1 for each group (TRUE or FALSE), only the first element of meas/meas[time==0] was selected. In groups 1 and 2 that first element belongs to the time==0 row, so it equals 1, and mutate then recycled this single value to fill each group.

To fix this, all I did was rep the any() to the length of the group. I believe this should work:

df %>% dplyr::group_by(id) %>% 
       dplyr::mutate(meas.relative = ifelse(rep(any(time==0),times = n()), meas/meas[time==0], NA) )

  #       id  time  meas meas.relative
  #    <dbl> <dbl> <dbl>         <dbl>
  #  1     1     0     5          1.00
  #  2     1     1     6          1.20
  #  3     1     2     9          1.80
  #  4     2     0     4          1.00
  #  5     2     1     5          1.25
  #  6     2     2     6          1.50
  #  7     3     1     5            NA
  #  8     3     2     6            NA
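
As a possible alternative that avoids ifelse altogether, the baseline lookup can be written so that a missing baseline becomes NA by itself: indexing an empty vector with [1] yields NA in R. The sketch below is only an illustration under that assumption. The helper name relative_to_baseline is hypothetical (it is not part of dplyr or of the original answer), and the grouping is done with base R split/unsplit so the block runs without extra packages.

```r
# Hypothetical helper: divide each measurement by the group's baseline.
# meas[time == 0] is empty when no baseline was measured, and [1] on an
# empty numeric vector returns NA, so the whole group becomes NA.
relative_to_baseline <- function(meas, time) {
  baseline <- meas[time == 0][1]
  meas / baseline
}

# Same data as in the question; id 3 has no measurement at time == 0
df <- data.frame(id   = c(1, 1, 1, 2, 2, 2, 3, 3),
                 time = c(0, 1, 2, 0, 1, 2, 1, 2),
                 meas = c(5, 6, 9, 4, 5, 6, 5, 6))

# Apply the helper per id with base R, then restore the original row order
parts <- split(df, df$id)
rel   <- lapply(parts, function(g) relative_to_baseline(g$meas, g$time))
df$meas.relative <- unsplit(rel, df$id)

print(df)
```

Inside a grouped dplyr pipeline the same helper should be usable as dplyr::mutate(meas.relative = relative_to_baseline(meas, time)). Note that if a group had several rows with time==0, this sketch would silently use the first one.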

To see how the original version went wrong, try:

ifelse(TRUE,c(1,2,3),NA)
#[1] 1
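
The same effect can be reproduced on a single group in base R. This is a small sketch with made-up variable names (x, scalar_test, vector_test) that contrasts a length-1 test with a test repeated to the length of the data:

```r
# Measurements of one group; the first element plays the role of the baseline
x <- c(5, 6, 9)

# test has length 1, so ifelse() returns only the first element of `yes`
scalar_test <- ifelse(TRUE, x / x[1], NA)

# test has the same length as x, so every element of `yes` is kept
vector_test <- ifelse(rep(TRUE, length(x)), x / x[1], NA)

print(scalar_test)
print(vector_test)
```

The first call gives a single value, which a grouped mutate would then recycle across the whole group; the second gives one value per row, which is what the rep() fix relies on.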

Edit: A data.table solution with the same concept:

library(data.table)
dt <- as.data.table(df)  # keep a reference so the column added by := is not lost
dt[, meas.rel := ifelse(rep(any(time==0), .N), meas/meas[time==0], NA_real_), by=id]
