Calculate relative changes in a time series with respect to a baseline by group. NA if no baseline value was measured
Problem description
I'd like to calculate relative changes of measured variables in a data.frame by group with dplyr. The changes are with respect to a first baseline value at time==0.
I can easily do this in the following example:
# with this easy example it works
library(dplyr)

df.easy <- data.frame(id   = c(1, 1, 1, 2, 2, 2),
                      time = c(0, 1, 2, 0, 1, 2),
                      meas = c(5, 6, 9, 4, 5, 6))
df.easy %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(meas.relative = meas / meas[time == 0])
# Source: local data frame [6 x 4]
# Groups: id [2]
#
# id time meas meas.relative
# <dbl> <dbl> <dbl> <dbl>
# 1 1 0 5 1.00
# 2 1 1 6 1.20
# 3 1 2 9 1.80
# 4 2 0 4 1.00
# 5 2 1 5 1.25
# 6 2 2 6 1.50
However, when there are ids with no measurement at time==0, this doesn't work. A similar question exists, but I'd like to get NA as the result instead of simply taking the first occurrence as the baseline.
# how to output NA in case there are id's with no measurement at time==0?
df <- data.frame( id =c(1,1,1,2,2,2,3,3)
,time=c(0,1,2,0,1,2,1,2)
,meas=c(5,6,9,4,5,6,5,6))
# same approach now gives an error:
df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative = meas/meas[time==0])
# Error in mutate_impl(.data, dots) :
# incompatible size (0), expecting 2 (the group size) or 1
If I use ifelse instead:
df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative = ifelse(any(time==0), meas/meas[time==0], NA) )
# Source: local data frame [8 x 4]
# Groups: id [3]
#
# id time meas meas.relative
# <dbl> <dbl> <dbl> <dbl>
# 1 1 0 5 1
# 2 1 1 6 1
# 3 1 2 9 1
# 4 2 0 4 1
# 5 2 1 5 1
# 6 2 2 6 1
# 7 3 1 5 NA
# 8 3 2 6 NA
Wait, why is the relative measurement above always 1?
identical(
df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative = ifelse(any(time==0), meas, NA) ),
df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative = ifelse(any(time==0), meas[time==0], NA) )
)
# TRUE
It seems that ifelse prevents meas from picking the current row and instead always selects the subset where time==0.
How can I calculate relative changes when there are IDs with no baseline measurement?
Recommended answer
Your issue was in the ifelse(). According to the ifelse documentation, it returns "A vector of the same length...as test". Since any(time==0) is of length 1 for each group (TRUE or FALSE), only the first element of meas / meas[time==0] was being selected. That single value was then recycled to fill each group. In your data the first row of each group with a baseline is the baseline row itself, which is why every value came out as 1.
To fix this, all I did was rep the any() to the length of the group. I believe this should work:
df %>% dplyr::group_by(id) %>%
dplyr::mutate(meas.relative = ifelse(rep(any(time==0),times = n()), meas/meas[time==0], NA) )
# id time meas meas.relative
# <dbl> <dbl> <dbl> <dbl>
# 1 1 0 5 1.00
# 2 1 1 6 1.20
# 3 1 2 9 1.80
# 4 2 0 4 1.00
# 5 2 1 5 1.25
# 6 2 2 6 1.50
# 7 3 1 5 NA
# 8 3 2 6 NA
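As a possible alternative, ifelse can be avoided altogether. The sketch below is not part of the original answer; it relies on the base R behaviour that indexing a vector with NA_integer_ returns a single NA, and match(0, time) returns NA_integer_ when a group has no time == 0 row. It assumes at most one baseline row per id (with several, match() silently picks the first one).

```r
library(dplyr)

df <- data.frame(id   = c(1, 1, 1, 2, 2, 2, 3, 3),
                 time = c(0, 1, 2, 0, 1, 2, 1, 2),
                 meas = c(5, 6, 9, 4, 5, 6, 5, 6))

# match(0, time) gives the position of the baseline row within the group,
# or NA_integer_ if the group has none; meas[NA_integer_] is then NA,
# so the whole group divides by NA and becomes NA.
res <- df %>%
  group_by(id) %>%
  mutate(meas.relative = meas / meas[match(0, time)]) %>%
  ungroup()

print(res)
```

Because the divisor is always of length 1, it is recycled across the group in the usual way and no explicit rep() is needed.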
To see why this went wrong in your case, try:
ifelse(TRUE,c(1,2,3),NA)
#[1] 1
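For contrast, here is a small base R sketch (added for illustration, using made-up vectors) showing that the length of the result follows the length of test, which is why repeating the condition to the group size helps:

```r
yes_values <- c(1, 2, 3)

# test of length 1: only the first element of yes_values is returned
short <- ifelse(TRUE, yes_values, NA)

# test of length 3: the result has one element per test entry
full  <- ifelse(rep(TRUE, 3), yes_values, NA)

# mixed test: NA is filled in wherever test is FALSE
mixed <- ifelse(c(TRUE, FALSE, TRUE), yes_values, NA)

print(short)  # 1
print(full)   # 1 2 3
print(mixed)  # 1 NA 3
```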
Edit: A data.table solution with the same concept:
library(data.table)

as.data.table(df)[, meas.rel := ifelse(rep(any(time == 0), .N), meas / meas[time == 0], NA_real_),
                  by = id]