Calculate difference between values by group and matched for time
Problem description
For each individual bird, I would like to calculate the difference between the average hourly body temperature (Tb) measurements taken on different days (Tb_Period). My goal is to be able to compare the change in Tb of BirdX from 09:00 PreI to 09:00 DayI, from 10:00 PreI to 10:00 PostI, etc. Tb_Period marks the time before manipulation (PreI), the day of manipulation (DayI), and the time after manipulation (PostI). My initial df:
Date_Time Bird_ID Tb Hour Treatment Tb_Period
2018-04-04 11:01:39 3282 42.2 11 Control PreI
2018-04-04 12:31:51 3282 41.2 12 Control PreI
....
2018-04-05 09:16:54 3282 41.9 9 Control DayI
....
2018-04-06 08:09:57 3282 41.4 8 Control PostI
What I have done so far: each bird has body temperature measurements taken every 10 minutes over a span of 48 hours, so I first calculated the average Tb of each bird for each hour using dplyr:
library(dplyr)
# Mean Tb per period, hour, bird and treatment
Tb_Averages <- TbData %>%
  group_by(Tb_Period, Hour, Bird_ID, Treatment) %>%
  summarize(meanHourTb = mean(Tb))
Resulting df:
Tb_Period Hour Bird_ID Treatment meanHourTb
PreI 9 3500 LPS 41.55000
PreI 10 3500 LPS 41.75000
...
DayI 9 3500 LPS 40.88182
DayI 10 3500 LPS 41.24000
Now what I would like is a df that looks like this:
Bird_ID Hour Treatment Tb_Diff
3500 9 LPS -.67 (40.88-41.55)
3282 9 LPS .5 (e.g.)
Based on an answer to "Calculate difference between values in consecutive rows by group", I have tried variations (with dplyr's arrange function) of:
Tb_Averages <- Tb_Averages %>%
  group_by(Tb_Period, Bird_ID, Hour) %>%
  mutate(Tb_Diff = c(NA, diff(meanHourTb)))
but I keep getting NAs in the Tb_Diff column. What is the best approach to solve this problem (ideally using dplyr)?
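The NAs can be traced to the grouping. Below is a minimal sketch that rebuilds the four example rows by hand (the data frame is a stand-in for the real `Tb_Averages`). Because `Tb_Averages` already holds one mean per period, bird and hour, grouping by `Tb_Period`, `Bird_ID` and `Hour` puts every row in a group of its own; `diff()` of a single value is empty, so `c(NA, diff(meanHourTb))` can only ever be `NA`:

```r
library(dplyr)

# Stand-in for Tb_Averages, using the example values from the question
Tb_Averages <- data.frame(
  Tb_Period  = c("PreI", "PreI", "DayI", "DayI"),
  Hour       = c(9L, 10L, 9L, 10L),
  Bird_ID    = 3500L,
  Treatment  = "LPS",
  meanHourTb = c(41.55, 41.75, 40.88182, 41.24),
  stringsAsFactors = FALSE
)

# Count rows per (Tb_Period, Bird_ID, Hour) group: each group has one row
group_sizes <- Tb_Averages %>% count(Tb_Period, Bird_ID, Hour)
print(group_sizes)

# diff() of a length-1 vector is empty, so c(NA, <empty>) is a single NA
print(diff(41.55))
print(c(NA, diff(41.55)))

# The attempt from the question therefore yields NA in every row
attempt <- Tb_Averages %>%
  group_by(Tb_Period, Bird_ID, Hour) %>%
  mutate(Tb_Diff = c(NA, diff(meanHourTb)))
print(attempt$Tb_Diff)
```

Dropping `Tb_Period` from `group_by()` lets each group hold all periods for one bird and hour, which is what the difference needs.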
Recommended answer
You're nearly there! The key is to convert Tb_Period to an ordered factor, so that PreI is treated as "less than" DayI, which is in turn less than PostI. Once this is established, we can group by bird and hour, and sort by Tb_Period to ensure that differences are calculated in the correct order:
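To illustrate the ordered-factor idea in base R alone: with explicit levels, comparisons and sorting follow PreI < DayI < PostI, whereas plain character strings sort alphabetically (DayI, PostI, PreI), which would put the periods in the wrong order for `lag()`. The vector below is only an example:

```r
# Character strings sort alphabetically: DayI, PostI, PreI
chr_sorted <- sort(c("PreI", "DayI", "PostI"))
print(chr_sorted)

# An ordered factor with explicit levels sorts and compares by level order
periods <- factor(c("DayI", "PostI", "PreI"),
                  levels = c("PreI", "DayI", "PostI"), ordered = TRUE)

print(periods[3] < periods[1])  # PreI is "less than" DayI
print(sort(periods))            # PreI, DayI, PostI
```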
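library(dplyr)
# Hourly means from the question
df <- read.table(text = 'Tb_Period Hour Bird_ID Treatment meanHourTb
PreI 9 3500 LPS 41.55000
PreI 10 3500 LPS 41.75000
DayI 9 3500 LPS 40.88182
DayI 10 3500 LPS 41.24000', header = TRUE, stringsAsFactors = FALSE)
df <- df %>%
  # Ordered factor, so that PreI < DayI < PostI
  mutate(Tb_Period = factor(Tb_Period, levels = c('PreI', 'DayI', 'PostI'), ordered = TRUE)) %>%
  # Sort chronologically so that lag() sees the earlier period first
  arrange(Tb_Period) %>%
  # Compare like with like: same bird, same hour
  group_by(Bird_ID, Hour) %>%
  # Difference from the previous period for the same bird and hour
  mutate(diff = meanHourTb - lag(meanHourTb, 1))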
# A tibble: 4 x 6
# Groups: Bird_ID, Hour [2]
Tb_Period Hour Bird_ID Treatment meanHourTb diff
<ord> <int> <int> <chr> <dbl> <dbl>
1 PreI 9 3500 LPS 41.55000 NA
2 PreI 10 3500 LPS 41.75000 NA
3 DayI 9 3500 LPS 40.88182 -0.66818
4 DayI 10 3500 LPS 41.24000 -0.51000
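If the goal is the wide table from the question (one Tb_Diff per bird and hour), reshaping may be more direct than `lag()`. The sketch below is an alternative approach, assuming the tidyr package is installed (`pivot_wider()` needs tidyr 1.0.0 or later); it reuses the four example rows, and the names `wide` and `tb_diff` are illustrative:

```r
library(dplyr)
library(tidyr)  # assumed installed; pivot_wider() needs tidyr >= 1.0.0

df <- data.frame(
  Tb_Period  = c("PreI", "PreI", "DayI", "DayI"),
  Hour       = c(9L, 10L, 9L, 10L),
  Bird_ID    = 3500L,
  Treatment  = "LPS",
  meanHourTb = c(41.55, 41.75, 40.88182, 41.24),
  stringsAsFactors = FALSE
)

# One row per bird and hour, one column per Tb_Period
wide <- df %>%
  pivot_wider(id_cols = c(Bird_ID, Hour, Treatment),
              names_from = Tb_Period, values_from = meanHourTb)

# Name each contrast explicitly; PostI is absent from this small example
tb_diff <- wide %>%
  mutate(Tb_Diff = DayI - PreI) %>%
  select(Bird_ID, Hour, Treatment, Tb_Diff)
print(tb_diff)
```

With PostI data present, further contrasts such as `PostI - PreI` can be added in the same `mutate()`. Because each difference names its periods explicitly, the result does not depend on row order.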