How to calculate time-weighted average and create lags

Question

I have searched the forum but found nothing that answers, or gives a hint on, how to do what I want.

I have yearly measurements of exposure data, from which I wish to calculate an individual-level annual average based on each individual's date of entry into the study. For each row, the one-year exposure assignment should cover the preceding 12 months, ending with the last month before the person joined the study. As an example, the first person in the sample data joined the study on Feb 7, 2002. His exposure will include a contribution from January 2002 (annual average is 18) and from February to December 2001 (annual average is 19). The time-weighted average for this person would be (1/12*18) + (11/12*19). The two-year average exposure for the same person would extend back from January 2002 to February 2000.

Similarly, the last person, who joined the study in December 2004, will have a contribution of 11 months from 2004 and one month from 2003, so his annual average exposure will be (11/12*5), derived from 2004, plus (1/12*6), which comes from the annual average of 2003.
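
The arithmetic in these two examples can be checked directly in R. This is only a minimal sketch: the helper name twa_one_year and its arguments are made up for illustration, and the values 18, 19, 5 and 6 are the annual averages quoted above.

```r
# Hypothetical helper (not part of the original post): one-year
# time-weighted average from two annual averages.
# months_in_entry_year = number of months counted in the entry year,
# i.e. the months before the month of entry.
twa_one_year <- function(months_in_entry_year, avg_entry_year, avg_prev_year) {
  w <- months_in_entry_year / 12
  w * avg_entry_year + (1 - w) * avg_prev_year
}

# First person: entered Feb 2002 -> 1 month of 2002 (18), 11 months of 2001 (19)
first_person <- twa_one_year(1, 18, 19)

# Last person: entered Dec 2004 -> 11 months of 2004 (5), 1 month of 2003 (6)
last_person <- twa_one_year(11, 5, 6)

print(c(first_person, last_person))
```

The two results come out at roughly 18.92 and 5.08.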

How can I calculate the 1-, 2- and 5-year average exposure going back from the date of entry into the study? How can I use lags in the manner that I have described?

Sample data can be accessed from this link:

https://drive.google.com/file/d/0B_4NdfcEvU7La1ZCd2EtbEdaeGs/view?usp=sharing

Recommended answer

This is not an elegant answer, but I would like to leave what I tried. I first rearranged the data frame so that I could identify the key (base) year for each subject. To do that I created id. variable comes from the column names (e.g., pol_2000) in your original data set. entryYear and entryMonth both come from entry in your data; entryMonth is the month number minus one, i.e. the number of months counted in the entry year. check flags which year is the base year for each participant. Next, I extracted six rows for each participant (the base year and the five years before it) using getMyRows in the SOfun package. Then I used lapply and did the math as you described in your question. For the two- and five-year averages, I divided the total by the number of years (2 or 5). I was not sure what the final output should look like, so I kept the base-year row for each subject and added three columns to it.

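### One-time setup (development version of tidyr), run only if needed:
### devtools::install_github("hadley/tidyr")

library(stringi)
library(SOfun)     # getMyRows() comes from this package
library(reshape2)  # melt() is used below
library(tidyr)
library(dplyr)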


### Big thanks to BondedDust for this function
### http://stackoverflow.com/questions/6987478/convert-a-month-abbreviation-to-a-numeric-month-in-r

mo2Num <- function(x) match(tolower(x), tolower(month.abb))


### Arrange the data frame.
ana <- foo %>%
       mutate(id = 1:n()) %>%
       melt(id.vars = c("id","entry")) %>%
       arrange(id) %>%
       mutate(variable = as.numeric(gsub("^.*_", "", variable)),
              entryYear = as.numeric(stri_extract_last(entry, regex = "\\d+")),
              entryMonth = mo2Num(substr(entry, 3,5)) - 1,
              check = ifelse(variable == entryYear, "Y", "N"))

### Find a base year for each subject and get some parts of data for each participant.
indx <- which(ana$check == "Y")
bob <- getMyRows(ana, pattern = indx, -5:0)


### Get one-year average
cathy <- lapply(bob, function(x){
    x$one <- ((x[6,6] / 12) * x[6,4]) + (((12-x[5,6])/12) * x[5,4])
    x 
})

### Row 6 is the base year, column 8 is the newly added column
one <- sapply(cathy, function(x) x[6, 8])

### Get two-year average
cathy <- lapply(bob, function(x){
    x$two <- (((x[6,6] / 12) * x[6,4]) + x[5,4] + (((12-x[4,6])/12) * x[4,4])) / 2
    x 
})

two <- sapply(cathy, function(x) x[6, 8])


### Get five-year average
cathy <- lapply(bob, function(x){
    x$five <- (((x[6,6] / 12) * x[6,4]) + x[5,4] + x[4,4] + x[3,4] + x[2,4] + (((12-x[2,6])/12) * x[1,4])) / 5 
    x 
})

five <- sapply(cathy, function(x) x[6, 8])

### Combine the results with the key observations
final <- cbind(ana[which(ana$check == "Y"),], one, two, five)
colnames(final) <- c(names(ana), "one", "two", "five")

#   id     entry variable value entryYear entryMonth check       one       two      five
#6   1 07feb2002     2002    18      2002          1     Y 18.916667 18.500000 18.766667
#14  2 06jun2002     2002    16      2002          5     Y 16.583333 16.791667 17.150000
#23  3 16apr2003     2003    14      2003          3     Y 15.500000 15.750000 16.050000
#31  4 26may2003     2003    16      2003          4     Y 16.666667 17.166667 17.400000
#39  5 11jun2003     2003    13      2003          5     Y 13.583333 14.083333 14.233333
#48  6 20feb2004     2004     3      2004          1     Y  3.000000  3.458333  3.783333
#56  7 25jul2004     2004     2      2004          6     Y  2.000000  2.250000  2.700000
#64  8 19aug2004     2004     4      2004          7     Y  4.000000  4.208333  4.683333
#72  9 19dec2004     2004     5      2004         11     Y  5.083333  5.458333  4.800000
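
The three calculations above follow one pattern, so they could probably be folded into a single function. The following is a rough base-R sketch, not the method used in the answer: the function name twa, its arguments, and the named-vector layout of the annual averages are all assumptions made for illustration. The 2001 and 2002 values are taken from the question; the 2000 value is a placeholder chosen for the example.

```r
# Hypothetical sketch: k-year time-weighted average ending with the
# month before entry.
# annual      : numeric vector of annual averages, named by year (assumed layout)
# entry_year  : year of entry into the study
# entry_month : month of entry, 1-12
# k           : length of the window in years
twa <- function(annual, entry_year, entry_month, k = 1) {
  m      <- entry_month - 1                 # months counted in the entry year
  years  <- entry_year - 0:k                # entry year back to k years earlier
  months <- c(m, rep(12, k - 1), 12 - m)    # months contributed by each year
  vals   <- annual[as.character(years)]
  keep   <- months > 0                      # ignore years that contribute nothing
  sum(months[keep] * vals[keep]) / (12 * k) # NA if a needed year is missing
}

# Illustration values: 2001 and 2002 come from the question, 2000 is assumed.
annual <- c("2000" = 18, "2001" = 19, "2002" = 18)

one_year <- twa(annual, entry_year = 2002, entry_month = 2, k = 1)
two_year <- twa(annual, entry_year = 2002, entry_month = 2, k = 2)
print(c(one_year, two_year))
```

With these inputs the one-year value is about 18.92 and the two-year value is 18.5. Weighting by months and dividing by 12 * k is the same arithmetic as the per-window formulas above, written once for any k. Applying it per subject would still require a lookup of each subject's annual values by year.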
