Sum over rows with multiple changing conditions in R data.table
Problem description
I am trying to create a column in a data.frame or data.table based on two conditions. Unlike the posts I have seen (and tried to adapt below), I do not have a fixed value for the conditions; instead, the conditions depend on other variables in the data.frame.
Let's assume this is my data frame:
mydf <- data.frame(
  Year = c(2000, 2001, 2002, 2004, 2005, 2007, 2000, 2001,
           2002, 2003, 2003, 2004, 2005, 2006, 2006, 2007),
  Name = c("Tom", "Tom", "Tom", "Fred", "Gill", "Fred", "Gill", "Gill",
           "Tom", "Tom", "Fred", "Fred", "Gill", "Fred", "Gill", "Gill"))
I want to find out how many times each of the 3 subjects has experienced an event in the last 5 years. Events that date back more than 5 years should not be counted. I thought I could sum an indicator variable (set to 1 if the subject experienced the event in that year) while specifying something along the lines of
Year < Year & Year >= Year-5
that is, sum the experiences for years earlier than the focal year and no more than 5 years before it. I have created an indicator for summing and a variable for focal year - 5:
mydf$Ind <- 1
mydf$Yearm5 <- mydf$Year - 5
Then I convert to a data.table for speed (the original df has more than 60k observations):
library(data.table)
mydf <- data.table(mydf)
The issue now is that I cannot get the two conditions to work. The posts I have seen all seem to know a specific value by which to subset (e.g. "R data.table subsetting on multiple conditions"), but in my case the value changes from observation to observation (not sure if this means I need to do some looping?).
I thought I needed something along the lines of:
mydf[, c("Exp"):= sum(Ind), by = c("Name")][Year < Year & Year >= Yearm5]
gives:
Empty data.table (0 rows) of 5 cols: Year,Name,Ind,Yearm5,Exp
Using just one condition:
mydf1 <- mydf[, c("Exp"):= sum(Ind), by = c("Name")][Year >= Yearm5]
gives the total experience, so I assume that something is wrong with the
Year < Year
condition, although I am not quite sure what. I have also tried to modify the suggestions in "how to cumulatively add values in one vector in R", again with no luck; something seems to be wrong with the way I specify the conditions.
library(dplyr)
mytest1 <- mydf %>%
  group_by(Name, Year) %>%
  filter(Year < Year & Year >= Yearm5) %>%
  mutate(Exp = sum(Ind))
The result should look as follows:
myresult <- data.frame(
  Year = c(2003, 2004, 2004, 2006, 2007, 2000, 2001, 2005,
           2005, 2006, 2007, 2000, 2001, 2002, 2002, 2003),
  Name = c("Fred", "Fred", "Fred", "Fred", "Fred", "Gill", "Gill", "Gill",
           "Gill", "Gill", "Gill", "Tom", "Tom", "Tom", "Tom", "Tom"),
  Ind = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  Exp = c(0, 1, 1, 3, 4, 0, 1, 1, 1, 2, 3, 0, 1, 2, 2, 4),
  Yearm5 = c(1998, 1999, 1999, 2001, 2002, 1995, 1996, 2000,
             2000, 2001, 2002, 1995, 1996, 1997, 1997, 1998))
Any help or pointers would be appreciated!
Solution
Here is an approach using rollapply (from the zoo package) and data.table:
library(zoo)
library(data.table)
setDT(mydf)
setkey(mydf, Name, Year)

# create a data.table that has all Years and incidences including the 5 year window
# and sum up the number of incidences per year for each subject
m <- mydf[CJ(unique(Name), seq(min(Year) - 5, max(Year))), allow.cartesian = TRUE][,
  list(Ind = unique(Ind), I2 = sum(Ind, na.rm = TRUE)),
  keyby = list(Name, Year)]

# use rollapply over this larger data.table to count the incidences in the
# preceding years: the right-aligned window of width 5 covers the current year
# plus the 4 years before it, and head(x, -1) drops the current year
m[, Exp := rollapply(I2, 5, function(x) sum(head(x, -1)), align = 'right', fill = 0), by = Name]

# join with the original to create your required data
m[mydf, !c('I2'), with = FALSE]
#     Name Year Ind Exp
#  1: Fred 2003   1   0
#  2: Fred 2004   1   1
#  3: Fred 2004   1   1
#  4: Fred 2006   1   3
#  5: Fred 2007   1   4
#  6: Gill 2000   1   0
#  7: Gill 2001   1   1
#  8: Gill 2005   1   1
#  9: Gill 2005   1   1
# 10: Gill 2006   1   2
# 11: Gill 2007   1   3
# 12:  Tom 2000   1   0
# 13:  Tom 2001   1   1
# 14:  Tom 2002   1   2
# 15:  Tom 2002   1   2
# 16:  Tom 2003   1   4
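As a side note on why the attempts in the question return nothing: Year < Year compares the column with itself, so it is FALSE for every row. To compare each row against the other rows of the same subject, one possible alternative is a non-equi self-join in data.table. The following is only a sketch and not the answer's method. It assumes the window is Year > Yearm5 (strictly greater), because that is what reproduces the expected Exp values in the question; with Year >= Yearm5, as the question text states, Gill's 2005 and 2006 rows would come out as 2 and 3 instead.

```r
library(data.table)

mydf <- data.table(
  Year = c(2000, 2001, 2002, 2004, 2005, 2007, 2000, 2001,
           2002, 2003, 2003, 2004, 2005, 2006, 2006, 2007),
  Name = c("Tom", "Tom", "Tom", "Fred", "Gill", "Fred", "Gill", "Gill",
           "Tom", "Tom", "Fred", "Fred", "Gill", "Fred", "Gill", "Gill")
)
mydf[, Ind := 1]
mydf[, Yearm5 := Year - 5]

# A column compared with itself is never smaller, so this subset is empty
nrow(mydf[Year < Year])

# Non-equi self-join: for every focal row (the inner mydf, used as i), match
# the rows of the same Name whose Year lies strictly between the focal Yearm5
# and the focal Year, then sum their indicator. Rows without any match
# contribute NA, which na.rm = TRUE turns into 0.
# ASSUMPTION: strict "Year > Yearm5", chosen to match the expected output.
mydf[, Exp := mydf[mydf,
                   on = .(Name, Year > Yearm5, Year < Year),
                   sum(x.Ind, na.rm = TRUE),
                   by = .EACHI]$V1]

# Show the result in the same order as the expected output
mydf[order(Name, Year)]
```

Because by = .EACHI returns one value per row of the focal table, in that table's row order, the result can be assigned straight back as a new column without padding the data with the missing years first.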