Sum over rows with multiple changing conditions in R data.table


Question


I am trying to create a column in a data.frame or data.table based on two conditions. The difference from the posts I have seen, which I have tried to adapt below, is that my conditions do not use a fixed value; they depend on other variables in the data.frame.

Let's assume this is my data frame:

mydf <- data.frame (Year = c(2000, 2001, 2002, 2004, 2005,
                             2007, 2000, 2001, 2002, 2003,
                             2003, 2004, 2005, 2006, 2006, 2007),
                    Name = c("Tom", "Tom", "Tom", "Fred", "Gill",
                             "Fred", "Gill", "Gill", "Tom", "Tom",
                             "Fred", "Fred", "Gill", "Fred", "Gill", "Gill"))

I want to find out how many times the 3 subjects have experienced an event in the last 5 years. However, if the event dates go back more than 5 years, I do not want to include it. I thought I could do a sum of an indicator variable (set to 1 if the subject experienced the event in the year) while specifying something along the lines of Year < Year & Year >= Year-5. So basically sum the experiences for the year smaller than the focal year and larger than or equal to 5 years before the focal year.
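
For reference, the count described above can be written directly in base R. This is only a sketch: `events`, `count_prior` and its `lower_inclusive` argument are names made up here, and the row-by-row loop is meant as a readable baseline for checking results, not as something tuned for large data. Both readings of the lower bound are shown because `>= Year - 5` and `> Year - 5` give different counts for Gill in 2005 and 2006; the expected result listed further down appears to correspond to the strict version.

```r
# Same data as in the question, under a different name
events <- data.frame(
  Year = c(2000, 2001, 2002, 2004, 2005, 2007, 2000, 2001,
           2002, 2003, 2003, 2004, 2005, 2006, 2006, 2007),
  Name = c("Tom", "Tom", "Tom", "Fred", "Gill", "Fred", "Gill", "Gill",
           "Tom", "Tom", "Fred", "Fred", "Gill", "Fred", "Gill", "Gill")
)

# Hypothetical helper: for row i, count events of the same person that fall
# before row i's year and not below the lower edge of the 5-year window
count_prior <- function(i, d, lower_inclusive = TRUE) {
  lower <- d$Year[i] - 5
  above_lower <- if (lower_inclusive) d$Year >= lower else d$Year > lower
  sum(d$Name == d$Name[i] & d$Year < d$Year[i] & above_lower)
}

sorted <- events[order(events$Name, events$Year), ]
rows <- seq_len(nrow(sorted))
sorted$ExpInclusive <- sapply(rows, count_prior, d = sorted, lower_inclusive = TRUE)
sorted$ExpStrict <- sapply(rows, count_prior, d = sorted, lower_inclusive = FALSE)
print(sorted)
```

The key point is that each row's window is compared against the other rows' years, which a plain column filter cannot do.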

I have created an indicator for summing and a variable for focal year - 5:

mydf$Ind <- 1
mydf$Yearm5 <- mydf$Year-5

Then I convert to a data.table for speed (the original data frame has 60k+ observations):

library(data.table)
mydf <- data.table(mydf)

The issue now is that I cannot get the two conditions to work. The posts I have seen all seem to know a specific value by which to subset (e.g. "R data.table subsetting on multiple conditions"), but in my case the value changes from observation to observation (not sure if this means I need to do some looping?).

I thought I need something along the lines of:

mydf[, c("Exp"):= sum(Ind), by = c("Name")][Year < Year & Year >= Yearm5]

gives:

Empty data.table (0 rows) of 5 cols: Year,Name,Ind,Yearm5,Exp

Using just one condition

mydf1 <- mydf[, c("Exp"):= sum(Ind), by = c("Name")][Year >= Yearm5] 

gives the total experience so I am assuming that something is wrong with the Year < Year condition.

I am not quite sure what, though. I have also tried to modify the suggestions in "how to cumulatively add values in one vector in R", with no luck. Again, something seems to be wrong with the way I specify the conditions.

library(dplyr)
mytest1 <- mydf %>%
           group_by(Name, Year) %>%
           filter(Year < Year & Year >= Yearm5) %>%
           mutate(Exp = sum(Ind))
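
Both attempts above seem to run into the same thing: in a row filter, `Year < Year` compares each row's `Year` with that same row's `Year`, so it is `FALSE` on every row, while `Year >= Yearm5` is `TRUE` on every row. Neither condition ever looks at other rows. A small check on a made-up three-row table (the name `toy` is only for illustration) shows this:

```r
library(data.table)

# Toy table, not the question's data
toy <- data.table(Year = c(2000, 2001, 2005))
toy[, Yearm5 := Year - 5]

# A column compared with itself is never strictly smaller, so no row survives
print(toy[Year < Year & Year >= Yearm5])

# Year >= Year - 5 holds on every row, so nothing is filtered out
print(toy[Year >= Yearm5])
```

This would also explain why the single-condition version returns the total per `Name`: the grouped `sum(Ind)` is computed over all rows of the group before the filter is applied, and the filter then keeps everything.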

The result should look as follows:

myresult <- data.frame (Year = c(2003, 2004, 2004, 2006,
                                 2007, 2000, 2001, 2005,
                                 2005, 2006, 2007, 2000,
                                 2001, 2002, 2002, 2003),
                        Name = c("Fred", "Fred", "Fred", "Fred",
                                 "Fred", "Gill", "Gill", "Gill",
                                 "Gill", "Gill", "Gill", "Tom",
                                 "Tom", "Tom", "Tom", "Tom"),
                        Ind = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
                        Exp = c(0, 1, 1, 3, 4, 0, 1, 1, 1, 2, 3, 0, 1, 2, 2, 4),
                        Yearm5 = c(1998, 1999, 1999, 2001, 2002,
                                   1995, 1996, 2000, 2000, 2001,
                                   2002, 1995, 1996, 1996, 1997, 1998))

Any help or pointers would be appreciated!

Solution

Here is an approach using rollapply (from the zoo package) and data.table:

library(zoo)
library(data.table)
setDT(mydf)
setkey(mydf, Name, Year)
# create a data.table that has all Years and incidences including the 5 year window
# and sum up the number of incidences per year for each subject
m <- mydf[CJ(unique(Name), seq(min(Year) - 5, max(Year))), allow.cartesian = TRUE][,
          list(Ind = unique(Ind), I2 = sum(Ind, na.rm = TRUE)),
          keyby = list(Name, Year)]
# use rollapply over this larger data.table to count incidences in the years
# before the current one: the right-aligned window of width 5 ends at the
# current year, and head(x, -1) drops that year, leaving the 4 preceding years
m[, Exp := rollapply(I2, 5, function(x) sum(head(x, -1)),
                     align = 'right', fill = 0), by = Name]
# join with the original to create your required data
m[mydf, !c('I2'), with = FALSE]
#     Name Year Ind Exp
#  1: Fred 2003   1   0
#  2: Fred 2004   1   1
#  3: Fred 2004   1   1
#  4: Fred 2006   1   3
#  5: Fred 2007   1   4
#  6: Gill 2000   1   0
#  7: Gill 2001   1   1
#  8: Gill 2005   1   1
#  9: Gill 2005   1   1
# 10: Gill 2006   1   2
# 11: Gill 2007   1   3
# 12:  Tom 2000   1   0
# 13:  Tom 2001   1   1
# 14:  Tom 2002   1   2
# 15:  Tom 2002   1   2
# 16:  Tom 2003   1   4
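
With data.table 1.9.8 or later, a non-equi self-join is another way to express a per-row window without building the full Name-by-Year grid. The following is a sketch and not part of the original answer; `dt` and the helper columns `lo` and `hi` are names chosen here. The lower bound is strict (`Year > lo`) so that the counts line up with the output above; use `Year >= lo` instead if the earliest year of the window should be included.

```r
library(data.table)

dt <- data.table(
  Year = c(2000, 2001, 2002, 2004, 2005, 2007, 2000, 2001,
           2002, 2003, 2003, 2004, 2005, 2006, 2006, 2007),
  Name = c("Tom", "Tom", "Tom", "Fred", "Gill", "Fred", "Gill", "Gill",
           "Tom", "Tom", "Fred", "Fred", "Gill", "Fred", "Gill", "Gill")
)
setorder(dt, Name, Year)

# Window bounds for each row (lo and hi are illustrative column names)
dt[, `:=`(lo = Year - 5, hi = Year)]

# For every row of dt acting as the lookup table, count the rows of dt with
# the same Name whose Year lies strictly between that row's lo and hi.
# by = .EACHI yields one count per lookup row, in the lookup table's row order.
counts <- dt[dt, on = .(Name, Year > lo, Year < hi), .N, by = .EACHI]$N
dt[, Exp := counts]

# Drop the helper columns again
dt[, c("lo", "hi") := NULL]
print(dt)
```

Because the join only touches rows that actually exist, it may scale better than the grid when the years are sparse, though that has not been measured here.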
