Conditional merge, based on an event happening between two panel observations

Question

I have a panel dataset, Panel, and a dataset with a list of events, Events. In the panel dataset, observations that share the same panelID belong together.

panelID = c(1:50)   
year= c(2001:2010)
country = c("NLD", "GRC", "GBR")

n <- 2

library(data.table)
set.seed(123)
Panel <- data.table(panelID = rep(sample(panelID), each = n),
                 country = rep(sample(country, length(panelID), replace = T), each = n),
                 year = c(replicate(length(panelID), sample(year, n))),
                 some_NA = sample(0:5, 6),                                             
                 some_NA_factor = sample(0:5, 6),         
                 norm = round(runif(100)/10,2),
                 Income = round(rnorm(10,-5,5),2),
                 Happiness = sample(10,10),
                 Sex = round(rnorm(10,0.75,0.3),2),
                 Age = sample(100,100),
                 Educ = round(rnorm(10,0.75,0.3),2))        
Panel[, uniqueID := .I]                                                                        # Creates a unique ID     
Panel[Panel == 0] <- NA    


Events <- fread(
"Event_Type  country year   
A   NLD   2005
C   NLD   2004       
A   GBR   2006
B   GBR   2003   
A   GRC   2002             
D   GRC   2007",
header = TRUE)

================================================================================

Events <- fread(
"Event_Type  country year   
A   NLD   2005
A   NLD   2004       
A   GBR   2006
A   GBR   2003   
A   GRC   2002             
A   GRC   2007",
header = TRUE)

Expected result after the modification:

panelID country year 2002  2003  2004 2005 2006 2007 
1       NLD     2002 NA    NA    1    1    NA   NA 
1       NLD     2006 NA    NA    1    1    NA   NA 
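
One way to get year columns like the table above is sketched here. It is not taken from the original post: the two-panel toy `Panel`, and the names `spans`, `hits`, `event_year` and `flag`, are assumptions made for illustration. The sketch relies on the `x.` prefix in `j` to keep the event's own year, because after a non-equi join the join column otherwise carries the values from the span table.

```r
library(data.table)

# Hypothetical toy panel: panelID 1 matches the question's example
Panel <- data.table(
  panelID = c(1, 1, 2, 2),
  country = c("NLD", "NLD", "GRC", "GRC"),
  year    = c(2002, 2006, 2003, 2005)
)
# Events from the modified question (all of type A)
Events <- data.table(
  Event_Type = "A",
  country    = c("NLD", "NLD", "GBR", "GBR", "GRC", "GRC"),
  year       = c(2005, 2004, 2006, 2003, 2002, 2007)
)

# Time span covered by each panelID
spans <- Panel[, .(start = min(year), end = max(year)), by = .(panelID, country)]

# Inner non-equi join; x.year keeps the year of the event itself
hits <- Events[spans, on = .(country, year >= start, year <= end),
               .(panelID, event_year = x.year), nomatch = NULL]
hits <- unique(hits)
hits[, flag := 1L]

# One column per event year that falls inside at least one span
wide <- dcast(hits, panelID ~ event_year, value.var = "flag")

# Left join back onto the panel; panels without events get NA
out <- merge(Panel, wide, by = "panelID", all.x = TRUE)
```

Only years with at least one matching event become columns in this sketch (here `2004` and `2005`), so all-NA year columns such as 2002 or 2007 would have to be added separately if they are needed.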

================================================================================

I would like the value in the column Event_Type to be added to Panel if the year of the event falls between two panel observations (and the event is in the same country).

As an example, take the following panel observations:

panelID country year
1       NLD     2002
1       NLD     2006

Panel will get 4 extra columns, A to D. Column A gets a 1 if the event in country NLD in the year 2005 (the first line of Events) takes place in one of the two years or between them. Since it does, this results in the following:

panelID country year A  B  C  D 
1       NLD     2002 1  NA NA NA
1       NLD     2006 1  NA NA NA
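
The "equal to or in between" condition for this single example can be checked directly. The sketch below is illustrative only: the variable names `span_country`, `span_start`, `span_end` and `in_span` are assumptions, and it uses `data.table::between()`, which includes both bounds by default.

```r
library(data.table)

# First Events table from the question
Events <- data.table(
  Event_Type = c("A", "C", "A", "B", "A", "D"),
  country    = c("NLD", "NLD", "GBR", "GBR", "GRC", "GRC"),
  year       = c(2005, 2004, 2006, 2003, 2002, 2007)
)

# Span of the example panel (panelID 1, NLD, observed in 2002 and 2006)
span_country <- "NLD"
span_start   <- 2002
span_end     <- 2006

# between() is inclusive on both ends by default
in_span <- Events[country == span_country & between(year, span_start, span_end)]
in_span$Event_Type
# [1] "A" "C"
```

Under this rule event C (NLD, 2004) also falls inside the span, which appears to be why the answer's output flags both A and C for this panelID.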

I know that merging on the same year goes as follows:

# Events has no iso column; cast on the merge keys country and year instead
merge(Panel, dcast(Events, country + year ~ Event_Type),
      by = c("country", "year"))

But how should I do the merge if I want the event year to be equal to or in between the two years of a panelID?

Recommended answer

Here is a go at your problem using data.table. The code can be shortened, but I always find it useful (especially on SO) to show all the steps in between for easy error-checking and validation.

#first, summarise Panel, to get the time-span of the panelID
Panel.short <- Panel[, .(country = unique(country), 
                         start = min(year), 
                         end = max(year) ), 
                     by = .(panelID)]
#    panelID country start  end
# 1:       1     NLD  2002 2006

#perform left non-equi join
Panel.short.joined <- Events[ Panel.short, on =.(country, year >= start, year <= end), mult = "all"][]
#    Event_Type country year year.1 panelID
# 1:          A     NLD 2002   2006       1
# 2:          C     NLD 2002   2006       1

#cast to wide
Panel.final <- dcast( Panel.short.joined, 
       panelID + country ~ Event_Type, 
       fun.aggregate = length )
#    panelID country A C
# 1:       1     NLD 1 1

#perform update join on the original Panel
Panel[, `:=`(A=0, B=0, C=0, D=0)][ 
  Panel.final, 
  `:=`( A = i.A, C = i.C),   # <- add B = i.B and D = i.D here 
  on = .( panelID )][]
#    panelID country year A B C D
# 1:       1     NLD 2002 1 0 1 0
# 2:       1     NLD 2006 1 0 1 0
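
The update join above names the event columns by hand. A possible variant that derives them from Events is sketched below. It is not part of the original answer: the toy `Panel` and the names `spans`, `hits`, `types` and `tp` are assumptions, and it fills non-matches with NA (as in the question's expected output) rather than 0.

```r
library(data.table)

# Hypothetical toy data, smaller than the question's Panel
Panel <- data.table(
  panelID = c(1, 1, 2, 2),
  country = c("NLD", "NLD", "GBR", "GBR"),
  year    = c(2002, 2006, 2007, 2009)
)
Events <- data.table(
  Event_Type = c("A", "C", "A", "B"),
  country    = c("NLD", "NLD", "GBR", "GBR"),
  year       = c(2005, 2004, 2006, 2003)
)

# Time span covered by each panelID
spans <- Panel[, .(start = min(year), end = max(year)), by = .(panelID, country)]

# Inner non-equi join: keep only the events that fall inside a span
hits <- Events[spans, on = .(country, year >= start, year <= end),
               .(panelID, Event_Type), nomatch = NULL]

# One indicator column per event type present in Events, NA by default
types <- sort(unique(Events$Event_Type))
Panel[, (types) := NA_integer_]

# Flag every row of a panelID that has at least one event of that type
for (tp in types) {
  ids <- hits[Event_Type == tp, unique(panelID)]
  Panel[panelID %in% ids, (tp) := 1L]
}
```

With this toy data, panelID 1 gets a 1 in columns A and C, while panelID 2 stays NA everywhere because both GBR events lie outside 2007 to 2009.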
