Conditional merge, based on an event happening between two panel observations
Question
I have a panel dataset, Panel, and a dataset with a list of events, Events. In the panel dataset, observations that share a panelID belong together.
panelID = c(1:50)
year= c(2001:2010)
country = c("NLD", "GRC", "GBR")
n <- 2
library(data.table)
set.seed(123)
Panel <- data.table(panelID = rep(sample(panelID), each = n),
country = rep(sample(country, length(panelID), replace = T), each = n),
year = c(replicate(length(panelID), sample(year, n))),
some_NA = sample(0:5, 6),
some_NA_factor = sample(0:5, 6),
norm = round(runif(100)/10,2),
Income = round(rnorm(10,-5,5),2),
Happiness = sample(10,10),
Sex = round(rnorm(10,0.75,0.3),2),
Age = sample(100,100),
Educ = round(rnorm(10,0.75,0.3),2))
Panel[, uniqueID := .I] # Creates a unique ID
Panel[Panel == 0] <- NA
Events <- fread(
"Event_Type country year
A NLD 2005
C NLD 2004
A GBR 2006
B GBR 2003
A GRC 2002
D GRC 2007",
header = TRUE)
================================================================================
Events <- fread(
"Event_Type country year
A NLD 2005
A NLD 2004
A GBR 2006
A GBR 2003
A GRC 2002
A GRC 2007",
header = TRUE)
Expected result after the modification:
panelID country year 2002 2003 2004 2005 2006 2007
1       NLD     2002 NA   NA   1    1    NA   NA
1       NLD     2006 NA   NA   1    1    NA   NA
================================================================================
I would like the value in the column Event_Type to be added to Panel if the year of the event falls between the two panel observations (and the event is in the same country).
As an example, let's take the following panel observations:
panelID country year
1       NLD     2002
1       NLD     2006
Panel will get four extra columns, A to D. Column A gets a 1 if the event in country NLD in the year 2005 (the first line of Events) takes place in one of, or between, the two years. Since it does, this results in the following:
panelID country year A B  C  D
1       NLD     2002 1 NA NA NA
1       NLD     2006 1 NA NA NA
I know that merging on the same year goes as follows:
# Events has no `iso` column; cast on the same keys that are used in `by`
merge(Panel, dcast(Events, country + year ~ Event_Type),
      by = c("country", "year"))
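To make the limitation of the exact-year merge concrete, here is a minimal sketch on toy data. The two small tables, the names `wide` and `merged`, and the `fun.aggregate = length` and `all.x = TRUE` settings are assumptions for illustration and are not part of the data above.

```r
library(data.table)

# Hypothetical toy data: one panelID observed in 2004 and 2006
Panel  <- data.table(panelID = c(1, 1),
                     country = c("NLD", "NLD"),
                     year    = c(2004, 2006))
Events <- data.table(Event_Type = c("A", "C"),
                     country    = c("NLD", "NLD"),
                     year       = c(2005, 2004))

# One row per country/year, one 0/1 count column per event type
wide <- dcast(Events, country + year ~ Event_Type,
              fun.aggregate = length, value.var = "Event_Type")

# Left merge on the exact year: only the 2004 event (type C) lines up
merged <- merge(Panel, wide, by = c("country", "year"), all.x = TRUE)
print(merged)
```

The 2005 event of type A lies between the two observations but matches neither row, which is exactly the case the exact-year merge cannot handle.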
But how should I do the merge if I want the event year to be equal to, or in between, the two years of a panelID?
Recommended answer
Here is a go at your problem using data.table.
The code can be shortened, but I always find it useful (especially on SO) to show all the steps in between for easy error-checking and validation.
#first, summarise Panel, to get the time-span of the panelID
Panel.short <- Panel[, .(country = unique(country),
start = min(year),
end = max(year) ),
by = .(panelID)]
# panelID country start end
# 1: 1 NLD 2002 2006
#perform left non-equi join
Panel.short.joined <- Events[ Panel.short, on =.(country, year >= start, year <= end), mult = "all"][]
# Event_Type country year year.1 panelID
# 1: A NLD 2002 2006 1
# 2: C NLD 2002 2006 1
#cast to wide
Panel.final <- dcast( Panel.short.joined,
panelID + country ~ Event_Type,
fun.aggregate = length )
# panelID country A C
# 1: 1 NLD 1 1
#perform update join on the original Panel
Panel[, `:=`(A=0, B=0, C=0, D=0)][
Panel.final,
`:=`( A = i.A, C = i.C), # <- add B = i.B and D = i.D here
on = .( panelID )][]
# panelID country year A B C D
# 1: 1 NLD 2002 1 0 1 0
# 2: 1 NLD 2006 1 0 1 0
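The steps above list the event columns by hand. As a possible variation, the sketch below creates one indicator column per event type found in Events, so B and D do not need to be typed out. It runs on small toy data rather than on the Panel from the question, and the names `spans`, `hits` and `event_types` are assumptions made for illustration. It writes 0 where no event falls in the span, as in the code above.

```r
library(data.table)

# Hypothetical toy data, much smaller than the Panel in the question
Panel  <- data.table(panelID = c(1, 1, 2, 2),
                     country = c("NLD", "NLD", "GBR", "GBR"),
                     year    = c(2002, 2006, 2007, 2009))
Events <- data.table(Event_Type = c("A", "C", "A", "B"),
                     country    = c("NLD", "NLD", "GBR", "GBR"),
                     year       = c(2005, 2004, 2006, 2003))

# Time span covered by each panelID
spans <- Panel[, .(start = min(year), end = max(year)),
               by = .(panelID, country)]

# Non-equi inner join: keep only events that fall inside a span
hits <- Events[spans, .(panelID, Event_Type),
               on = .(country, year >= start, year <= end),
               nomatch = NULL]

# One 0/1 column per event type, added to Panel by reference
event_types <- sort(unique(Events$Event_Type))
for (ev in event_types) {
  ids <- hits[Event_Type == ev, unique(panelID)]
  Panel[, (ev) := as.integer(panelID %in% ids)]
}
print(Panel)
```

With this toy data, panelID 1 (NLD, 2002 to 2006) gets a 1 in columns A and C, while panelID 2 (GBR, 2007 to 2009) gets 0 everywhere because both GBR events fall outside its span.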