根据条件合并一列匹配的两个数据帧 [英] Merge two data frame where one column is matched based on condition

查看:107
本文介绍了根据条件合并一列匹配的两个数据帧的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

模拟数据:

set.seed(1)
df1 <- data.frame(country=c("US", "UK"),
                  year=c(2000, 2003))
df2 <- data.frame(country=rep(c("US", "UK"), 10),
                  year=rep(2000:2009, 2),
                  myvar=rnorm(20))

df1 包含感兴趣的国家年份。我想要获得这个国家年份和3年前后的 myvar 的价值。

df1 contains the country-year of interest. I want to get the myvar values for this country-year AND 3 years before and after.

换句话说,合并是根据 df2 $ country == df1 $ country AND df2 $ year> df1 $ year - 3& df2 $ year< df1 $ year + 3

In other words, the merging is done based on condition that df2$country==df1$country AND df2$year > df1$year - 3 & df2$year < df1$year + 3

编辑:我的(工作,不优雅)解决方案是填充 df1 创建我感兴趣的所有国家年份,然后以 df2 以常规方式合并。

My (working, not elegant) solution is to pad df1 to create all the country-years that I'm interested in, then merge with df2 the regular way.

library(plyr)
ddply(df1, c("country", "year"), 
  function(df) data.frame(rep(df$country, 7), (df$year-3):(df$year+3)))

生成

   country year rep.df.country..7. X.df.year...3...df.year...3.
1       UK 2003                 UK                         2000
2       UK 2003                 UK                         2001
3       UK 2003                 UK                         2002
4       UK 2003                 UK                         2003
5       UK 2003                 UK                         2004
6       UK 2003                 UK                         2005
7       UK 2003                 UK                         2006
8       US 2000                 US                         1997
9       US 2000                 US                         1998
10      US 2000                 US                         1999
11      US 2000                 US                         2000
12      US 2000                 US                         2001
13      US 2000                 US                         2002
14      US 2000                 US                         2003


推荐答案

在data.table中使用foverlaps的试用

A trial using foverlaps in data.table

set.seed(1)
df1 <- data.frame(country=c("US", "UK"),
                  year=c(2000, 2003, 2009, 2009))
df2 <- data.frame(country=rep(c("US", "UK"), 10),
                  year=rep(2000:2009, 2),
                  myvar=rnorm(20))
library(data.table)
setDT(df1); setDT(df2) # convert to data table
df1[, c("start", "end") := list(year-2, year+2)]
setkey(df1, country, start, end)
setkey(df2[, year2:=year], country, year, year2)
foverlaps(df1, df2, type="any")[,4:7:=NULL][]
    country year       myvar
 1:      UK 2001  0.18364332
 2:      UK 2001  0.38984324
 3:      UK 2003  1.59528080
 4:      UK 2003 -2.21469989
 5:      UK 2005 -0.82046838
 6:      UK 2005 -0.04493361
 7:      UK 2007  0.73832471
 8:      UK 2007  0.94383621
 9:      UK 2009 -0.30538839
10:      UK 2009  0.59390132
11:      US 2000 -0.62645381
12:      US 2000  1.51178117
13:      US 2002 -0.83562861
14:      US 2002 -0.62124058
15:      US 2008  0.57578135
16:      US 2008  0.82122120

这篇关于根据条件合并一列匹配的两个数据帧的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆