如果属于R中另一个数据集中的两个变量定义的范围,则从一个数据集获取变量值 [英] Get a variables value from one dataset if falling in a range defined by two variables in another dataset in R

查看:621
本文介绍了如果属于R中另一个数据集中的两个变量定义的范围,则从一个数据集获取变量值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个关于日期操纵的问题。我已经看了几天,但在网上找不到任何帮助。我有一个数据集,我有id和两个日期,另一个数据集与id变量,日期和价格相同。例如:



  x = data.frame(id = c(A,B,C,C ,date1 = c(29/05/2013​​,23/08/2011,25/09/2011,18/11/2011),date2 = c(10/07/2013​​, 04/10/2011,10/11/2011,15/12/2011))> x id date1 date21 A 29/05/2013 10/07/20132 B 23/08/2011 04/10/20113 C 25/09/2011 10/11/20114 C 18/11/2011 15/12 / 2011y =数据。框(id = c(A,A,A,B,B,B,B,B,B,C ,C),date = c(21/02/2013​​,19/06/2013​​,31/07/2013​​,07/10/2011,16/01/2012 10/07/2012,20/09/2012,29/11/2012,15/08/2014,27/09/2011,27/01/2012,09 / 03/2012),price = c(126,109,111,14,13.8,14.1,14,14.4,143,102,114,116))> y id日期价格1 A 21/02/2013 126.02 A 19/06/2013 109.03 A 31/07/2013 111.04 B 07/10/2011 14.05 B 16/01/2012 13.86 B 10/07/2012 14.17 B 20/09 / 2012 14.08 B 29/11/2012 14.49 B 15/08/2014 143.010 C 27/09/2011 102.011 C 27/01/2012 114.012 C 09/03/2012 116.0  



我想要做的是在数据集x中查找两个日期,如果在数据集y中有一个日期由数据集x中的两个日期定义为相同的id,以选择该id和date的价格值。如果没有丢失。所以基本上我想要得到一个最终的数据集:



  final = data.frame(id = c(A B,C,C),date1 = c(29/05/2013,23/08/2011,25/09/2011,18/11/2011),date2 = c(10/07/2013​​,04/10/2011,10/11/2011,15/12/2011),date = c(19/06/2013​​,NA ,27/09/2011,NA),price = c(109,NA,102,NA))> final id date1 date2 date price1 A 29/05/2013 10/07/2013 19/06/2013 1092 B 23/08/2011 04/10/2011 20/09/2012 143 C 25/09/2011 10/11 / 2011 27/09/2011 1024 C 18/11/2011 15/12/2011 NA NA  



任何帮助将不胜感激。

解决方案

这里根据 foverlaps data.table 包。

  library(data.table)
##强制字符到日期(数字)
setDT(x )[,c(date1,date2):= list(as.Date(date1,%d /%m /%Y),
as.Date(date2,%d /% m /%Y))]
##和一个虚拟日期,因为foverlaps寻找一个开始,结束列
setDT(y)[,c(date1):= as.Date ,%d /%m /%Y)] [,date:= date1]
## y必须键入
setkey(y,id,date,date1)
foverlaps x,y,by.x = c(id,date1,date2))[,
list(id,i.date1,date2,date,price)]

id i.date1 date2 date price
1:A 2013-05-29 2013-07-10 2013-06-19 109
2:B 2011-08-23 2011-10-04< ; NA> NA
3:C 2011-09-25 2011-11-10 2011-09-27 102
4:C 2011-11-18 2011-12-15< NA> NA

PS:结果不尽相同,因为您的预期输出有错误。 / p>

I have a question regarding dates manipulation in R. I've looked around for days but couldn't find any help online. I have a dataset where I have id and two dates and another dataset with the same id variable, date and price. For example:

x = data.frame(id = c("A","B","C","C"), 
               date1 = c("29/05/2013", "23/08/2011", "25/09/2011",  "18/11/2011"),    
               date2 = c("10/07/2013", "04/10/2011", "10/11/2011",	"15/12/2011") )
> x
  id      date1      date2
1  A 29/05/2013 10/07/2013
2  B 23/08/2011 04/10/2011
3  C 25/09/2011 10/11/2011
4  C 18/11/2011 15/12/2011

y = data.frame(id = c("A","A","A","B","B","B","B","B","B","C","C","C"),
              date = c("21/02/2013",  "19/06/2013",	"31/07/2013",	"07/10/2011",	"16/01/2012",	"10/07/2012","20/09/2012",	"29/11/2012",		"15/08/2014",	"27/09/2011",	"27/01/2012",	"09/03/2012"),
              price = c(126,109,111,14,13.8,14.1,14,	14.4,143,102,114,116))
> y
   id       date price
1   A 21/02/2013 126.0
2   A 19/06/2013 109.0
3   A 31/07/2013 111.0
4   B 07/10/2011  14.0
5   B 16/01/2012  13.8
6   B 10/07/2012  14.1
7   B 20/09/2012  14.0
8   B 29/11/2012  14.4
9   B 15/08/2014 143.0
10  C 27/09/2011 102.0
11  C 27/01/2012 114.0
12  C 09/03/2012 116.0

What I would like to do is to look for the two dates in dataset x and if there is a date in dataset y inside defined by the two dates in dataset x for the same id, to pick the value of price for that id and date. If not have it as missing. So basically I want to end up with a final dataset like that:

final = data.frame(id = c("A","B","C","C"), 
                   date1 = c("29/05/2013", "23/08/2011", "25/09/2011",  "18/11/2011"),    
                   date2 = c("10/07/2013", "04/10/2011", "10/11/2011",  "15/12/2011"),
                   date = c("19/06/2013",  "NA",	"27/09/2011",	"NA"),
                   price = c(109,"NA",102,"NA")  )  

> final
  id      date1      date2       date price
1  A 29/05/2013 10/07/2013 19/06/2013   109
2  B 23/08/2011 04/10/2011 20/09/2012    14
3  C 25/09/2011 10/11/2011 27/09/2011   102
4  C 18/11/2011 15/12/2011         NA    NA

Any help will be much appreciated.

解决方案

Here a solution based on the excellent foverlaps of the data.table package.

library(data.table)
## coerce characters to dates ( numeric) 
setDT(x)[,c("date1","date2"):=list(as.Date(date1,"%d/%m/%Y"),
                                   as.Date(date2,"%d/%m/%Y"))]
## and a dummy date since foverlaps looks for a start,end columns 
setDT(y)[,c("date1"):=as.Date(date,"%d/%m/%Y")][,date:=date1]
## y must be keyed
setkey(y,id,date,date1)
foverlaps(x,y,by.x=c("id","date1","date2"))[,
            list(id,i.date1,date2,date,price)]

  id    i.date1      date2       date price
1:  A 2013-05-29 2013-07-10 2013-06-19   109
2:  B 2011-08-23 2011-10-04       <NA>    NA
3:  C 2011-09-25 2011-11-10 2011-09-27   102
4:  C 2011-11-18 2011-12-15       <NA>    NA

PS: the result is not exactly the same because you have an error in your expected output.

这篇关于如果属于R中另一个数据集中的两个变量定义的范围,则从一个数据集获取变量值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆