Matching data from one dataframe to a time block that it fits within in a second dataframe
Question
I have two datasets that I would like to match based on datetime information. Datetime values in dataset 1 represent 10-minute time blocks that end at the given timestamp. I would like to match each row in dataset 2 with the 10-minute time block (and associated data) that it falls within from dataset 1. An example of the data is below:
Dataset 1
datetime count
10/11/2012 16:25 231
10/11/2012 16:35 55
10/11/2012 16:45 0
10/11/2012 16:55 30
10/11/2012 17:05 22
10/11/2012 17:15 431
10/11/2012 17:25 455
10/11/2012 17:35 560
10/11/2012 17:45 7
10/11/2012 17:55 36
10/11/2012 18:05 12
10/11/2012 18:15 144
10/11/2012 18:25 583
Dataset 2
datetime a n dpm
10/11/2012 16:26 2.03 27 3473
10/11/2012 17:24 1.35 28 3636
10/11/2012 18:21 7.63 29 3516
I would like to end up with something that looks like this:
datetime2 a n dpm datetime1 count
10/11/2012 16:26 2.03 27 3473 10/11/2012 16:35 55
10/11/2012 17:24 1.35 28 3636 10/11/2012 17:25 455
10/11/2012 18:21 7.63 29 3516 10/11/2012 18:25 583
I am aware that there are functions such as cbind(), rbind() and merge() that can be used to merge datasets (based on a related question that I asked in the past: How to merge two data frames in r by a common column with mismatched date/time values), but I have been unable to write code that will help in this case. I have tried various combinations of cbind(round(data, "mins")) and cbind(trunc(data, "hours")), but these functions seem to match multiple time blocks from dataset 1 to each row of dataset 2, rather than just the block that each row falls within. I have tried for hours to find some conditional rule that will solve this problem, but am having no luck. Any help would be greatly appreciated.
Recommended answer
Something like this?
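# For each df2 timestamp, find the first df1 row whose block end is later.
# datetime is character here, so ">" compares strings; that works only because
# every timestamp shares one date and one zero-padded format.
findRow <- function(dt, df) { min(which(df$datetime > dt)) }
rows <- sapply(df2$datetime, findRow, df = df1)
res <- cbind(df2, df1[rows, ])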
datetime a n dpm datetime count
2 10/11/2012 16:26 2.03 27 3473 10/11/2012 16:35 55
7 10/11/2012 17:24 1.35 28 3636 10/11/2012 17:25 455
13 10/11/2012 18:21 7.63 29 3516 10/11/2012 18:25 583
PS1: I think the count of your expected result is wrong on row #1
PS2: It would have been easier if you had provided the datasets in a directly usable form. I had to do:
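d1 <-
'datetime            count
10/11/2012 16:25    231
...
'
d2 <-
'datetime            a       n     dpm
10/11/2012 16:26    2.03    27    3473
10/11/2012 17:24    1.35    28    3636
10/11/2012 18:21    7.63    29    3516
'
# Turn each run of two or more whitespace characters into a tab, then read the
# text as a tab-separated table. Columns must be separated by 2+ spaces so that
# the single space inside each datetime value is preserved.
.parse <- function(s) {
  cs <- gsub('\\s\\s+', '\t', s)
  read.table(text = cs, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
}
df1 <- .parse(d1)
df2 <- .parse(d2)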
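The matching step above compares the datetime columns as character strings, which would stop working once the data span more than one date. A possible variant, offered only as a sketch, parses the timestamps with as.POSIXct() and locates each enclosing block with findInterval(). The format string (day/month/year), the UTC time zone, the rule that a timestamp equal to a block end belongs to that block, and the variable names fmt, ends, times and idx are all assumptions that are not in the original post. The data frames below are a hand-typed subset of the question's data so that the block runs on its own.

```r
# Subset of the two datasets from the question, typed in directly.
df1 <- data.frame(
  datetime = c("10/11/2012 16:25", "10/11/2012 16:35", "10/11/2012 17:15",
               "10/11/2012 17:25", "10/11/2012 18:15", "10/11/2012 18:25"),
  count = c(231, 55, 431, 455, 144, 583),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  datetime = c("10/11/2012 16:26", "10/11/2012 17:24", "10/11/2012 18:21"),
  a = c(2.03, 1.35, 7.63), n = c(27, 28, 29), dpm = c(3473, 3636, 3516),
  stringsAsFactors = FALSE
)

# Assumption: day/month/year order; swap %d and %m for month/day/year data.
fmt <- "%d/%m/%Y %H:%M"
ends  <- as.POSIXct(df1$datetime, format = fmt, tz = "UTC")
times <- as.POSIXct(df2$datetime, format = fmt, tz = "UTC")

# findInterval() needs sorted breakpoints, so order the blocks by end time.
ord  <- order(ends)
df1  <- df1[ord, ]
ends <- ends[ord]

# With left.open = TRUE, findInterval() returns i such that
# ends[i] < time <= ends[i + 1], so the enclosing block is row i + 1.
idx <- findInterval(as.numeric(times), as.numeric(ends), left.open = TRUE) + 1

# A timestamp later than the last block end has no enclosing block.
idx[idx > nrow(df1)] <- NA

res <- data.frame(
  datetime2 = df2$datetime, df2[c("a", "n", "dpm")],
  datetime1 = df1$datetime[idx], count = df1$count[idx],
  stringsAsFactors = FALSE
)
print(res)
```

On this subset the three rows of df2 are paired with the blocks ending at 16:35, 17:25 and 18:25. The sketch does not check that a timestamp is within 10 minutes of the matched block end, so a gap in dataset 1 would still yield a match to the next available block.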