如何在R中的两个日期之间进行汇总? [英] How to aggregate between the two dates in R?

查看:136
本文介绍了如何在R中的两个日期之间进行汇总?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

下面是两个表

Table1
Date                   OldPrice   NewPrice
2014-06-12 09:32:56       0          10
2014-06-27 16:13:36       10         12
2014-08-12 22:41:47       12         13

Table2
Date                   Qty
2014-06-15 18:09:23     5
2014-06-19 12:04:29     4
2014-06-22 13:21:34     3
2014-06-29 19:01:22     6
2014-07-01 18:02:33     3
2014-09-29 22:41:47     6

我要以这种方式显示结果

I want to display the result in this manner

Date                   OldPrice   NewPrice    Qty
2014-06-12 09:32:56       0          10        0
2014-06-27 16:13:36       10         12        12
2014-08-12 22:41:47       12         13        15

我使用了命令

for(i in 1:nrow(Table1)){

  startDate = Table1$Date[i]
  endDate = Table1$Date[i+1]


 code=aggregate(list(Table2$Qty),
by=list(Table1$Date, Table1$OldPrice, Table1$NewPrice, Date = Table2$Date > startDate  & Table2$Date <= endDate), FUN=sum)

}

我希望数量在第一张表中的给定日期之间,即第一和第二个日期之间,第二和第三个日期之间,等等。同样,最后一个日期也需要汇总到另一个表中的日期为止。

I want the quantity to be aggregated between the given dates in first table, i.e between the first and second dates, second and third dates and so on. Also the last date needs to be aggregated till the end of date in the other table.

请先谢谢!

推荐答案

与您的上一个

这将为您提供确切的输出:

This gives you the exact output you provide:

#                 Date OldPrice NewPrice Quantity
#1 2014-06-12 09:32:56        0       10        0
#2 2014-06-27 16:13:36       10       12       12
#3 2014-08-12 22:41:47       12       13       15

由以下代码生成(请参见说明):

produced by the following code (see the explanation):

#your data & preps
df1 <- read.table(text=
                    "'Date'                   'OldPrice'   'NewPrice'
                  '2014-06-12 09:32:56'     '0'          '10'
                  '2014-06-27 16:13:36'     '10'         '12'
                  '2014-08-12 22:41:47'     '12'         '13'", stringsAsFactors=F,header=T)

df2 <- read.table(text=
                    "'Date'                  'Qty'
                  '2014-06-15 18:09:23'     '5'
                  '2014-06-19 12:04:29'     '4'
                  '2014-06-22 13:21:34'     '3'
                  '2014-06-29 19:01:22'     '6'
                  '2014-07-01 18:02:33'     '3'
                  '2014-09-29 22:41:47'     '6'" , stringsAsFactors=F, header=T)

df1$Date <- as.POSIXct(df1$Date); df2$Date <- as.POSIXct(df2$Date) #convert into datetime formats
df1 <- df1[with(df1, order(Date)),] #order df1 by Date
values <- vector("list", length = nrow(df1)+1) #declare a list of specific length of df1+1
out_of_time_dates_before <- c(); out_of_time_dates_after <- c() #here will be dates that come before or after dates available in df1
names(values) <- c(1:(length(values)-2), "out_of_time_dates_before", "out_of_time_dates_after")

#producing the main outputs
for(j in 1:nrow(df2)){
  print(paste0("Being processed: ", df2$Date[j]))
  for(i in 1:(nrow(df1)-1)){
    if(df2$Date[j]>df1$Date[i] & df2$Date[j]<df1$Date[i+1]){
      values[[i]] <- append(values[[i]], df2$Qty[j])
    } 
  } 
  if(df2$Date[j]<min(df1$Date)){
    out_of_time_dates_before <- append(out_of_time_dates_before, df2$Qty[j])
    values[["out_of_time_dates_before"]] <- append(values[["out_of_time_dates_before"]], df2$Qty[j])
  } else if(df2$Date[j] > max(df1$Date)){
    out_of_time_dates_after <- append(out_of_time_dates_after, df2$Qty[j])
    values[["out_of_time_dates_after"]] <- append(values[["out_of_time_dates_after"]], df2$Qty[j])
  }
}

#aggregating the quantity for the date ranges and all that falls before or after the date ranges not available in df1   
df1$Quantity <- c(0, sapply(values, sum)[1:(nrow(df1)-1)]) #replace the leading quantity value with 0 (as per your example)
df1$Quantity[1] <- df1$Quantity[1]+sapply(values, sum)["out_of_time_dates_before"]
df1$Quantity[length(df1$Quantity)] <- df1$Quantity[length(df1$Quantity)]+sapply(values, sum)["out_of_time_dates_after"]

我认为您遇到了一些有趣的问题,无法解决一些不幸的沟通方式它对SO社区。例如,尚不清楚该如何处理df1中任何可用范围之前的df2中的日期,因此在上面的代码中,我将这些数量(如果有)添加到了第一个日期。这与从df2中的日期范围之后的df2中的日期所期望的(将它们添加到df1的最后一个日期)中得到的期望更加通用和相似。

I think you got interesting problems to solve just a bit unfortunate way of communicating it to the SO community. For instance, it was not clear what to do with the dates in df2 that come before any available range in df1, hence in the code above, I add these quantities (if there are any) to the very first date. This is more generic and analogous to what you expected from the dates in df2 that come after ranges of dates in df1 (add them to the last date of df1).

这篇关于如何在R中的两个日期之间进行汇总?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆