R dplyr: row-based conditions split/apply/combine


Question

This is the dplyr version of this question.

I have the following data.table:

library(dplyr)
library(data.table)
library(lubridate)

initial.date <- as.POSIXct('2018-10-27 10:00:00', tz = 'GMT')
last.date <- as.POSIXct('2018-10-28 17:00:00', tz = 'GMT')
# One observation every 30 seconds
PriorityDateTime <- seq.POSIXt(from = initial.date, to = last.date, by = '30 sec')
TradePrice <- seq(from = 1, to = length(PriorityDateTime), by = 1)
ndf <- data.frame(PriorityDateTime, TradePrice)
# Alternate the two symbols row by row
ndf$InstrumentSymbol <- rep_len(x = c('asset1', 'asset2'), length.out = nrow(ndf))
ndf$id <- seq_len(nrow(ndf))
ndf$datetime <- ymd_hms(ndf$PriorityDateTime)
res <- ndf %>% data.table()

It looks like this:

    > res
         PriorityDateTime TradePrice InstrumentSymbol   id            datetime
   1: 2018-10-27 10:00:00          1           asset1    1 2018-10-27 10:00:00
   2: 2018-10-27 10:00:30          2           asset2    2 2018-10-27 10:00:30
   3: 2018-10-27 10:01:00          3           asset1    3 2018-10-27 10:01:00
   4: 2018-10-27 10:01:30          4           asset2    4 2018-10-27 10:01:30
   5: 2018-10-27 10:02:00          5           asset1    5 2018-10-27 10:02:00

Using dplyr, what is the most elegant and fastest way to:

  1. Split: for each row, find the other rows whose datetime is at most 60 seconds in the past or future (time difference less than 60 seconds) and that have the same InstrumentSymbol as this row.
  2. Apply: among these nearby rows, find the one whose TradePrice is closest to this row's TradePrice[i], and get its index in the original data.frame and its TradePrice.
  3. Combine: merge the results back into the original data.table as new columns, for example index.minpricewithin60 and minpricewithin60.
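
To make the split and apply steps concrete, here is a minimal sketch for one fixed row on a hypothetical five-row frame. The names toy and i are illustrative and not part of the question's data, and the sketch assumes the 60-second window is inclusive, which the question leaves ambiguous.

```r
library(dplyr)

# Hypothetical toy data mirroring the first five rows of `res`.
toy <- data.frame(
  id = 1:5,
  datetime = as.POSIXct("2018-10-27 10:00:00", tz = "GMT") + 30 * (0:4),
  InstrumentSymbol = rep_len(c("asset1", "asset2"), 5),
  TradePrice = as.numeric(1:5)
)

i <- 3  # the row being "fixed"

# Split: other rows with the same symbol whose datetime is within 60 seconds.
candidates <- toy %>%
  filter(
    id != toy$id[i],
    InstrumentSymbol == toy$InstrumentSymbol[i],
    abs(as.numeric(difftime(datetime, toy$datetime[i], units = "secs"))) <= 60
  )

# Apply: keep the candidate(s) whose TradePrice is closest to row i's price.
closest <- candidates %>%
  mutate(diff_price = abs(TradePrice - toy$TradePrice[i])) %>%
  filter(diff_price == min(diff_price))

print(closest[, c("id", "TradePrice")])
```

For row 3 the two neighbours (ids 1 and 5) are both exactly 60 seconds away and tie at a price distance of 2, so both remain. Any real solution has to decide how to break such ties.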

Example result:

> res
         PriorityDateTime TradePrice InstrumentSymbol   id            datetime minpricewithin60 index.minpricewithin60
   1: 2018-10-27 10:00:00          1           asset1    1 2018-10-27 10:00:00                2                      2
   2: 2018-10-27 10:00:30          2           asset2    2 2018-10-27 10:00:30                4                      4
   3: 2018-10-27 10:01:00          3           asset1    3 2018-10-27 10:01:00                1                      1
   4: 2018-10-27 10:01:30          4           asset2    4 2018-10-27 10:01:30                2                      2
   5: 2018-10-27 10:02:00          5           asset1    5 2018-10-27 10:02:00                3                      3

I guess my problem can be phrased as "how do I fix a row in dplyr, in a similar way to apply(df, 1, function(x) df$column - x["column"])?" I have potential solutions using dplyr, but so far all of them have been quite slow.

Recommended answer

A solution using the dplyr package and the lapply function:

# Loop over the rows of res and look up the closest-priced neighbour for each one
result_df <- do.call(rbind, lapply(1:nrow(res), function(row_id) {

  temp <- res %>%
    # same instrument as the current row
    filter(InstrumentSymbol == res$InstrumentSymbol[row_id]) %>%
    mutate(time_diff = abs(difftime(res$datetime[row_id], datetime, units = "secs")),
           diff_price = abs(TradePrice - res$TradePrice[row_id])) %>%
    # drop the row itself and anything more than 60 seconds away
    filter(id != res$id[row_id], time_diff <= 60) %>%
    # keep the closest price; ties yield several rows
    filter(diff_price == min(diff_price)) %>%
    select(TradePrice, id) %>%
    rename(minpricewithin60 = TradePrice, index.minpricewithin60 = id)

  # no neighbour within 60 seconds: fill with NA
  if (nrow(temp) == 0) temp[1, ] <- c(NA, NA)

  return(bind_cols(res %>% slice(rep(row_id, nrow(temp))), temp))
}))

head(result_df)

     PriorityDateTime TradePrice InstrumentSymbol id            datetime minpricewithin60 index.minpricewithin60
1 2018-10-27 10:00:00          1           asset1  1 2018-10-27 10:00:00                3                      3
2 2018-10-27 10:00:30          2           asset2  2 2018-10-27 10:00:30                4                      4
3 2018-10-27 10:01:00          3           asset1  3 2018-10-27 10:01:00                1                      1
4 2018-10-27 10:01:00          3           asset1  3 2018-10-27 10:01:00                5                      5
5 2018-10-27 10:01:30          4           asset2  4 2018-10-27 10:01:30                2                      2
6 2018-10-27 10:01:30          4           asset2  4 2018-10-27 10:01:30                6                      6
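
Since the question asks for something faster than a row-by-row loop, one possible alternative is to express the same logic as a self-join. The sketch below is not part of the original answer. It runs on a hypothetical six-row sample (sample_df, pairs, closest and joined are illustrative names) and again assumes an inclusive 60-second window. Newer dplyr releases may warn about a many-to-many join here, which is expected for a self-join on InstrumentSymbol.

```r
library(dplyr)

# Hypothetical small sample with the same columns as `res`.
sample_df <- data.frame(
  id = 1:6,
  datetime = as.POSIXct("2018-10-27 10:00:00", tz = "GMT") + 30 * (0:5),
  InstrumentSymbol = rep_len(c("asset1", "asset2"), 6),
  TradePrice = as.numeric(1:6)
)

# Split: pair every row with every row of the same symbol,
# then keep only pairs of distinct rows at most 60 seconds apart.
pairs <- sample_df %>%
  inner_join(sample_df, by = "InstrumentSymbol", suffix = c("", ".other")) %>%
  filter(
    id != id.other,
    abs(as.numeric(difftime(datetime, datetime.other, units = "secs"))) <= 60
  )

# Apply: per original row, keep the pair(s) with the smallest price gap.
closest <- pairs %>%
  mutate(diff_price = abs(TradePrice - TradePrice.other)) %>%
  group_by(id) %>%
  filter(diff_price == min(diff_price)) %>%
  ungroup() %>%
  select(id, minpricewithin60 = TradePrice.other, index.minpricewithin60 = id.other)

# Combine: attach the matches to the original rows (ties yield several rows).
joined <- sample_df %>% left_join(closest, by = "id")
print(joined)
```

The join materialises every same-symbol pair before filtering, so memory grows with the square of the number of rows per symbol. Whether this beats the lapply loop on the full data set is something to measure rather than assume.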
