R dplyr: row-based conditions split/apply/combine

Views: 157 | Published: 2020/10/26 3:18:20 | Tags: r, dplyr


Problem description

I have the following data.table:

    library(dplyr)
    library(lubridate)
    library(data.table)

    initial.date <- as.POSIXct('2018-10-27 10:00:00', tz = 'GMT')
    last.date <- as.POSIXct('2018-10-28 17:00:00', tz = 'GMT')
    # One observation every 30 seconds between the two dates
    PriorityDateTime <- seq.POSIXt(from = initial.date, to = last.date, by = '30 sec')
    TradePrice <- seq(from = 1, to = length(PriorityDateTime), by = 1)
    ndf <- data.frame(PriorityDateTime, TradePrice)
    # Alternate the two symbols row by row
    ndf$InstrumentSymbol <- rep_len(x = c('asset1', 'asset2'), length.out = length(ndf$PriorityDateTime))
    ndf$id <- seq_len(nrow(ndf))
    ndf$datetime <- ymd_hms(ndf$PriorityDateTime)
    res <- ndf %>% data.table()

It looks like this:

    > res
         PriorityDateTime TradePrice InstrumentSymbol   id            datetime
   1: 2018-10-27 10:00:00          1           asset1    1 2018-10-27 10:00:00
   2: 2018-10-27 10:00:30          2           asset2    2 2018-10-27 10:00:30
   3: 2018-10-27 10:01:00          3           asset1    3 2018-10-27 10:01:00
   4: 2018-10-27 10:01:30          4           asset2    4 2018-10-27 10:01:30
   5: 2018-10-27 10:02:00          5           asset1    5 2018-10-27 10:02:00

Using dplyr, what is the most elegant and fastest way to:

Split: for each row, find the other rows whose datetime is at most 60 seconds in the past or future and whose InstrumentSymbol is the same as this row's.

Apply: among these nearby rows, find the one whose TradePrice is closest to this row's TradePrice[i], and get its index in the original data.frame and its TradePrice.

Combine: add the results back to the original data.table as new columns, for example index.minpricewithin60 and minpricewithin60.

Sample result:

> res
         PriorityDateTime TradePrice InstrumentSymbol   id            datetime minpricewithin60 index.minpricewithin60
   1: 2018-10-27 10:00:00          1           asset1    1 2018-10-27 10:00:00                2                      2
   2: 2018-10-27 10:00:30          2           asset2    2 2018-10-27 10:00:30                4                      4
   3: 2018-10-27 10:01:00          3           asset1    3 2018-10-27 10:01:00                1                      1
   4: 2018-10-27 10:01:30          4           asset2    4 2018-10-27 10:01:30                2                      2
   5: 2018-10-27 10:02:00          5           asset1    5 2018-10-27 10:02:00                3                      3

I guess my problem can be phrased as: how do I fix a row in dplyr, in a way similar to apply(df, 1, function(x) df$column - x["column"])?

I have potential solutions using dplyr, but so far all of them are quite slow.

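To make the "fix a row" idea concrete, here is a minimal base R sketch on a made-up vector (the name `prices` and its values are illustrative assumptions, not data from the question). Each row's value is held fixed while the whole column is compared against it.

```r
# Hypothetical toy column, for illustration only
prices <- c(1, 3, 5)

# For each position i, hold prices[i] fixed and subtract it from the whole column
diffs <- lapply(seq_along(prices), function(i) prices - prices[i])

# The same pairwise differences as a matrix:
# entry [j, i] is prices[j] - prices[i]
diff_matrix <- outer(prices, prices, "-")
```

The list form mirrors the apply(df, 1, ...) pattern, while `outer` computes every pair in one vectorised call at the cost of an n-by-n matrix.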
Recommended answer

A solution using the dplyr package and the lapply function:

    result_df <- do.call(rbind, lapply(1:nrow(res), function(row_id) {
      # Candidates: same symbol, within 60 seconds, excluding the row itself
      temp <- res %>%
        filter(InstrumentSymbol == res$InstrumentSymbol[row_id]) %>%
        mutate(time_diff = abs(difftime(res$datetime[row_id], datetime, units = "secs")),
               diff_price = abs(TradePrice - res$TradePrice[row_id])) %>%
        filter(id != res$id[row_id], time_diff <= 60) %>%
        # Keep the candidate(s) with the closest price; ties yield several rows
        filter(diff_price == min(diff_price)) %>%
        select(TradePrice, id) %>%
        rename(minpricewithin60 = TradePrice, index.minpricewithin60 = id)

      # No candidate found: return a single row of NAs
      if (nrow(temp) == 0) temp[1, ] <- c(NA, NA)

      # Repeat the current row once per match and attach the match columns
      return(bind_cols(res %>% slice(rep(row_id, nrow(temp))), temp))
    }))

head(result_df)

     PriorityDateTime TradePrice InstrumentSymbol id            datetime minpricewithin60 index.minpricewithin60
1 2018-10-27 10:00:00          1           asset1  1 2018-10-27 10:00:00                3                      3
2 2018-10-27 10:00:30          2           asset2  2 2018-10-27 10:00:30                4                      4
3 2018-10-27 10:01:00          3           asset1  3 2018-10-27 10:01:00                1                      1
4 2018-10-27 10:01:00          3           asset1  3 2018-10-27 10:01:00                5                      5
5 2018-10-27 10:01:30          4           asset2  4 2018-10-27 10:01:30                2                      2
6 2018-10-27 10:01:30          4           asset2  4 2018-10-27 10:01:30                6                      6
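
The answer above filters the whole table once per row. As a hedged alternative sketch (not the answer author's method), the same split/apply/combine steps can be written as a dplyr self-join on InstrumentSymbol followed by a grouped filter. The toy data frame and the names `toy`, `candidates`, `nearest` and the `.other` suffix are assumptions made for illustration. The self-join materialises every same-symbol pair, so memory grows quadratically with the number of rows per symbol, and its speed on the full data set has not been measured here.

```r
library(dplyr)

# Toy version of the first five rows of `res` (illustrative assumption)
toy <- data.frame(
  id = 1:5,
  datetime = as.POSIXct("2018-10-27 10:00:00", tz = "GMT") + 30 * (0:4),
  TradePrice = c(1, 2, 3, 4, 5),
  InstrumentSymbol = rep_len(c("asset1", "asset2"), 5),
  stringsAsFactors = FALSE
)

# Split: pair every row with the other rows of the same symbol,
# then keep pairs that are at most 60 seconds apart.
# (Recent dplyr versions may warn about a many-to-many join; it is intended here.)
candidates <- toy %>%
  inner_join(toy, by = "InstrumentSymbol", suffix = c("", ".other")) %>%
  filter(
    id != id.other,
    abs(as.numeric(difftime(datetime, datetime.other, units = "secs"))) <= 60
  ) %>%
  mutate(diff_price = abs(TradePrice - TradePrice.other))

# Apply: per row, keep the candidate(s) with the closest price (ties are kept)
nearest <- candidates %>%
  group_by(id) %>%
  filter(diff_price == min(diff_price)) %>%
  ungroup() %>%
  select(id, minpricewithin60 = TradePrice.other, index.minpricewithin60 = id.other)

# Combine: attach the result to the original rows; rows without a match get NA
result <- toy %>% left_join(nearest, by = "id")
```

As in the answer's output, a row with two equally close neighbours (here id 3) appears twice in `result`.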

