Arrange a data frame based on one column, eliminating the unwanted responses

Views: 56. Published: 2020/10/17 0:10:32. Tags: r, dataframe, dplyr, rstudio, sequence

Problem description

I have this data:

       date                                           signal 
1   2009-01-13 09:55:00  4645.00  4838.931  5358.883  Buy2
2   2009-01-14 09:55:00  4767.50  4718.254  5336.703  Buy1
3   2009-01-15 09:55:00  4485.00  4653.316  5274.384  Buy2
4   2009-01-16 09:55:00  4580.00  4537.693  5141.435  Buy1
5   2009-01-19 09:55:00  4532.00  4548.088  4891.041  Buy2
6   2009-01-27 09:55:00  4190.00  4183.503  4548.497  Buy1
7   2009-01-30 09:55:00  4436.00  4155.236  4377.907 Sell1
8   2009-02-02 09:55:00  4217.00  4152.626  4390.802 Sell2
9   2009-02-09 09:55:00  4469.00  4203.437  4376.277 Sell1
10  2009-02-12 09:55:00  4469.90  4220.845  4503.798 Sell2
11  2009-02-13 09:55:00  4553.00  4261.980  4529.777 Sell1
12  2009-02-16 09:55:00  4347.20  4319.656  4564.387 Sell2
13  2009-02-17 09:55:00  4161.05  4371.474  4548.912  Buy2
14  2009-02-27 09:55:00  3875.55  3862.085  4101.929  Buy1
15  2009-03-02 09:55:00  3636.00  3846.423  4036.020  Buy2
16  2009-03-12 09:55:00  3420.00  3372.665  3734.949  Buy1
17  2009-03-13 09:55:00  3656.00  3372.100  3605.357 Sell1
18  2009-03-17 09:55:00  3650.00  3360.421  3663.322 Sell2
19  2009-03-18 09:55:00  3721.00  3363.735  3682.293 Sell1
20  2009-03-20 09:55:00  3687.00  3440.651  3784.778 Sell2

and I have to arrange it in this form:

2   2009-01-14 09:55:00  4767.50  4718.254  5336.703  Buy1
7   2009-01-30 09:55:00  4436.00  4155.236  4377.907 Sell1
8   2009-02-02 09:55:00  4217.00  4152.626  4390.802 Sell2
13  2009-02-17 09:55:00  4161.05  4371.474  4548.912  Buy2
14  2009-02-27 09:55:00  3875.55  3862.085  4101.929  Buy1
17  2009-03-13 09:55:00  3656.00  3372.100  3605.357 Sell1
18  2009-03-17 09:55:00  3650.00  3360.421  3663.322 Sell2

That is, the rows should follow the repeating order Buy1, Sell1, Sell2, Buy2, with the observations in between eliminated. I have tried several dplyr::filter commands, but none gives the desired output.

Solution

If I have understood your problem correctly, the following code should solve it. It is adapted from this discussion.

The idea is to define your sequence as a pattern:

pattern <- c("Buy1", "Sell1", "Sell2", "Buy2")

Then find the position of this pattern in your column:

library(zoo)
pos <- which(rollapply(data$signal, 4, identical, pattern, fill = FALSE, align = "left"))

Then extract the four rows covered by each match, starting at its position:

rows <- unlist(lapply(pos, function(x, n) seq(x, x+n-1), 4))
data_filtered <- data[rows,]

Voilà.
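
To see what this windowed match returns on the data from the question, here is a minimal base-R sketch of the same idea. It does not use zoo; the helper name find_pattern and the hand-typed signal vector are assumptions made for illustration. It compares every run of four consecutive values with the pattern.

```r
# Signal column from the question, typed in by hand for illustration.
signal <- c("Buy2", "Buy1", "Buy2", "Buy1", "Buy2", "Buy1", "Sell1", "Sell2",
            "Sell1", "Sell2", "Sell1", "Sell2", "Buy2", "Buy1", "Buy2", "Buy1",
            "Sell1", "Sell2", "Sell1", "Sell2")
pattern <- c("Buy1", "Sell1", "Sell2", "Buy2")

# Hypothetical helper: start positions at which `pattern` occurs as a
# contiguous run inside `x`.
find_pattern <- function(x, pattern) {
  n <- length(pattern)
  if (length(x) < n) return(integer(0))
  starts <- seq_len(length(x) - n + 1)
  hits <- vapply(starts,
                 function(i) identical(x[i:(i + n - 1)], pattern),
                 logical(1))
  starts[hits]
}

pos <- find_pattern(signal, pattern)
length(pos)  # 0: the four values never appear in consecutive rows
```

On this sample the pattern never occurs in four consecutive rows, so a contiguous match selects nothing. The question asks for the values to be picked out in order while skipping the rows in between, which is a different problem.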

EDIT

Since I had misunderstood your problem, here is a new solution. You want to retrieve the sequence "Buy1", "Sell1", "Sell2", "Buy2" from your column, in that order and repeatedly, and eliminate the observations that do not fit into it. I do not see a trivial vectorised solution, so here is a loop that does it. Depending on the size of your data, you may want to implement a similar algorithm with Rcpp or vectorise it in some way.

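sequence <- c("Buy1", "Sell1", "Sell2", "Buy2")
signal <- as.character(data$signal)  # also works if the column is a factor
keep <- logical(length(signal))      # initialised to FALSE

s <- 0  # 0-based position in `sequence` of the next expected value
for (i in seq_along(signal)) {
    if (signal[i] == sequence[s + 1]) {
        keep[i] <- TRUE                    # this row continues the sequence
        s <- (s + 1) %% length(sequence)   # wrap around after "Buy2"
    }
}

data_filtered <- data[keep, ]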
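
As a quick check, the loop can be wrapped in a function and run on the signal column from the question. This is only a sketch: the name filter_cycle and the hand-typed signal vector are assumptions, not part of the original answer.

```r
# Signal column from the question, typed in by hand for illustration.
signal <- c("Buy2", "Buy1", "Buy2", "Buy1", "Buy2", "Buy1", "Sell1", "Sell2",
            "Sell1", "Sell2", "Sell1", "Sell2", "Buy2", "Buy1", "Buy2", "Buy1",
            "Sell1", "Sell2", "Sell1", "Sell2")

# Hypothetical wrapper around the loop: returns a logical vector marking
# the values that continue the repeating cycle `cycle`.
filter_cycle <- function(x, cycle) {
  x <- as.character(x)         # also handles factor columns
  keep <- logical(length(x))   # all FALSE to begin with
  s <- 0                       # 0-based position of the next expected value
  for (i in seq_along(x)) {
    if (!is.na(x[i]) && x[i] == cycle[s + 1]) {
      keep[i] <- TRUE
      s <- (s + 1) %% length(cycle)
    }
  }
  keep
}

keep <- filter_cycle(signal, c("Buy1", "Sell1", "Sell2", "Buy2"))
which(keep)  # rows 2, 7, 8, 13, 14, 17, 18, as requested in the question
```

With a data frame, the result would be used the same way as in the loop, for example data[keep, ].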

Tell me if this works better. If anyone has a vectorised solution, I would be curious to see it.
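
One possible loop-free formulation, offered as a sketch rather than a truly vectorised solution: Reduce() with accumulate = TRUE can carry the match count along the column, and a row is kept wherever that count goes up. The names cycle_keep and step are assumptions for illustration, and internally Reduce() still visits the elements one at a time, so no speed-up should be expected.

```r
# Signal column from the question, typed in by hand for illustration.
signal <- c("Buy2", "Buy1", "Buy2", "Buy1", "Buy2", "Buy1", "Sell1", "Sell2",
            "Sell1", "Sell2", "Sell1", "Sell2", "Buy2", "Buy1", "Buy2", "Buy1",
            "Sell1", "Sell2", "Sell1", "Sell2")
cycle <- c("Buy1", "Sell1", "Sell2", "Buy2")

# Hypothetical helper: TRUE for each value that continues the cycle.
cycle_keep <- function(x, cycle) {
  x <- as.character(x)
  n <- length(cycle)
  # The state is the total number of matches so far; it is not reduced
  # modulo n, so every match shows up as an increase of exactly one.
  step <- function(state, value) {
    if (identical(value, cycle[state %% n + 1])) state + 1 else state
  }
  states <- Reduce(step, x, accumulate = TRUE, 0)
  diff(states) == 1
}

which(cycle_keep(signal, cycle))  # rows 2, 7, 8, 13, 14, 17, 18
```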
