Arrange a data frame based on one column, eliminating the unwanted responses


Problem description


I have this data:

       date                                           signal 
1   2009-01-13 09:55:00  4645.00  4838.931  5358.883  Buy2
2   2009-01-14 09:55:00  4767.50  4718.254  5336.703  Buy1
3   2009-01-15 09:55:00  4485.00  4653.316  5274.384  Buy2
4   2009-01-16 09:55:00  4580.00  4537.693  5141.435  Buy1
5   2009-01-19 09:55:00  4532.00  4548.088  4891.041  Buy2
6   2009-01-27 09:55:00  4190.00  4183.503  4548.497  Buy1
7   2009-01-30 09:55:00  4436.00  4155.236  4377.907 Sell1
8   2009-02-02 09:55:00  4217.00  4152.626  4390.802 Sell2
9   2009-02-09 09:55:00  4469.00  4203.437  4376.277 Sell1
10  2009-02-12 09:55:00  4469.90  4220.845  4503.798 Sell2
11  2009-02-13 09:55:00  4553.00  4261.980  4529.777 Sell1
12  2009-02-16 09:55:00  4347.20  4319.656  4564.387 Sell2
13  2009-02-17 09:55:00  4161.05  4371.474  4548.912  Buy2
14  2009-02-27 09:55:00  3875.55  3862.085  4101.929  Buy1
15  2009-03-02 09:55:00  3636.00  3846.423  4036.020  Buy2
16  2009-03-12 09:55:00  3420.00  3372.665  3734.949  Buy1
17  2009-03-13 09:55:00  3656.00  3372.100  3605.357 Sell1
18  2009-03-17 09:55:00  3650.00  3360.421  3663.322 Sell2
19  2009-03-18 09:55:00  3721.00  3363.735  3682.293 Sell1
20  2009-03-20 09:55:00  3687.00  3440.651  3784.778 Sell2

and I have to arrange it in this form:

2   2009-01-14 09:55:00  4767.50  4718.254  5336.703  Buy1
7   2009-01-30 09:55:00  4436.00  4155.236  4377.907 Sell1
8   2009-02-02 09:55:00  4217.00  4152.626  4390.802 Sell2
13  2009-02-17 09:55:00  4161.05  4371.474  4548.912  Buy2
14  2009-02-27 09:55:00  3875.55  3862.085  4101.929  Buy1
17  2009-03-13 09:55:00  3656.00  3372.100  3605.357 Sell1
18  2009-03-17 09:55:00  3650.00  3360.421  3663.322 Sell2

The data should follow the order Buy1, Sell1, Sell2, Buy2, with the observations in between eliminated. I have tried several dplyr::filter commands, but none gives the desired output.

Solution

If I have understood your problem correctly, the following code should solve it. It is adapted from this discussion.

The idea is to define your sequence as a pattern:

pattern <- c("Buy1", "Sell1", "Sell2", "Buy2")

Then find the position of this pattern in your column:

library(zoo)
# Slide a window of width 4 over the column; TRUE where the window equals the pattern.
pos <- which(rollapply(data$signal, 4, identical, pattern, fill = FALSE, align = "left"))

and extract the rows following the position of your patterns:

rows <- unlist(lapply(pos, function(x, n) seq(x, x+n-1), 4))
data_filtered <- data[rows,]

Voilà.
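
For readers without zoo, the same window matching can be sketched in base R. This is only an illustration under stated assumptions: the signal vector is a made-up toy example, not the data from the question, and the names starts and is_match are hypothetical. It also shows that this approach only finds the pattern when its four values sit in consecutive rows.

```r
# Hypothetical toy signal; only rows 3 to 6 contain the pattern back to back.
signal  <- c("Buy2", "Buy1", "Buy1", "Sell1", "Sell2", "Buy2", "Sell1")
pattern <- c("Buy1", "Sell1", "Sell2", "Buy2")
n <- length(pattern)

# Candidate start positions: every index where a full window still fits.
starts <- seq_len(max(length(signal) - n + 1, 0))

# A start position matches when the n values beginning there equal the pattern.
is_match <- vapply(
  starts,
  function(i) identical(signal[i:(i + n - 1)], pattern),
  logical(1)
)
pos <- starts[is_match]

# Expand each start position into the n row indices it covers.
rows <- unlist(lapply(pos, function(x) seq(x, x + n - 1)))
rows
# 3 4 5 6
```

Rows that belong to the pattern but are separated by other observations are not picked up this way.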

EDIT

Since I had misunderstood your problem, here is a new solution. You want to retrieve the sequence "Buy1", "Sell1", "Sell2", "Buy2" in your column, repeating in that order, and eliminate the observations that do not fit into it. I do not see a trivial vectorised solution, so here is a loop that does it. Depending on the size of your data, you may want to implement a similar algorithm with Rcpp or vectorise it in some way.

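sequence <- c("Buy1", "Sell1", "Sell2", "Buy2")
keep <- logical(length(data$signal))  # all FALSE to start

# s is the number of pattern elements matched so far in the current cycle
s <- 0
for (i in seq_along(data$signal)) {
    if (sequence[s + 1] == data$signal[i]) {
        # row i is the next expected signal: keep it and advance in the cycle
        keep[i] <- TRUE
        s <- (s + 1) %% length(sequence)
    } else {
        keep[i] <- FALSE
    }
}

data_filtered <- data[keep, ]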
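
As a check, the loop above can be wrapped in a small function and run on the signal column of the 20 rows shown in the question. The function name keep_in_cycle and its arguments are hypothetical, and the sketch assumes the column contains no missing values.

```r
# Hypothetical helper wrapping the loop above: flags the elements of `signal`
# that advance through `cycle` in order, restarting after the last element.
keep_in_cycle <- function(signal, cycle) {
  keep <- logical(length(signal))
  s <- 0
  for (i in seq_along(signal)) {
    if (signal[i] == cycle[s + 1]) {
      keep[i] <- TRUE
      s <- (s + 1) %% length(cycle)
    }
  }
  keep
}

# The signal column of the 20 rows shown in the question.
signal <- c("Buy2", "Buy1", "Buy2", "Buy1", "Buy2", "Buy1",
            "Sell1", "Sell2", "Sell1", "Sell2", "Sell1", "Sell2",
            "Buy2", "Buy1", "Buy2", "Buy1",
            "Sell1", "Sell2", "Sell1", "Sell2")

keep <- keep_in_cycle(signal, c("Buy1", "Sell1", "Sell2", "Buy2"))
which(keep)
# 2 7 8 13 14 17 18
```

These are the row numbers of the desired output in the question.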

Tell me if this works better. If anyone has a vectorised solution, I would be curious to see it.
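
One loop-free way to write the same idea is a fold with base R's Reduce. This is not a truly vectorised solution, since Reduce still walks the column element by element; it is only a more compact sketch. The toy signal and the names cycle, step and states are assumptions made for illustration.

```r
# Hypothetical toy signal, not the data from the question.
signal <- c("Buy2", "Buy1", "Buy2", "Sell1", "Sell2", "Sell1", "Buy2", "Buy1")
cycle  <- c("Buy1", "Sell1", "Sell2", "Buy2")

# State = number of cycle elements matched so far (0 to 3).
# It advances only when the current signal is the next expected one.
step <- function(s, x) {
  if (x == cycle[s + 1]) (s + 1) %% length(cycle) else s
}

# With accumulate = TRUE and an initial value, Reduce returns the state
# before the first row followed by the state after each row.
states <- Reduce(step, signal, 0, accumulate = TRUE)

# A row is kept exactly when it changed the state.
keep <- head(states, -1) != tail(states, -1)
which(keep)
# 2 4 5 7 8
```

Applied to a data frame, data[keep, ] would then select the rows, as in the loop version.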
