How to find a monotonic sequence while accounting for the sequence restarting after it reaches the maximum


Problem description

I have a data.table, say dt:

library(data.table)

name <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v")
score <- c(42, 82, 43, 32,47,48, 49, 50, 54, 59, 76, 09, 13, 88, 91, 99, 04, 06, 08, 12, 14, 15)
class <- c("c1", "c1", "c1", "c1","c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3", "c3","c3", "c3", "c3", "c3", "c3", "c3", "c3" ,"c3")
dt <- data.table(name, score, class)

It looks like this:

> dt
    name score class
 1:    a    42    c1
 2:    b    82    c1
 3:    c    43    c1
 4:    d    32    c1
 5:    e    47    c1
 6:    f    48    c1
 7:    g    49    c1
 8:    h    50    c2
 9:    i    54    c2
10:    j    59    c2
11:    k    76    c3
12:    l     9    c3
13:    m    13    c3
14:    n    88    c3
15:    o    91    c3
16:    p    99    c3
17:    q     4    c3
18:    r     6    c3
19:    s     8    c3
20:    t    12    c3
21:    u    14    c3
22:    v    15    c3

I only need the records that follow a monotonically increasing sequence of score within each class. In this case, that means the records with scores 42, 43, 47, 48, 49 for class c1 and the records with scores 50, 54, 59 for class c2.

For class c3, it means the records with scores 76, 88, 91, 99, 04, 06, 08, 12, 14, 15. Here the sequence reached the maximum (99) and then restarted. Scores 09 and 13 in class c3 fall outside the monotonic sequence and therefore need to be removed.

I want to remove the records whose score is out of sequence within each of the classes c1, c2, c3. There are 1 million records in total.

A given class can have at most 3 consecutive out-of-sequence scores.

The final output must look like this:

> dt
    name score class
 1:    a    42    c1
 2:    c    43    c1
 3:    e    47    c1
 4:    f    48    c1
 5:    g    49    c1
 6:    h    50    c2
 7:    i    54    c2
 8:    j    59    c2
 9:    k    76    c3
10:    n    88    c3
11:    o    91    c3
12:    p    99    c3
13:    q     4    c3
14:    r     6    c3
15:    s     8    c3
16:    t    12    c3
17:    u    14    c3
18:    v    15    c3

To find the monotonic sequence I have tried:

dt <- dt[, .SD[score == cummax(score)], by = class]

but this also removes the sequences that restart after reaching the maximum value. How can I do this?

Recommended answer

The cummax idea is very good; it just needs some modifications:

dt[, keep := score >= cummax(shift(score, fill = first(score))), 
     by = .(class, rleid(score == 99))]
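Note that this form only adds a logical keep column; the rows still have to be filtered afterwards. A minimal sketch on the c3 rows alone, where dt3, grp and res are illustrative names that do not appear in the question:

```r
library(data.table)

# Class c3 from the question, rebuilt on its own (dt3 is a hypothetical name).
dt3 <- data.table(
  name  = letters[11:22],
  score = c(76, 9, 13, 88, 91, 99, 4, 6, 8, 12, 14, 15),
  class = "c3"
)

# rleid() starts a new run id each time `score == 99` flips, so the rows
# before 99, the 99 row itself, and the rows after it become separate groups.
dt3[, grp := rleid(score == 99)]

# Within each group, keep a row when its score is at least the running
# maximum of all earlier scores in that group.
dt3[, keep := score >= cummax(shift(score, fill = first(score))),
    by = .(class, grp)]

res <- dt3[keep == TRUE]
res$score
# 76 88 91 99 4 6 8 12 14 15
```

Because the running maximum is reset after the 99 row, the restarted values 4 to 15 survive, while 9 and 13 are dropped for being below 76.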

Or, perhaps a better approach would be

dt[dt[, .I[score == cummax(score)], by = list(class, rleid(score == 99))]$V1]
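Both forms hard-code 99 as the restart value, and a running maximum cannot distinguish an isolated spike from a genuine increase. The check below runs the second form on the full example from the question (idx and res are illustrative names). It reproduces the requested rows for c2 and c3, but in c1 it keeps 82 and drops the later 43, 47, 48, 49, so a spike such as 82 would presumably need additional handling:

```r
library(data.table)

name  <- letters[1:22]
score <- c(42, 82, 43, 32, 47, 48, 49, 50, 54, 59,
           76, 9, 13, 88, 91, 99, 4, 6, 8, 12, 14, 15)
class <- rep(c("c1", "c2", "c3"), times = c(7, 3, 12))
dt <- data.table(name, score, class)

# Row indices (.I) whose score equals the running maximum within each
# (class, run of `score == 99`) group.
idx <- dt[, .I[score == cummax(score)],
          by = list(class, rleid(score == 99))]$V1
res <- dt[idx]

res[class == "c3", score]  # 76 88 91 99 4 6 8 12 14 15
res[class == "c2", score]  # 50 54 59
res[class == "c1", score]  # 42 82
```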
