Delete runs of certain value before and after specific value


Question

I have a data frame with several columns. Based on the column 'activity', I want to remove entire contiguous runs of a specific value, 'pt', but only when they occur immediately before or after a run of 'outside'.

In the simplified data below, there is one run where 'activity' is 'outside', with a chunk of 'pt' immediately before it and another immediately after it. These two 'pt' chunks should be removed. The later run of 'pt' (rows 10 and 11) is not adjacent to 'outside' and should be kept.

   activity dist
1      home    1
2        pt    2 # <- run of 'pt' before run of 'outside': remove
3        pt    3 # <-
4        pt    4 # <- 
5   outside    5
6   outside    6
7        pt    7 # <- run of 'pt' after run of 'outside': remove
8        pt    8 # <-
9      work    9
10       pt   10
11       pt   11
12     home   12

Thus, the desired output is:

    activity dist 
 1      home    1 
 2   outside    5 
 3   outside    6 
 4      work    9 
 5        pt   10 
 6        pt   11 
 7      home   12 

How can this be done?

Data as dput:

# Assigned to 'd', the name used in the answer below
d <- structure(list(activity = c("home", "pt", "pt", "pt", "outside", "outside", "pt", "pt", "work", "pt", "pt", "home"),
                    dist = 1:12),
               class = "data.frame", row.names = c(NA, -12L))


Recommended answer

You may use some convenience functions from the data.table package: rleid to "[g]enerate run-length type group id", and shift to get the values before and after the focal index in a vector.

library(data.table)
setDT(d)
d[ , r := rleid(activity)]

d[!(r %in% r[activity == "pt" & shift(activity, type = "lead") == "outside" |
               shift(activity) == "outside" & activity == "pt"])]

#    activity dist r
# 1:     home    1 1
# 2:  outside    5 3
# 3:  outside    6 3
# 4:     work    9 5
# 5:       pt   10 6
# 6:       pt   11 6
# 7:     home   12 7




Explanation:

Coerce your data.frame to a data.table (setDT(d)). Create a run-length index of 'activity' (rleid). Check whether the current value is 'pt' and the next value is 'outside' (activity == "pt" & shift(activity, type = "lead") == "outside"), or (|) whether the current value is 'pt' and the previous value is 'outside' (activity == "pt" & shift(activity) == "outside").

Where this condition is TRUE, grab the run groups to be removed (r[<condition>]). Check whether each row's run is among the groups to be removed (r %in% <run groups to be removed>). If so, do not (!) keep these rows when indexing the data (d[<condition>]).
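The steps described above can also be spelled out one at a time. This is only a sketch for inspecting the intermediate values: the names pt_before, pt_after, drop_runs and keep are introduced here for illustration and are not part of the original answer, and the example data is rebuilt with data.table() rather than converted with setDT().

```r
library(data.table)

d <- data.table(
  activity = c("home", "pt", "pt", "pt", "outside", "outside",
               "pt", "pt", "work", "pt", "pt", "home"),
  dist = 1:12
)

# Run id: increases by one each time 'activity' changes
r <- rleid(d$activity)

# Row-level checks: a 'pt' row whose next / previous row is 'outside'
pt_before <- d$activity == "pt" & shift(d$activity, type = "lead") == "outside"
pt_after  <- d$activity == "pt" & shift(d$activity) == "outside"

# Ids of runs containing at least one such row; which() discards any NA
# that shift() can produce at the first or last position
drop_runs <- unique(r[which(pt_before | pt_after)])

# Keep only rows whose run id is not flagged for removal
keep <- !(r %in% drop_runs)
res <- d[keep]
```

Only the last row of the leading 'pt' run and the first row of the trailing 'pt' run satisfy the row-level checks; matching on the run id is what extends the removal to the whole run.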

Alternatively, in base R with rle: the values of runs of 'pt' before or after 'outside' are replaced with NA. The rle is converted back to a vector (inverse.rle) and rows with NA are removed (na.omit).

Obviously, if there are rows with NA in the original data set which you want to keep, you need to use another value for replacement.

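# Store the rle object so that it can be modified and inverted
r <- rle(d$activity)

# Set 'pt' runs directly before or after an 'outside' run to NA
r$values[with(r, c(which(head(values, -1) == "pt" & tail(values, -1) == "outside"),
                   which(head(values, -1) == "outside" & tail(values, -1) == "pt") + 1))] <- NA

d$activity <- inverse.rle(r)
na.omit(d)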
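If the data may already contain NA that should be kept, one way to avoid a replacement value altogether is to build a logical flag per run and expand it to rows. This is a sketch under the assumption that 'activity' is a character vector; prev_v, next_v and drop_run are illustrative names, not part of the original answer.

```r
d <- data.frame(
  activity = c("home", "pt", "pt", "pt", "outside", "outside",
               "pt", "pt", "work", "pt", "pt", "home"),
  dist = 1:12,
  stringsAsFactors = FALSE
)

r <- rle(d$activity)
v <- r$values
n <- length(v)

# Value of the neighbouring run on each side; NA pads the two ends
prev_v <- c(NA_character_, v[-n])
next_v <- c(v[-1], NA_character_)

# A run is dropped if it is 'pt' and touches a run of 'outside'.
# %in% never returns NA, so missing values cannot leak into the index.
drop_run <- v %in% "pt" & (prev_v %in% "outside" | next_v %in% "outside")

# Expand the per-run flag to one flag per row and subset
res <- d[!rep(drop_run, r$lengths), ]
```

Because no value in 'activity' is overwritten, rows that contain NA in any column are left untouched.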
