Delete runs of certain value before and after specific value
Problem description
I have a data frame with several columns. Based on the column 'activity', I want to remove entire contiguous runs of a specific value, 'pt', but only when they occur immediately before or after a run of 'outside'.
In the simplified data below, there is one run where 'activity' is 'outside', which has chunks of 'pt' before and after it. These two 'pt' chunks should be removed.
activity dist
1 home 1
2 pt 2 # <- run of 'pt' before run of 'outside': remove
3 pt 3 # <-
4 pt 4 # <-
5 outside 5
6 outside 6
7 pt 7 # <- run of 'pt' after run of 'outside': remove
8 pt 8 # <-
9 work 9
10 pt 10
11 pt 11
12 home 12
Thus, the desired output is:
activity dist
1 home 1
2 outside 5
3 outside 6
4 work 9
5 pt 10
6 pt 11
7 home 12
How can this be done?
Data (dput output):
d <- structure(list(activity = c("home", "pt", "pt", "pt", "outside", "outside", "pt", "pt", "work", "pt", "pt", "home"),
dist = 1:12),
class = "data.frame", row.names = c(NA, -12L))
Recommended answer
You may use some convenience functions from the data.table package: rleid to "[g]enerate run-length type group id", and shift to get the values before and after the focal index in a vector.
library(data.table)
setDT(d)
d[ , r := rleid(activity)]
d[!(r %in% r[activity == "pt" & shift(activity, type = "lead") == "outside" |
shift(activity) == "outside" & activity == "pt"])]
# activity dist r
# 1: home 1 1
# 2: outside 5 3
# 3: outside 6 3
# 4: work 9 5
# 5: pt 10 6
# 6: pt 11 6
# 7: home 12 7
Explanation:
Coerce your data.frame to a data.table (setDT(d)). Create a run-length index of 'activity' (rleid). Check if the current value is 'pt' and the next value is 'outside' (activity == "pt" & shift(activity, type = "lead") == "outside"), or (|) if the current value is 'pt' and the previous value is 'outside' (activity == "pt" & shift(activity) == "outside").
Where this condition is TRUE, grab the run groups to be removed (r[<condition>]). Check whether each row's run is among the groups to be removed (r %in% <run groups to be removed>). If so, do not (!) keep these rows when indexing the data (d[<condition>]).
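To see what each piece of that expression contributes, the same filter can be unrolled into named intermediate steps. This is only a sketch: the names d2, next_act, prev_act, is_edge_pt, drop_runs and res are made up for illustration, and as.data.table() is used instead of setDT() so that the original data frame is left untouched.

```r
library(data.table)

d <- data.frame(
  activity = c("home", "pt", "pt", "pt", "outside", "outside",
               "pt", "pt", "work", "pt", "pt", "home"),
  dist = 1:12
)

d2 <- as.data.table(d)        # a copy; setDT(d) would convert d in place
d2[, r := rleid(activity)]    # run id per row: 1 2 2 2 3 3 4 4 5 6 6 7

next_act <- shift(d2$activity, type = "lead")  # value in the following row
prev_act <- shift(d2$activity)                 # value in the preceding row (lag is the default)

# 'pt' rows that directly touch an 'outside' row (rows 4 and 7 here)
is_edge_pt <- d2$activity == "pt" &
  (next_act == "outside" | prev_act == "outside")

# run ids that contain such a row; which() discards any NA from the shifted ends
drop_runs <- unique(d2$r[which(is_edge_pt)])

# drop the whole runs, then remove the helper column
res <- d2[!(r %in% drop_runs)][, r := NULL]
res
```

Only the rows at the edge of a 'pt' run satisfy the condition, which is why the run id is needed: it lets the filter extend from those edge rows to every row in the same run.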
Alternatively, using rle from base R: the values of runs of 'pt' before or after 'outside' are replaced with NA. The rle is converted back to a vector (inverse.rle) and rows with NA are removed (na.omit).
Obviously, if there are rows with NA in the original data set which you want to keep, you need to use another value for replacement.
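# run-length encode the column; r$values holds one entry per run
r <- rle(d$activity)
# set to NA the 'pt' runs directly before an 'outside' run, and those directly after one (+ 1)
r$values[with(r, c(which(head(values, -1) == "pt" & tail(values, -1) == "outside"),
                   which(head(values, -1) == "outside" & tail(values, -1) == "pt") + 1))] <- NA
d$activity <- inverse.rle(r)
na.omit(d)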
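If the data may already contain NA values that should be kept, one way to avoid a placeholder value altogether is to flag the runs in a logical vector and expand it to row level with rep(). The following is a sketch of that idea and not part of the original answer; the function name drop_adjacent_runs and its arguments are hypothetical, and it assumes a plain data.frame rather than a data.table.

```r
# Drop every run of `target` that directly borders a run of `anchor`.
drop_adjacent_runs <- function(df, col, target = "pt", anchor = "outside") {
  r <- rle(df[[col]])
  v <- r$values
  prev_v <- c(NA, head(v, -1))   # value of the preceding run
  next_v <- c(tail(v, -1), NA)   # value of the following run
  # %in% never returns NA, so NA runs are simply treated as "no match"
  drop_run <- v %in% target & (prev_v %in% anchor | next_v %in% anchor)
  keep <- !rep(drop_run, r$lengths)   # expand run-level flags to rows
  df[keep, , drop = FALSE]
}

d <- data.frame(
  activity = c("home", "pt", "pt", "pt", "outside", "outside",
               "pt", "pt", "work", "pt", "pt", "home"),
  dist = 1:12
)
res <- drop_adjacent_runs(d, "activity")

# Same data with a missing activity in row 9; that row is kept
d_na <- d
d_na$activity[9] <- NA
res_na <- drop_adjacent_runs(d_na, "activity")
```

The result keeps the original row names (1, 5, 6, ...); use rownames(res) <- NULL if consecutive numbering is wanted.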