Remove rows based on factor-levels
Problem description
I have a data.frame df in "long" format.
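df <- data.frame(site = rep(c("A","B","C"), length.out = 7),
                 time = c(11,11,11,22,22,22,33),
                 value = ceiling(rnorm(7)*10))  # random values; the output below is one draw
df <- df[order(df$site), ]
rownames(df) <- NULL  # renumber rows so that df[3,] matches the printout
df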
  site time value
1    A   11    12
2    A   22   -24
3    A   33   -30
4    B   11     3
5    B   22    16
6    C   11     3
7    C   22     9
Question
How do I remove the rows where a unique element of df$time is not present for every level of df$site?
In this case I want to remove df[3,], because the timestamp 33 in df$time is only present for site A and not for sites B and C.
Desired output:
df.trimmed
  site time value
1    A   11    12
2    A   22   -24
4    B   11     3
5    B   22    16
6    C   11     3
7    C   22     9
The data.frame can easily have 800k rows and 200k unique timestamps. I don't want to use loops, but I don't know how to use vectorized functions like apply() or lapply() for this case.
Recommended answer
Here's another possible solution using the data.table package:
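library(data.table)
DT <- data.table(df, key = "site")
unTime <- unique(df$time)
# For each site, find the positions in unTime of the timestamps that site lacks
(notInAll <- unique(DT[, list(ans = which(!unTime %in% time)), by = key(DT)]$ans))
# [1] 3
# Keep only timestamps present at every site. setdiff() also works when
# notInAll is empty, whereas unTime[-notInAll] would then drop every row.
DT[time %in% setdiff(unTime, unTime[notInAll])]
#      site time value
# [1,]    A   11     3
# [2,]    A   22    11
# [3,]    B   11    -6
# [4,]    B   22    -2
# [5,]    C   11   -19
# [6,]    C   22   -14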
EDIT from Matthew
Nice. Or a slightly more direct way:
DT = as.data.table(df)
tt = DT[,length(unique(site)),by=time]
tt
   time V1
1:   11  3
2:   22  3
3:   33  1
tt = tt[V1==max(V1)] # See * below
tt
   time V1
1:   11  3
2:   22  3
DT[time %in% tt$time]
   site time value
1:    A   11     7
2:    A   22    -2
3:    B   11     8
4:    B   22   -10
5:    C   11     3
6:    C   22     1
If no time is present in all sites, the final result should be empty (as Ben pointed out in the comments). To handle that case, the step marked * above could be:
tt = tt[V1==length(unique(DT$site))]
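For readers without data.table, the same strict check (keep a timestamp only if every site has it) can be sketched in base R with ave(). This is an illustrative alternative, not part of the answers above; the fixed values are copied from the question's printout, and the names n_sites and sites_per_time are hypothetical.

```r
# Example data, using the values shown in the question's printout
df <- data.frame(site  = c("A", "A", "A", "B", "B", "C", "C"),
                 time  = c(11, 22, 33, 11, 22, 11, 22),
                 value = c(12, -24, -30, 3, 16, 3, 9))

# Total number of distinct sites in the data
n_sites <- length(unique(df$site))

# For every row, count the distinct sites observed at that row's timestamp
sites_per_time <- ave(as.integer(factor(df$site)), df$time,
                      FUN = function(s) length(unique(s)))

# Keep only rows whose timestamp occurs at every site
df.trimmed <- df[sites_per_time == n_sites, ]
df.trimmed
```

Because the comparison is against the total number of sites rather than the largest count observed, the result is empty when no timestamp is shared by all sites.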