Remove rows based on factor-levels

Views: 121 Published: 2017/3/12 11:13:40 Tags: r data.table subset r-factor

Problem description

I have a data.frame df in "long" format.

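# Seven observations in long format; only site A has a reading at time 33.
df <- data.frame(site = rep(c("A", "B", "C"), length.out = 7),
                 time = c(11, 11, 11, 22, 22, 22, 33),
                 value = ceiling(rnorm(7) * 10))  # random, so values differ per run
df <- df[order(df$site), ]
rownames(df) <- NULL  # renumber rows 1..7 after sorting, as in the printout below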

df
  site time value
1    A   11    12
2    A   22   -24
3    A   33   -30
4    B   11     3
5    B   22    16
6    C   11     3
7    C   22     9

Question

How do I remove the rows whose df$time value is not present for every level of df$site?

In this case I want to remove df[3,], because timestamp 33 in df$time is present only for site A and not for sites B and C.

Desired output:

df.trimmed
  site time value
1    A   11    12
2    A   22   -24
4    B   11     3
5    B   22    16
6    C   11     3
7    C   22     9

The real data.frame easily has 800k rows and 200k unique timestamps. I don't want to use loops, but I don't know how to use vectorized functions like apply() or lapply() for this case.

Recommended answer

Here's another possible solution using the data.table package:

unTime <- unique(df$time)

library(data.table)

DT <- data.table(df, key = "site")

(notInAll <- unique(DT[, list(ans = which(!unTime %in% time)), by = key(DT)]$ans))
# [1] 3

# unTime[-notInAll] returns nothing when notInAll is empty, so exclude instead.
DT[!time %in% unTime[notInAll]]

#      site time value
# [1,]    A   11     3
# [2,]    A   22    11
# [3,]    B   11    -6
# [4,]    B   22    -2
# [5,]    C   11   -19
# [6,]    C   22   -14
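
To see what the grouped step returns, here is a small reproducible sketch. It rebuilds the example with the fixed values from the question's printout instead of rnorm(), and prints the per-site positions of missing times. The names missingBySite and trimmed are only illustrative and do not appear in the answer.

```r
library(data.table)

# Fixed values (copied from the question's printed df) so the run is reproducible.
df <- data.frame(site  = c("A", "A", "A", "B", "B", "C", "C"),
                 time  = c(11, 22, 33, 11, 22, 11, 22),
                 value = c(12, -24, -30, 3, 16, 3, 9))

unTime <- unique(df$time)            # 11 22 33
DT <- data.table(df, key = "site")

# For each site, the positions in unTime that the site does NOT have.
# Site A has every time, so it contributes no rows at all.
missingBySite <- DT[, list(ans = which(!unTime %in% time)), by = key(DT)]
print(missingBySite)

# Positions missing from at least one site, then keep only the other times.
notInAll <- unique(missingBySite$ans)
trimmed  <- DT[!time %in% unTime[notInAll]]
print(trimmed)
```

Sites B and C both report position 3 (time 33), so that timestamp is dropped and the six rows at times 11 and 22 remain.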

EDIT from Matthew
Nice. Or a slightly more direct way:

DT = as.data.table(df)
tt = DT[,length(unique(site)),by=time]
tt
   time V1
1:   11  3
2:   22  3
3:   33  1

tt = tt[V1==max(V1)]      # See * below
tt
   time V1
1:   11  3
2:   22  3

DT[time %in% tt$time]
   site time value
1:    A   11     7
2:    A   22    -2
3:    B   11     8
4:    B   22   -10
5:    C   11     3
6:    C   22     1

If no time is present in all sites, the final result should be empty (as Ben pointed out in the comments). To handle that case, the step marked * above could be:

tt = tt[V1==length(unique(DT$site))]
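
The difference between the two filters only shows up on data where no timestamp is shared by every site. The sketch below uses a small made-up table (the data and the names byMax, byCount and nSites are assumptions for illustration) to compare them.

```r
library(data.table)

# Hypothetical data: no timestamp is shared by all three sites.
DT <- data.table(site = c("A", "A", "B", "C"),
                 time = c(11, 22, 11, 33))

# Number of distinct sites per time: 11 -> 2, 22 -> 1, 33 -> 1
tt <- DT[, length(unique(site)), by = time]

# max(V1) is 2 here, so time 11 survives even though site C lacks it.
byMax <- DT[time %in% tt[V1 == max(V1)]$time]

# Comparing against the total number of sites keeps nothing, as intended.
nSites  <- length(unique(DT$site))
byCount <- DT[time %in% tt[V1 == nSites]$time]

print(byMax)
print(byCount)
```

On this input the max(V1) version keeps the two rows at time 11, while the version that compares against the number of sites returns an empty table.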
