Subsetting a data frame based on a key spanning several columns in another (summary) data frame

Published: 2020/10/16 23:07:10. Tags: r, dataframe, subset


Problem description

I have a data frame a with 4 identifying columns: A, B, C, D. A second data frame b, created with ddply(), contains a summary of the values across the different Ds for every combination of A, B, C. A third data frame c contains the subset of b with bad values, which I want to delete from a.

Thus, I want a subset of a that omits every row whose combination of A, B, C is also present in c. I can think of ways to do this (ugly and inefficient) in a loop, but my DBA background encourages me to look for a solution that is a little more direct.

In code:

a <- data.frame(
  A=rep(c('2013-10-30', '2014-11-6'), each=16*20),
  B=rep(1:8, each=2*20),
  C=rep(1:4, each=20),
  D=1:20
)

a$Val=rnorm(nrow(a))

library(plyr)
b <- ddply(a, ~B+C+A, summarise,
           mean_Val=mean(Val))

# Some subset criteria based on AOI group values
c <- subset(b, mean_Val <= 0)

# EDIT: Delete all the rows from a for which the
# key-triplets A,B,C are present in c
for (i in seq_len(nrow(c))) {  # seq_len() skips the loop when c has no rows
  c_row = c[i,]
  a <- a[ which( !(a$A==c_row$A & a$B==c_row$B & a$C==c_row$C) ), ]
}
# This is the loopy type of 'solution' I didn't want to use

Please also feel free to point out anything unclear in my question. I'd be happy to edit it if you can point me in the right direction.

Recommended answer

Update

Or we could use a base R option: interaction pastes the key columns together in both datasets, and %in% then checks which key combinations of the first ('a') also occur in the second ('c'). Negating that logical index subsets "a" to the rows whose keys are not in 'c'.

 a1 <- a[!(as.character(interaction(a[1:3], sep=".")) %in% 
          as.character(interaction(c[LETTERS[1:3]], sep="."))),]
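
To make the pasted keys visible, here is a minimal sketch on toy data. The names a_demo, bad_keys and key_cols are hypothetical and not part of the original answer; only base R is used.

```r
# Toy data (hypothetical): three key columns plus a value column
a_demo <- data.frame(
  A = c("x", "x", "y", "y"),
  B = c(1, 1, 2, 2),
  C = c(1, 2, 1, 2),
  Val = c(0.5, -1.2, 0.3, -0.7),
  stringsAsFactors = FALSE
)

# Keys to drop, with the columns deliberately in a different order
bad_keys <- data.frame(
  C = c(2, 1),
  B = c(1, 2),
  A = c("x", "y"),
  stringsAsFactors = FALSE
)

key_cols <- c("A", "B", "C")

# One string per row, e.g. "x.1.1"; selecting the columns by name keeps
# the pasted order identical in both data frames
key_a   <- as.character(interaction(a_demo[key_cols], sep = "."))
key_bad <- as.character(interaction(bad_keys[key_cols], sep = "."))

# Keep only the rows whose key does not occur among the bad keys
a_kept <- a_demo[!(key_a %in% key_bad), ]
```

One caveat with pasted keys: if the key values can themselves contain the separator, two different combinations may produce the same string, so it is worth choosing a separator that cannot appear in the data.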

Or, as @David Arenburg mentioned, we may not need to create the b or c datasets at all to get the expected output. Using plyr, create a new mean column ("mean_Val") in "a" with mutate, then keep only the rows whose group mean is greater than 0 (mean_Val > 0).

 library(plyr)
 subset(ddply(a, ~B+C+A, mutate, mean_Val=mean(Val)), mean_Val>0)

Or using dplyr:

 library(dplyr)
  a %>%
     group_by(B, C, A) %>%
     mutate(mean_Val=mean(Val)) %>% 
     filter(mean_Val>0)
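
If the bad keys already exist as a separate data frame (as c does in the question), a filtering join is another way to express the deletion directly. This is a minimal sketch, assuming dplyr is installed and using its anti_join(); the toy data frames a_demo and bad_keys are hypothetical.

```r
library(dplyr)

# Toy data (hypothetical): three key columns plus a value column
a_demo <- data.frame(
  A = c("x", "x", "y", "y"),
  B = c(1, 1, 2, 2),
  C = c(1, 2, 1, 2),
  Val = c(0.5, -1.2, 0.3, -0.7),
  stringsAsFactors = FALSE
)

# Key triplets that should be removed from a_demo
bad_keys <- data.frame(
  A = c("x", "y"),
  B = c(1, 2),
  C = c(2, 1),
  stringsAsFactors = FALSE
)

# anti_join() keeps the rows of a_demo that have no match in bad_keys
# on the listed key columns
a_kept <- anti_join(a_demo, bad_keys, by = c("A", "B", "C"))
```

Because the join matches on the columns themselves, no separator has to be chosen and the column order in bad_keys does not matter.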

Or, if we don't need the mean values as a column in "a", ave from base R could be used as well.

  a[!!with(a, ave(Val, B, C, A, FUN=function(x) mean(x)>0)),]
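
The double negation in that line is easy to miss. A small sketch of why it is needed, using a hypothetical toy data frame df_demo with a single grouping column g:

```r
# Toy data (hypothetical): group "p" has mean 2, group "q" has mean -0.5
df_demo <- data.frame(
  g   = c("p", "p", "q", "q"),
  Val = c(1, 3, -2, 1)
)

# ave() returns one value per row; because Val is numeric, the logical
# result of the group test is stored as 0/1 rather than FALSE/TRUE
flag <- ave(df_demo$Val, df_demo$g, FUN = function(x) mean(x) > 0)

# !! converts the 0/1 vector back to logical so it can index rows
kept <- df_demo[!!flag, ]
```

Without the !!, indexing with the numeric 0/1 vector would select row 1 repeatedly instead of filtering by group.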

If we need to keep the mean column (a variation proposed by @David Arenburg; note that it is named Mean_Val here):

  subset(transform(a, Mean_Val = ave(Val, B, C, A, FUN = mean)),
                 Mean_Val > 0)

Data

library(dplyr)  # needed for %>%, group_by() and summarise() below
set.seed(24)
a <- data.frame(A= sample(LETTERS[1:3], 20, replace=TRUE), 
   B=sample(LETTERS[1:3], 20, replace=TRUE), C=sample(LETTERS[1:3], 
         20, replace=TRUE), D=rnorm(20))

b <- a %>% 
       group_by(A, B, C) %>% 
       summarise(D=sum(D))
set.seed(39)
c1 <- b[sample(1:nrow(b), 6, replace=FALSE),]
