Concatenate expressions to subset a dataframe


Problem description

I am trying to write a function that calculates the mean of a column in a subsetted dataframe. The trick is that I always want to apply a couple of fixed subsetting conditions, and then have the option of passing more conditions to the function to subset the dataframe further.

Suppose my data look like this:

dat <- data.frame(var1 = rep(letters, 26), var2 = rep(letters, each = 26), var3 = runif(26^2))

head(dat)
  var1 var2      var3
1    a    a 0.7506109
2    b    a 0.7763748
3    c    a 0.6014976
4    d    a 0.6229010
5    e    a 0.5648263
6    f    a 0.5184999

I want to be able to produce the subset shown below, applying the first condition in every function call, while the second condition can change from call to call. The second condition could also be on other variables (I'm using a single variable, var2, for simplicity, but the condition could involve several variables).

subset(dat, var1 %in% c('a', 'b', 'c') & var2 %in% c('a', 'b'))
   var1 var2      var3
1     a    a 0.7506109
2     b    a 0.7763748
3     c    a 0.6014976
27    a    b 0.7322357
28    b    b 0.4593551
29    c    b 0.2951004

My example function and function call would look something like this (it does not parse as written; it only illustrates the intent):

getMean <- function(expr) {  
  return(with(subset(dat, var1 %in% c('a', 'b', 'c') eval(expr)), mean(var3)))  
}
getMean(expression(& var2 %in% c('a', 'b')))

An alternative call could look like:

getMean(expression(& var4 < 6 & var5 > 10))

Any help is much appreciated.

With Wojciech Sobala's help, I came up with the following function, which lets me pass in zero or more conditions.

getMean <- function(expr = NULL) {
  sub <- if(is.null(expr)) { expression(var1 %in% c('a', 'b', 'c'))
  } else expression(var1 %in% c('a', 'b', 'c') & eval(expr))
  return(with(subset(dat, eval(sub)), mean(var3)))
}
getMean()
getMean(expression(var2 %in% c('a', 'b')))
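If several separate conditions need to be passed, one possible variation is to accept them through `...` and AND them together one by one. This is only a sketch under assumptions: the name getMeanMulti, the fixed seed, and the use of eval() with the dataframe as the evaluation environment are illustrative choices, not part of the original post.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(var1 = rep(letters, 26),
                  var2 = rep(letters, each = 26),
                  var3 = runif(26^2))

# Hypothetical helper: takes zero or more expression() objects via `...`
getMeanMulti <- function(...) {
  # The condition that applies to every call
  keep <- dat$var1 %in% c('a', 'b', 'c')
  for (cond in list(...)) {
    # Evaluate each condition with the columns of dat in scope
    keep <- keep & eval(cond, dat)
  }
  # Like subset(), treat NA in the condition as "do not keep"
  keep <- keep & !is.na(keep)
  mean(dat$var3[keep])
}

m_base <- getMeanMulti()
m_two  <- getMeanMulti(expression(var2 %in% c('a', 'b')),
                       expression(var3 > 0.5))
```

Each condition is passed as its own expression, so no leading `&` is needed in the call.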

Recommended answer

It can be simplified by using the default expr = TRUE. Because TRUE is recycled by `&`, the fixed condition is left unchanged when no extra expression is supplied.

getMean <- function(expr = TRUE) {
  return(with(subset(dat, var1 %in% c('a', 'b', 'c') & eval(expr)), mean(var3)))
}
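A minimal, self-contained sketch of how this version might be called and checked against a direct subset() call. The fixed seed and the variable names base_only, with_var2 and direct are additions for illustration, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(var1 = rep(letters, 26),
                  var2 = rep(letters, each = 26),
                  var3 = runif(26^2))

getMean <- function(expr = TRUE) {
  # eval(TRUE) is just TRUE, so the default leaves the var1 condition as is
  with(subset(dat, var1 %in% c('a', 'b', 'c') & eval(expr)), mean(var3))
}

# No extra condition: only the fixed var1 filter applies
base_only <- getMean()

# Extra condition passed as an expression (note: no leading `&`)
with_var2 <- getMean(expression(var2 %in% c('a', 'b')))

# The same subset written out directly, for comparison
direct <- mean(subset(dat, var1 %in% c('a', 'b', 'c') &
                           var2 %in% c('a', 'b'))$var3)
stopifnot(isTRUE(all.equal(with_var2, direct)))
```

Several conditions can still be combined inside a single expression, for example expression(var2 %in% c('a', 'b') & var3 > 0.5).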
