Execute dplyr operation only if column exists


Question

Drawing on the discussion on conditional dplyr evaluation, I would like to conditionally execute a step in a pipeline depending on whether the reference column exists in the passed data frame.

The results generated by 1) and 2) should be identical, first for a column that exists:

# 1)
mtcars %>% 
  filter(am == 1) %>%
  filter(cyl == 4)

# 2)
mtcars %>%
  filter(am == 1) %>%
  {
    if("cyl" %in% names(.)) filter(cyl == 4) else .
  }

Unavailable column

# 1)
mtcars %>% 
  filter(am == 1)

# 2)    
mtcars %>%
  filter(am == 1) %>%
  {
    if("absent_column" %in% names(.)) filter(absent_column == 4) else .
  }

Problem

For the available column, the passed object does not correspond to the initial data frame. The original code returns the error message:

Error in filter(cyl == 4) : object 'cyl' not found

I have tried alternative syntax (with no luck):

mtcars %>%
  filter(am == 1) %>%
  {
    if("cyl" %in% names(.)) filter(.$cyl == 4) else .
  }

Error in UseMethod("filter_") :
  no applicable method for 'filter_' applied to an object of class "logical"


Follow-up

I wanted to expand this question to account for the evaluation on the right-hand side of the == in the filter call. For instance, the syntax below attempts to filter on the first available value.

mtcars %>%
  filter({
    if ("does_not_ex" %in% names(.))
      does_not_ex
    else
      NULL
  } == {
    if ("does_not_ex" %in% names(.))
      unique(.[['does_not_ex']])
    else
      NULL
  })

As expected, the call results in an error message:

Error in filter_impl(.data, quo) : Result must have length 32, not 0

When applied to an existing column:

mtcars %>%
  filter({
    if ("mpg" %in% names(.))
      mpg
    else
      NULL
  } == {
    if ("mpg" %in% names(.))
      unique(.[['mpg']])
    else
      NULL
  })

It works, but with a warning message:

  mpg cyl disp  hp drat   wt  qsec vs am gear carb
1  21   6  160 110  3.9 2.62 16.46  0  1    4    4

Warning message: In { : longer object length is not a multiple of shorter object length

Follow-up question

Is there a neat way of extending the existing syntax to get conditional evaluation on the right-hand side of the filter call, ideally staying within the dplyr workflow?

Recommended answer

Because of the way the scopes work here, you cannot access the data frame from within your if statement. Fortunately, you don't need to.

Try:

mtcars %>%
  filter(am == 1) %>%
  filter({if("cyl" %in% names(.)) cyl else NULL} == 4)

Here you can use the '.' object within the conditional to check whether the column exists and, if it does, return the column to the filter function.
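One caveat with returning NULL for a missing column is that NULL == 4 has length zero, which filter() rejects, as the follow-up in the question shows. A minimal sketch of a variant, assuming a reasonably recent dplyr in which filter() recycles a length-one condition, is to put the whole comparison inside the conditional and fall back to TRUE. The variable names present, absent and first_value are illustrative only and are not from the original post.

```r
library(dplyr)

# Existing column: the conditional yields a logical vector, so rows are filtered.
present <- mtcars %>%
  filter(am == 1) %>%
  filter(if ("cyl" %in% names(.)) cyl == 4 else TRUE)

# Absent column: the conditional yields a single TRUE, which filter() recycles,
# so every row is kept instead of raising a length error.
absent <- mtcars %>%
  filter(am == 1) %>%
  filter(if ("absent_column" %in% names(.)) absent_column == 4 else TRUE)

# Follow-up case: compare the column against its own first value, if it exists.
first_value <- mtcars %>%
  filter(if ("mpg" %in% names(.)) mpg == mpg[1] else TRUE)
```

Comparing against mpg[1] rather than unique(mpg) keeps the right-hand side at length one, which is likely what avoids the recycling warning seen in the question.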

As per docendo discimus' comment on the question, you can access the data frame, but not implicitly, i.e. you have to reference it explicitly with '.'.
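To illustrate the comment about referencing the data frame explicitly, here is a minimal sketch of the braces form from the question with '.' passed as the first argument to filter(). It relies on the magrittr behaviour that a braced right-hand side does not receive the left-hand side automatically. The names with_cyl and without_col are illustrative and not part of the original post.

```r
library(dplyr)

# Column exists: filter() receives the piped data frame explicitly via '.'.
with_cyl <- mtcars %>%
  filter(am == 1) %>%
  {
    if ("cyl" %in% names(.)) filter(., cyl == 4) else .
  }

# Column absent: the else branch returns the data frame unchanged,
# and the filter() call on the missing column is never evaluated.
without_col <- mtcars %>%
  filter(am == 1) %>%
  {
    if ("absent_column" %in% names(.)) filter(., absent_column == 4) else .
  }
```

With this form, both cases from the question should match their plain counterparts 1), since the only change is that filter() now gets a data frame as its first argument.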
