dplyr mutate/replace several columns on a subset of rows

Question

I'm in the process of trying out a dplyr-based workflow (rather than using mostly data.table, which I'm used to), and I've come across a problem that I can't find an equivalent dplyr solution to. I commonly run into the scenario where I need to conditionally update/replace several columns based on a single condition. Here's some example code, with my data.table solution:

library(data.table)

# Create some sample data
set.seed(1)
dt <- data.table(site = sample(1:6, 50, replace = TRUE),
                 space = sample(1:4, 50, replace = TRUE),
                 measure = sample(c('cfl', 'led', 'linear', 'exit'), 50,
                                  replace = TRUE),
                 qty = round(runif(50) * 30),
                 qty.exit = 0,
                 delta.watts = sample(10.5:100.5, 50, replace = TRUE),
                 cf = runif(50))

# Replace the values of several columns for rows where measure is "exit"
dt <- dt[measure == 'exit', 
         `:=`(qty.exit = qty,
              cf = 0,
              delta.watts = 13)]

Is there a simple dplyr solution to this same problem? I'd like to avoid using ifelse because I don't want to have to type the condition multiple times - this is a simplified example, but there are sometimes many assignments based on a single condition.

Thanks in advance for the help!

Recommended answer

These solutions (1) maintain the pipeline, (2) do not overwrite the input and (3) only require that the condition be specified once:

1a) mutate_cond: Create a simple function for data frames or data tables that can be incorporated into pipelines. This function is like mutate but only acts on the rows satisfying the condition:

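library(dplyr)

# Like mutate(), but applies the assignments only to rows where `condition` is TRUE.
# `condition` is evaluated with the columns of .data in scope, falling back to `envir`.
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}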

DF %>% mutate_cond(measure == 'exit', qty.exit = qty, cf = 0, delta.watts = 13)
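
As a quick check of how mutate_cond behaves, here is a minimal sketch on a four-row data frame. The `toy` data and its values are hypothetical and not part of the question; the function is repeated only so the block runs on its own.

```r
library(dplyr)

# mutate_cond as defined in the answer, repeated so this block is self-contained
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

# Hypothetical toy data (illustration only)
toy <- data.frame(measure = c("cfl", "exit", "led", "exit"),
                  qty = c(5, 7, 2, 9),
                  qty.exit = 0,
                  cf = c(0.2, 0.4, 0.6, 0.8),
                  stringsAsFactors = FALSE)

res <- toy %>% mutate_cond(measure == "exit", qty.exit = qty, cf = 0)

res$qty.exit  # 0 7 0 9: only the "exit" rows took the value of qty
res$cf        # 0.2 0.0 0.6 0.0
toy$cf        # 0.2 0.4 0.6 0.8: the input data frame is left unchanged
```

The condition is written once, and rows that do not satisfy it keep their original values.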

1b) mutate_last: This is an alternative function for data frames or data tables which again is like mutate but is only used within group_by (as in the example below) and only operates on the last group rather than every group. Because TRUE > FALSE, grouping by a logical condition puts the rows satisfying it in the last group, so mutate_last will only operate on those rows. Note that if no row satisfies the condition, the only group is the FALSE group, and it would be modified instead.

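# Like mutate(), but applies the assignments only to the rows of the last group.
mutate_last <- function(.data, ...) {
  n <- n_groups(.data)
  # group_rows() needs dplyr >= 0.8; older versions stored 0-based row
  # indices in attr(.data, "indices"), hence the original `[[n]] + 1`.
  indices <- group_rows(.data)[[n]]
  .data[indices, ] <- .data[indices, ] %>% mutate(...)
  .data
}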


DF %>% 
   group_by(is.exit = measure == 'exit') %>%
   mutate_last(qty.exit = qty, cf = 0, delta.watts = 13) %>%
   ungroup() %>%
   select(-is.exit)
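
The ordering that mutate_last relies on can be inspected directly. The sketch below assumes dplyr 0.8 or later, where `group_keys()` and `group_rows()` are available; the `toy` data frame is hypothetical.

```r
library(dplyr)

# Hypothetical toy data (illustration only)
toy <- data.frame(measure = c("exit", "cfl", "exit", "led"),
                  stringsAsFactors = FALSE)

g <- toy %>% group_by(is.exit = measure == "exit")

# Group keys are sorted, so FALSE comes first and TRUE comes last
group_keys(g)$is.exit          # FALSE TRUE

# Row positions of the last group, i.e. the rows where measure == "exit"
group_rows(g)[[n_groups(g)]]   # 1 3
```

This only holds when at least one row satisfies the condition; otherwise there is no TRUE group to come last.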

2) factor out condition: Factor out the condition by making it an extra column which is later removed. Then use ifelse, replace or arithmetic with logicals as illustrated. This also works for data tables.

library(dplyr)

DF %>% mutate(is.exit = measure == 'exit',
              qty.exit = ifelse(is.exit, qty, qty.exit),
              cf = (!is.exit) * cf,
              delta.watts = replace(delta.watts, is.exit, 13)) %>%
       select(-is.exit)
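
To see what each of the three idioms does on its own, here is a small base R sketch on plain vectors. The values are hypothetical and chosen only for illustration.

```r
# Hypothetical values (illustration only)
is.exit     <- c(FALSE, TRUE, FALSE, TRUE)
qty         <- c(5, 7, 2, 9)
qty.exit    <- c(0, 0, 0, 0)
cf          <- c(0.2, 0.4, 0.6, 0.8)
delta.watts <- c(20.5, 30.5, 40.5, 50.5)

# ifelse: take qty where the condition holds, keep qty.exit elsewhere
new.qty.exit <- ifelse(is.exit, qty, qty.exit)         # 0 7 0 9

# arithmetic with logicals: FALSE is 0 and TRUE is 1, so this zeroes the matching rows
new.cf <- (!is.exit) * cf                              # 0.2 0.0 0.6 0.0

# replace: overwrite the matching positions with a constant
new.delta.watts <- replace(delta.watts, is.exit, 13)   # 20.5 13.0 40.5 13.0
```

The arithmetic form only fits when the replacement value is 0; ifelse and replace cover the general case.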

3) sqldf: We could use SQL update via the sqldf package in the pipeline for data frames (but not data tables unless we convert them; this may represent a bug in dplyr, see dplyr issue 1579). It may seem that the update undesirably modifies the input, but in fact it acts on a copy of the input in the temporarily generated database, not on the actual input.

library(sqldf)

DF %>% 
   do(sqldf(c("update '.' 
                 set 'qty.exit' = qty, cf = 0, 'delta.watts' = 13 
                 where measure = 'exit'", 
              "select * from '.'")))

4) row_case_when: Also check out row_case_when defined in "Returning a tibble: how to vectorize with case_when?" (question 61837438, answer 61853888). It uses a syntax similar to case_when but applies to rows.

library(dplyr)

DF %>%
  row_case_when(
    measure == "exit" ~ data.frame(qty.exit = qty, cf = 0, delta.watts = 13),
    TRUE ~ data.frame(qty.exit, cf, delta.watts)
  )

Note 1: We used this as DF:

set.seed(1)
DF <- data.frame(site = sample(1:6, 50, replace = TRUE),
                 space = sample(1:4, 50, replace = TRUE),
                 measure = sample(c('cfl', 'led', 'linear', 'exit'), 50,
                                  replace = TRUE),
                 qty = round(runif(50) * 30),
                 qty.exit = 0,
                 delta.watts = sample(10.5:100.5, 50, replace = TRUE),
                 cf = runif(50))

Note 2: The problem of how to easily specify updating a subset of rows is also discussed in dplyr issues 134, 631, 1518 and 1573 with 631 being the main thread and 1573 being a review of the answers here.
