dplyr - mutate formula based on similarities in column names

Problem description

I am trying to find a better way to run mutate() on a combination of columns, selected by parts of their column names.

For example, I would like to simplify the mutate() call in the following code:

library(dplyr)  # provides %>% and mutate()

df <- data.frame(LIMITED_A = c(100,200),
                 UNLIMITED_A = c(25000,50000),
                 LIMITED_B = c(300,300),
                 UNLIMITED_B = c(500,500),
                 LIMITED_C = c(2,10),
                 UNLIMITED_C = c(5,20))

df %>%
  mutate(FINAL_LIMITED = (LIMITED_A - LIMITED_B) / LIMITED_C,
         FINAL_UNLIMITED = (UNLIMITED_A - UNLIMITED_B) / UNLIMITED_C)

The formula has the form (._A - ._B) / ._C, where . stands for the shared column-name prefix, and each result is named FINAL_ followed by that prefix (for example, FINAL_LIMITED).

Is there a way to simplify this to a single line of code in the mutate function?

Recommended answer

Here is another approach:

library(dplyr)
library(rlang)
library(glue)

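# Build one expression per column-name prefix from a glue template,
# then splice all of them into a single mutate() call.
# Note: parse_quosure() has been deprecated since rlang 0.2.0 in favor of
# parse_quo(); on a recent rlang this call may need to be updated.
dynamic_mutate = function(DF,
                          col_names = gsub("(.*)_\\w+$", "\\1", names(DF)),
                          expression = "({x}_A - {x}_B)/{x}_C",
                          prefix = "FINAL"){

  # Unique prefixes, e.g. "LIMITED" and "UNLIMITED"
  name_list = col_names %>%
    unique() %>%
    as.list()

  # Fill {x} in the template for each prefix, parse the string,
  # and name each result <prefix>_<name>, e.g. FINAL_LIMITED
  expr_list = name_list %>%
    lapply(function(x) parse_quosure(glue(expression))) %>%
    setNames(paste(prefix, name_list, sep = "_"))

  # Unquote-splice the named list into mutate()
  DF %>% mutate(!!!expr_list)

}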

Result:

> df %>%
+   dynamic_mutate()
  LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_LIMITED
1       100       25000       300         500         2           5          -100
2       200       50000       300         500        10          20           -10
  FINAL_UNLIMITED
1            4900
2            2475

> df %>%
+   dynamic_mutate(c("LIMITED", "UNLIMITED"), prefix = "NEW")
  LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C NEW_LIMITED
1       100       25000       300         500         2           5        -100
2       200       50000       300         500        10          20         -10
  NEW_UNLIMITED
1          4900
2          2475

> df %>%
+   dynamic_mutate(c("UNLIMITED"), prefix = "NEW")
  LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C NEW_UNLIMITED
1       100       25000       300         500         2           5          4900
2       200       50000       300         500        10          20          2475

> df %>% 
+   dynamic_mutate(c("A", "B", "C"), "LIMITED_{x} + UNLIMITED_{x}")
  LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_A FINAL_B FINAL_C
1       100       25000       300         500         2           5   25100     800       7
2       200       50000       300         500        10          20   50200     800      30

Notes:

This approach uses lapply and glue to construct expressions from prefixes extracted with gsub (or you can supply your own prefixes/suffixes). parse_quosure from rlang then parses each string into a quosure. As a result, expr_list is a named list of quosures, which !!! unquotes and splices into mutate as separate expressions.
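The three steps described above can also be run one at a time outside the function, which makes the intermediate objects easier to inspect. The following is only a sketch under one stated assumption: it swaps in rlang's parse_expr() for parse_quosure(), because parse_quosure() has been deprecated since rlang 0.2.0. That substitution is not part of the original answer, and it produces plain expressions rather than quosures. The names prefixes and result are illustrative.

```r
library(dplyr)
library(rlang)
library(glue)

# Same toy data as in the question.
df <- data.frame(LIMITED_A = c(100, 200),
                 UNLIMITED_A = c(25000, 50000),
                 LIMITED_B = c(300, 300),
                 UNLIMITED_B = c(500, 500),
                 LIMITED_C = c(2, 10),
                 UNLIMITED_C = c(5, 20))

# Step 1: strip the trailing "_<suffix>" to get the unique prefixes.
prefixes <- unique(gsub("(.*)_\\w+$", "\\1", names(df)))

# Step 2: fill the template once per prefix and parse each string.
# Assumption: parse_expr() replaces the answer's parse_quosure() here.
expr_list <- lapply(prefixes, function(x) {
  parse_expr(as.character(glue("({x}_A - {x}_B)/{x}_C")))
})
names(expr_list) <- paste("FINAL", prefixes, sep = "_")

# Step 3: splice the named expressions into mutate() with !!!.
result <- df %>% mutate(!!!expr_list)
```

Printing expr_list before the last step shows exactly which formulas mutate() will receive, which is a convenient check when the template is changed.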

You can change the formula by adjusting the expression argument, as shown in the last example.

The advantage of this method is that it is quite fast, because it mainly manipulates column names and builds strings (expressions) rather than touching the data. The disadvantage is that it uses multiple packages.
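If the number of packages is a concern, the same idea can be sketched with dplyr alone. This variant is not from the original answer: it assumes base R's gsub(fixed = TRUE) to fill the {x} placeholder in place of glue, and base R's str2lang() (available from R 3.6.0) to parse the string in place of rlang's parser. The names template and result are illustrative.

```r
library(dplyr)

# Same toy data as in the question.
df <- data.frame(LIMITED_A = c(100, 200),
                 UNLIMITED_A = c(25000, 50000),
                 LIMITED_B = c(300, 300),
                 UNLIMITED_B = c(500, 500),
                 LIMITED_C = c(2, 10),
                 UNLIMITED_C = c(5, 20))

# Unique prefixes: drop everything from the last underscore onward.
prefixes <- unique(sub("_[^_]+$", "", names(df)))

# Template with a literal "{x}" placeholder, as in the answer.
template <- "({x}_A - {x}_B)/{x}_C"

# Substitute each prefix for "{x}" and parse the string with base R.
expr_list <- lapply(prefixes, function(p) {
  str2lang(gsub("{x}", p, template, fixed = TRUE))
})
names(expr_list) <- paste("FINAL", prefixes, sep = "_")

# dplyr understands !!! without attaching rlang explicitly.
result <- df %>% mutate(!!!expr_list)
```

The trade-off is that the plain gsub() substitution only handles a single literal placeholder, whereas glue can evaluate arbitrary R code inside the braces.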
