dplyr - mutate formula based on similarities in column names
Question
I am trying to find a better way to run mutate() on a combination of columns based on parts of the column names.
For example, I would like to simplify the mutate() call in the following code:
library(dplyr)

df <- data.frame(LIMITED_A   = c(100, 200),
                 UNLIMITED_A = c(25000, 50000),
                 LIMITED_B   = c(300, 300),
                 UNLIMITED_B = c(500, 500),
                 LIMITED_C   = c(2, 10),
                 UNLIMITED_C = c(5, 20))

df %>%
  mutate(FINAL_LIMITED   = (LIMITED_A - LIMITED_B) / LIMITED_C,
         FINAL_UNLIMITED = (UNLIMITED_A - UNLIMITED_B) / UNLIMITED_C)
That is, each result follows the formula (._A - ._B) / ._C, where . is the shared column-name prefix, and is named FINAL_ followed by that prefix.
Is there a way to simplify this to a single line of code in the mutate() call?
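For reference, here is a minimal base-R sketch of the rule described above (no dplyr involved). It assumes that the prefix is everything before the last underscore and that every prefix has exactly the suffixes _A, _B and _C. It is not the mutate() one-liner being asked for, but it pins down what such a one-liner has to produce.

```r
# Data from the question
df <- data.frame(LIMITED_A = c(100, 200), UNLIMITED_A = c(25000, 50000),
                 LIMITED_B = c(300, 300), UNLIMITED_B = c(500, 500),
                 LIMITED_C = c(2, 10),    UNLIMITED_C = c(5, 20))

# Assumption: the prefix is everything before the last underscore
prefixes <- unique(sub("_[^_]+$", "", names(df)))  # "LIMITED", "UNLIMITED"

out <- df
for (p in prefixes) {
  # Look up the column <prefix>_<suffix>, e.g. LIMITED_A
  col <- function(suffix) df[[paste0(p, "_", suffix)]]
  # (._A - ._B) / ._C, stored as FINAL_<prefix>
  out[[paste0("FINAL_", p)]] <- (col("A") - col("B")) / col("C")
}
out$FINAL_LIMITED    # -100 -10
out$FINAL_UNLIMITED  # 4900 2475
```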
Answer
Here is another approach:
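library(dplyr)
library(rlang)
library(glue)

dynamic_mutate = function(DF,
                          col_names = gsub("(.*)_\\w+$", "\\1", names(DF)),
                          expression = "({x}_A - {x}_B)/{x}_C",
                          prefix = "FINAL"){
  # Unique prefixes, e.g. "LIMITED" and "UNLIMITED"
  name_list = col_names %>%
    unique() %>%
    as.list()
  # Fill {x} in the template with each prefix and parse the string into an
  # expression. parse_quosure() is not available in current rlang releases,
  # so parse_expr() is used instead.
  expr_list = name_list %>%
    lapply(function(x) parse_expr(glue(expression))) %>%
    setNames(paste(prefix, name_list, sep = "_"))
  # Splice the named list into mutate() as separate name = expression pairs
  DF %>% mutate(!!!expr_list)
}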
Results:
> df %>%
+ dynamic_mutate()
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_LIMITED
1 100 25000 300 500 2 5 -100
2 200 50000 300 500 10 20 -10
FINAL_UNLIMITED
1 4900
2 2475
> df %>%
+ dynamic_mutate(c("LIMITED", "UNLIMITED"), prefix = "NEW")
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C NEW_LIMITED
1 100 25000 300 500 2 5 -100
2 200 50000 300 500 10 20 -10
NEW_UNLIMITED
1 4900
2 2475
> df %>%
+ dynamic_mutate(c("UNLIMITED"), prefix = "NEW")
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C NEW_UNLIMITED
1 100 25000 300 500 2 5 4900
2 200 50000 300 500 10 20 2475
> df %>%
+ dynamic_mutate(c("A", "B", "C"), "LIMITED_{x} + UNLIMITED_{x}")
LIMITED_A UNLIMITED_A LIMITED_B UNLIMITED_B LIMITED_C UNLIMITED_C FINAL_A FINAL_B FINAL_C
1 100 25000 300 500 2 5 25100 800 7
2 200 50000 300 500 10 20 50200 800 30
Notes:
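This approach uses lapply() and glue() to build one expression string per prefix, where the prefixes are extracted from the column names with gsub() (or you can pass your own prefixes through col_names). Each string is then parsed with rlang: the original code used parse_quosure(), which returns a quosure, while in current rlang releases, where parse_quosure() is no longer available, parse_expr() does the same job by returning a plain expression. Either way, expr_list is a named list of expressions, which !!! unquotes and splices into mutate() as separate name = expression arguments.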
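The steps described in these notes can be traced one at a time on the question's df. This sketch uses parse_expr() rather than parse_quosure(), on the assumption that you are running a current rlang release where parse_quosure() has been retired; splicing plain expressions with !!! behaves the same in mutate().

```r
library(dplyr)
library(rlang)
library(glue)

df <- data.frame(LIMITED_A = c(100, 200), UNLIMITED_A = c(25000, 50000),
                 LIMITED_B = c(300, 300), UNLIMITED_B = c(500, 500),
                 LIMITED_C = c(2, 10),    UNLIMITED_C = c(5, 20))

# 1. gsub() keeps everything before the trailing "_<word>" suffix
prefixes <- unique(gsub("(.*)_\\w+$", "\\1", names(df)))
# prefixes: "LIMITED" "UNLIMITED"

# 2. glue() fills the {x} placeholder with each prefix
template <- "({x}_A - {x}_B)/{x}_C"
strings <- vapply(prefixes, function(x) as.character(glue(template)),
                  character(1), USE.NAMES = FALSE)
# strings[1]: "(LIMITED_A - LIMITED_B)/LIMITED_C"

# 3. Parse each string into an expression and name it FINAL_<prefix>
exprs <- setNames(lapply(strings, parse_expr),
                  paste("FINAL", prefixes, sep = "_"))

# 4. !!! splices the list, as if FINAL_LIMITED = ..., FINAL_UNLIMITED = ...
#    had been typed out by hand inside mutate()
out <- df %>% mutate(!!!exprs)
out$FINAL_UNLIMITED  # 4900 2475
```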
You can change the formula by adjusting the expression argument, as shown in the last example.
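Because the expression argument is just a glue template in which {x} stands for the prefix, any formula R can parse will work. The sketch below isolates that idea in a hypothetical helper, make_exprs() (not part of the answer), and uses it with a different template, the ratio of each _A column to its _C column, named with an assumed RATIO prefix.

```r
library(dplyr)
library(rlang)
library(glue)

df <- data.frame(LIMITED_A = c(100, 200), UNLIMITED_A = c(25000, 50000),
                 LIMITED_B = c(300, 300), UNLIMITED_B = c(500, 500),
                 LIMITED_C = c(2, 10),    UNLIMITED_C = c(5, 20))

# Hypothetical helper: one parsed expression per prefix, named
# <name_prefix>_<prefix>, built from a glue template using {x}
make_exprs <- function(prefixes, template, name_prefix) {
  exprs <- lapply(prefixes,
                  function(x) parse_expr(as.character(glue(template))))
  setNames(exprs, paste(name_prefix, prefixes, sep = "_"))
}

# A different formula: _A divided by _C for each prefix
ratio_exprs <- make_exprs(c("LIMITED", "UNLIMITED"), "{x}_A / {x}_C", "RATIO")
out <- df %>% mutate(!!!ratio_exprs)
out$RATIO_LIMITED    # 50 20
out$RATIO_UNLIMITED  # 5000 2500
```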
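The advantage of this method is that it is fast: the only extra work is manipulating column names and building strings, after which mutate() evaluates each vectorized expression once, just as if it had been written out by hand. The disadvantage is that it depends on several packages (dplyr, rlang and glue).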