Dynamically determine if a dataframe column exists and mutate if it does
Problem description
I have code that pulls and processes data from a database based on a client name. Some clients may have data that does not include a specific column, e.g., last_name or first_name. For clients that do not use last_name or first_name, I don't care. For clients that do use either of those fields, I need to mutate() those columns with toupper() so that I can join on those standardized fields later in the ETL process.
Right now, I'm using a series of if() statements and some helper functions to look at the names of a dataframe and then mutate the columns if they exist. I'm using if() statements because ifelse() is vectorized and doesn't handle dataframes well.
library(dplyr)
set.seed(256)

b <- data.frame(id = sample(1:100, 5, FALSE),
                col_name = sample(1000:9999, 5, FALSE),
                another_col = sample(1000:9999, 5, FALSE))

d <- data.frame(id = sample(1:100, 5, FALSE),
                col_name = sample(1000:9999, 5, FALSE),
                last_name = sample(letters, 5, FALSE))

mutate_first_last <- function(df){
  mutate_first_name <- function(df){
    df %>%
      mutate(first_name = first_name %>% toupper())
  }
  mutate_last_name <- function(df){
    df %>%
      mutate(last_name = last_name %>% toupper())
  }
  n <- c("first_name", "last_name") %in% names(df)
  if (n[1] & n[2]) return(df %>% mutate_first_name() %>% mutate_last_name())
  if (n[1] & !n[2]) return(df %>% mutate_first_name())
  if (!n[1] & n[2]) return(df %>% mutate_last_name())
  if (!n[1] & !n[2]) return(df)
}
This works the way I expect:
> b %>% mutate_first_last()
id col_name another_col
1 48 8318 6207
2 39 7155 7170
3 16 4486 4321
4 55 2521 8024
5 15 1412 4875
> d %>% mutate_first_last()
id col_name last_name
1 64 7438 A
2 43 4551 Q
3 48 7401 K
4 78 3682 Z
5 87 2554 J
But is this the best way to handle this kind of task, i.e., to dynamically check whether a column name exists in a dataframe and mutate it if it does? It seems strange to need multiple if() statements in this function. Is there a more streamlined way to process these data?
Recommended answer
You can use mutate_at with one_of, both from dplyr. This mutates a column only if its name matches one of c("first_name", "last_name"). If there is no match, it generates a simple warning, which you can ignore or suppress.
library(dplyr)
d %>%
  mutate_at(vars(one_of(c("first_name", "last_name"))), toupper)
id col_name last_name
1 19 7461 V
2 52 9651 H
3 56 1901 P
4 13 7866 Z
5 25 9527 U
# example with no match
b %>%
  mutate_at(vars(one_of(c("first_name", "last_name"))), toupper)
id col_name another_col
1 34 9315 8686
2 26 5598 4124
3 17 3318 2182
4 32 1418 4369
5 49 4759 6680
Warning message:
Unknown variables: `first_name`, `last_name`
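If the warning for missing columns is unwanted, one possible variant (assuming dplyr 1.0.0 or later) is across() with any_of(), which selects only those listed columns that exist and stays silent about the rest. The sketch below is not part of the original answer; the helper name upper_names and the two small data frames are made up for illustration.

```r
library(dplyr)

# Hypothetical example data: one frame has last_name, the other has neither column
with_name <- data.frame(id = 1:3, last_name = c("a", "b", "c"),
                        stringsAsFactors = FALSE)
without_name <- data.frame(id = 1:3, other = 4:6)

# Hypothetical helper: upper-case first_name / last_name when present.
# any_of() ignores names that are not in the data, so no warning is raised.
upper_names <- function(df) {
  df %>%
    mutate(across(any_of(c("first_name", "last_name")), toupper))
}

upper_names(with_name)$last_name   # "A" "B" "C"
names(upper_names(without_name))   # "id" "other" (frame returned unchanged)
```

With the older mutate_at/one_of form, wrapping the call in suppressWarnings() would be another way to hide the message, at the cost of hiding any other warnings raised by the same call.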
From dplyr: these functions allow you to select variables based on their names.
starts_with(): starts with a prefix
ends_with(): ends with a suffix
contains(): contains a literal string
matches(): matches a regular expression
num_range(): a numerical range like x01, x02, x03.
one_of(): variables in character vector.
everything(): all variables.
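As a rough illustration of how these selection helpers behave, the sketch below applies each one to a small made-up data frame (the column names are assumptions chosen only for the demonstration) and inspects which columns are selected.

```r
library(dplyr)

# Hypothetical data frame, used only to illustrate the helpers
df <- data.frame(first_name = "ann", last_name = "lee",
                 x01 = 1, x02 = 2, id_code = 7,
                 stringsAsFactors = FALSE)

by_prefix <- names(select(df, starts_with("x")))        # "x01" "x02"
by_suffix <- names(select(df, ends_with("_name")))      # "first_name" "last_name"
by_string <- names(select(df, contains("id")))          # "id_code"
by_regex  <- names(select(df, matches("^x[0-9]+$")))    # "x01" "x02"
by_range  <- names(select(df, num_range("x", 1:2, width = 2)))  # "x01" "x02"
by_vector <- names(select(df, one_of("last_name")))     # "last_name"
all_cols  <- names(select(df, everything()))            # all five columns
```

Any of these helpers can be used inside vars() in mutate_at, in the same position as one_of in the answer.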