Coalesce columns based on pattern in R
Question
I have combined data sets in R, and each data set may use a different column name for the same data. I need to use a regular expression to identify the names of the columns I need to combine, and then run that list of column names through coalesce.
I know the proper regex expression to identify my columns, and I know how to manually write the column names into the coalesce function to combine these columns, but I do not know how to automatically coalesce columns identified with a regular expression.
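library(dplyr)  # provides coalesce()

sample = data.frame("PIDno" = c('a', NA, NA), "PINID" = c(NA, 'b', NA), "ParcelId" = c(NA, NA, 'c'))
# The regex finds the matching column names, but only as strings such as "sample$PIDno"
PID_search = paste("sample$", grep("PID|PIN|PARCEL", colnames(sample), ignore.case = TRUE, value = TRUE), sep = "")
# Manual version that works, with every column written out by hand
sample$PID_combine = coalesce(sample$PIDno,
                              sample$PINID,
                              sample$ParcelId)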
Recommended answer
We can use tidyverse. The selected columns are converted to character with mutate_at, and those columns are then coalesced inside mutate.
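library(tidyverse)

sample %>%
  # convert the matched columns to character so they all share one type
  mutate_at(vars(matches("PID|PIN|Parcel")), as.character) %>%
  # splice the matched columns into coalesce() as separate arguments
  mutate(new = coalesce(!!! select(., matches("PID|PIN|Parcel"))))
#   PIDno PINID ParcelId new
# 1     a  <NA>     <NA>   a
# 2  <NA>     b     <NA>   b
# 3  <NA>  <NA>        c   c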
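For comparison, here is a minimal sketch that stays closer to the grep() attempt in the question. It is not part of the original answer. It assumes dplyr is installed. The name pid_cols is introduced here only for illustration. The idea is to keep the matched names as plain strings, use them to subset the data frame, and pass the resulting list of columns to coalesce() with do.call().

```r
library(dplyr)

sample <- data.frame(PIDno    = c("a", NA, NA),
                     PINID    = c(NA, "b", NA),
                     ParcelId = c(NA, NA, "c"))

# Names of the columns matching the pattern (pid_cols is a hypothetical name)
pid_cols <- grep("PID|PIN|PARCEL", names(sample),
                 ignore.case = TRUE, value = TRUE)

# Convert each matched column to character, drop the list names so the
# columns are passed positionally, then call coalesce() on all of them
sample$PID_combine <- do.call(
  coalesce,
  unname(lapply(sample[pid_cols], as.character))
)

print(sample)
```

Subsetting with sample[pid_cols] avoids building strings such as "sample$PIDno", which would otherwise have to be parsed and evaluated before coalesce() could use them.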