Combining columns in R based on matching beginnings of column title names
Question
I have a dataframe that looks somewhat like the following. A1U_sweet is actually the 19th column in the real dataframe, and C1U_sweet is the 39th column in the real dataframe. There are 20 columns beginning with A## and 20 beginning with C##.
```
A1U_sweet A2F_dip A3U_bbq C1U_sweet C2F_dip C3U_bbq
        1       2       1        NA      NA      NA
       NA      NA      NA         4       1       2
        2       4       7        NA      NA      NA
```
I would like to make additional columns that combine the A values and the C values. The resulting dataframe would include columns looking like B1U_sweet and B2F_dip.
```
A1U_sweet A2F_dip A3U_bbq C1U_sweet C2F_dip C3U_bbq B1U_sweet B2F_dip
        1       2       1        NA      NA      NA         1       2
       NA      NA      NA         4       1       2         4       1
        2       4       7        NA      NA      NA         2       4
```
Someone proposed I try the following code. The first two lines work, but after implementing the rest, I get an error message.
```r
types <- grep('^A([0-9]|[12][0-9])[A-Z]_[a-z]+', names(df))  ## Get all "A" patterns
types <- substr(types, 2, Inf)  ## Remove the "A"
for (tp in types) {
  aa <- df[[paste0('A', tp)]]  ## "A" column
  cc <- df[[paste0('C', tp)]]  ## "C" column
  df[[paste0('B', tp)]] <- ifelse(is.na(aa), aa, cc)
}
```
This is the error message:
```
Error in `[[<-.data.frame`(`*tmp*`, paste0("B", tp), value = logical(0)) :
  replacement has 0 rows, data has 94
In addition: Warning message:
In is.na(aa) : is.na() applied to non-(list or vector) of type 'NULL'
```
The data does have 94 columns, but I don't see why that might be triggering this error. I'd appreciate any help making this code run properly!
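The warning suggests a likely cause. By default, grep() returns the positions of the matching names (numbers such as 19, 20, ...), not the names themselves, unless value = TRUE is passed. As a result, the second line operates on numbers rather than names, and paste0() builds column names that do not exist in df. For an unknown name, df[[...]] returns NULL, which is exactly what the is.na() warning reports. When ifelse() receives a zero-length condition, it returns logical(0), and assigning that to a data frame produces "replacement has 0 rows, data has 94". The 94 in that message is the number of rows in df. Separately, the ifelse() arguments are swapped: to keep the A value and fall back to C, the call should read ifelse(is.na(aa), cc, aa). Below is a minimal sketch with both fixes, run on a small data frame rebuilt from the example. It assumes the real df follows the same naming scheme, and it uses substr(types, 2, nchar(types)) to drop the leading "A".

```r
# Small stand-in for the real df, rebuilt from the example in the question
df <- data.frame(
  A1U_sweet = c(1, NA, 2), A2F_dip = c(2, NA, 4), A3U_bbq = c(1, NA, 7),
  C1U_sweet = c(NA, 4, NA), C2F_dip = c(NA, 1, NA), C3U_bbq = c(NA, 2, NA)
)

# value = TRUE makes grep() return the matching names instead of their positions
types <- grep("^A([0-9]|[12][0-9])[A-Z]_[a-z]+", names(df), value = TRUE)
types <- substr(types, 2, nchar(types))  # drop the leading "A", e.g. "1U_sweet"

for (tp in types) {
  aa <- df[[paste0("A", tp)]]  # "A" column
  cc <- df[[paste0("C", tp)]]  # matching "C" column
  # keep the A value; use the C value only where A is NA
  df[[paste0("B", tp)]] <- ifelse(is.na(aa), cc, aa)
}

df[c("B1U_sweet", "B2F_dip", "B3U_bbq")]
```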
This is what I've been doing so far. I have to go in and manually change the column names for each set of columns I want to combine. There has to be a better way!
```r
df$B1U_sweetnsour <- df$A1U_sweetnsour
df$B1U_sweetnsour[is.na(df$B1U_sweetnsour)] <- df$C1U_sweetnsour[is.na(df$A1U_sweetnsour)]
```
Recommended answer
Consider mapply to compare the A columns and C columns elementwise and assign all B columns at once. Also use sub rather than gsub: unlike gsub, sub replaces only the first occurrence, which matters in case there are A's elsewhere in a column header.
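```r
# New B column names: swap the first "A" in each A column name for "B"
new_B_cols <- sub("A", "B", names(df)[grep("^A", names(df))])

# Keep aa where present; fill its NAs from the same positions of cc
replace_na <- function(aa, cc) {
  aa[is.na(aa)] <- cc[is.na(aa)]
  return(aa)
}

# Pair the i-th A column with the i-th C column and write all B columns at once
df[new_B_cols] <- mapply(replace_na, df[grep("^A", names(df))], df[grep("^C", names(df))])

# Show the columns sorted by name
df[order(names(df))]
#   A1U_sweet A2F_dip A3U_bbq B1U_sweet B2F_dip B3U_bbq C1U_sweet C2F_dip C3U_bbq
# 1         1       2       1         1       2       1        NA      NA      NA
# 2        NA      NA      NA         4       1       2         4       1       2
# 3         2       4       7         2       4       7        NA      NA      NA
```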
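The mapply() version pairs the i-th A column with the i-th C column by position. It therefore assumes the C columns appear in the same order as the A columns. If the real data does not guarantee this, one option is to look up each partner column by its name suffix instead, as sketched below. Note that fill_from_c() is a hypothetical helper name, and the stop() check adds an assumption that every A column must have a C partner.

```r
# Hypothetical helper: for every A<suffix> column, create B<suffix> holding the
# A value, falling back to the C<suffix> value wherever A is NA.
fill_from_c <- function(df) {
  a_cols <- grep("^A", names(df), value = TRUE)
  suffixes <- sub("^A", "", a_cols)      # e.g. "1U_sweet"
  c_cols <- paste0("C", suffixes)        # partner found by name, not by position
  absent <- setdiff(c_cols, names(df))
  if (length(absent) > 0) {
    stop("No matching C column for: ", paste(absent, collapse = ", "))
  }
  for (i in seq_along(a_cols)) {
    b <- df[[a_cols[i]]]
    is_missing <- is.na(b)
    b[is_missing] <- df[[c_cols[i]]][is_missing]
    df[[paste0("B", suffixes[i])]] <- b
  }
  df
}

# Example where the C columns are deliberately in a different order than the A columns
df <- data.frame(
  A1U_sweet = c(1, NA, 2), A2F_dip = c(2, NA, 4), A3U_bbq = c(1, NA, 7),
  C3U_bbq = c(NA, 2, NA), C1U_sweet = c(NA, 4, NA), C2F_dip = c(NA, 1, NA)
)
df <- fill_from_c(df)
df[c("B1U_sweet", "B2F_dip", "B3U_bbq")]
```

The explicit loop trades the compactness of mapply() for a clear failure when a partner column is missing. Without that check, a missing partner would produce a silently misaligned B column.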