Combining columns in R based on matching beginnings of column title names


Question

I have a dataframe that looks somewhat like the following. A1U_sweet is actually the 19th column in the real dataframe, and C1U_sweet is the 39th column in the real dataframe. There are 20 columns beginning with A## and 20 beginning with C##.

A1U_sweet  A2F_dip  A3U_bbq  C1U_sweet  C2F_dip  C3U_bbq
1          2        1        NA         NA       NA
NA         NA       NA       4          1        2
2          4        7        NA         NA       NA

I would like to make additional columns that combine the A values and the C values. The resulting dataframe would include columns looking like B1U_sweet and B2F_dip.

A1U_sweet  A2F_dip  A3U_bbq  C1U_sweet  C2F_dip  C3U_bbq  B1U_sweet  B2F_dip
1          2        1        NA         NA       NA       1          2
NA         NA       NA       4          1        2        4          1
2          4        7        NA         NA       NA       2          4

Someone proposed I try the following code. The first two lines work, but after implementing the rest, I get an error message.

types <- grep('^A([0-9]|[12][0-9])[A-Z]_[a-z]+', names(df)) ## Get all "A" patterns
types <- substr(types, 2, Inf) ## Remove the "A"
for (tp in types) {
  aa <- df[[paste0('A', tp)]] ## "A" column
  cc <- df[[paste0('C', tp)]] ## "C" column
  df[[paste0('B', tp)]] <- ifelse(is.na(aa), aa, cc)
}

This is the error message:

Error in `[[<-.data.frame`(`*tmp*`, paste0("B", tp), value = logical(0)) : 
  replacement has 0 rows, data has 94
In addition: Warning message:
In is.na(aa) : is.na() applied to non-(list or vector) of type 'NULL'

The data does have 94 columns, but I don't see why that might be triggering this error. I'd appreciate any help making this code run properly!

This is what I've been doing so far. I have to go in and manually change the column names for each set of columns I want to combine. There has to be a better way!

df$B1U_sweetnsour<-A1U_sweetnsour
df$B1U_sweetnsour[is.na(df$B1U_sweetnsour)]<- C1U_sweetnsour[is.na(A1U_sweetnsour)]

Recommended answer

Consider mapply to compare the A columns and C columns elementwise and assign all B columns at once. Use sub rather than gsub to build the new names: sub replaces only the first occurrence, which matters if there are other A's elsewhere in a column header.
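As a small illustration of the sub versus gsub point, the sketch below uses a made-up header (A1U_bbqA is hypothetical, not a column from the question) that contains a second capital A. The anchored pattern "^A" in the last call is an optional variation, not part of the answer's code.

```r
# Hypothetical header containing more than one capital "A".
header <- "A1U_bbqA"

first_only <- sub("A", "B", header)    # replaces only the first match
every_one  <- gsub("A", "B", header)   # replaces every match
anchored   <- sub("^A", "B", header)   # only an "A" at the very start can match

print(c(first_only, every_one, anchored))
```

Here sub leaves the trailing A alone while gsub rewrites it too, which is why sub is the safer choice for renaming the prefix.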

new_B_cols <- sub("A", "B", names(df)[grep("^A", names(df))])

replace_na <- function(aa, cc) {
     aa[is.na(aa)] <- cc[is.na(aa)]
     return(aa) 
}

df[new_B_cols] <- mapply(replace_na, df[grep("^A", names(df))], df[grep("^C", names(df))])

df[order(names(df))]
#   A1U_sweet A2F_dip A3U_bbq B1U_sweet B2F_dip B3U_bbq C1U_sweet C2F_dip C3U_bbq
# 1         1       2       1         1       2       1        NA      NA      NA
# 2        NA      NA      NA         4       1       2         4       1       2
# 3         2       4       7         2       4       7        NA      NA      NA
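
For readers who would rather keep the loop from the question, here is a minimal sketch of a repaired version, run on the question's three sample rows. It assumes the error comes from grep() returning integer positions by default: the names later built with paste0() then do not exist, so df[[...]] returns NULL, which matches the warning about is.na() applied to NULL. The 94 in the message is a row count, not a column count. The sketch also swaps the last two arguments of ifelse(), because the original call returns aa exactly where aa is NA.

```r
# Sample data copied from the question's example table.
df <- data.frame(
  A1U_sweet = c(1, NA, 2), A2F_dip = c(2, NA, 4), A3U_bbq = c(1, NA, 7),
  C1U_sweet = c(NA, 4, NA), C2F_dip = c(NA, 1, NA), C3U_bbq = c(NA, 2, NA)
)

# value = TRUE makes grep() return the matching names instead of their positions.
types <- grep("^A([0-9]|[12][0-9])[A-Z]_[a-z]+", names(df), value = TRUE)
types <- substring(types, 2)  # drop the leading "A"

for (tp in types) {
  aa <- df[[paste0("A", tp)]]  # "A" column
  cc <- df[[paste0("C", tp)]]  # "C" column
  # Take the C value where A is missing, otherwise keep the A value.
  df[[paste0("B", tp)]] <- ifelse(is.na(aa), cc, aa)
}

print(df[order(names(df))])
```

On the sample rows this fills B1U_sweet with 1, 4, 2, the same values the mapply approach produces.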
