Coalescing multiple chunks of columns with the same suffix in names (R)


Problem description

I have a dataset with various "chunks" of columns that have different prefixes but share the same suffix:

ID  A034  B034  C034  D034  A099  B099  A123  B123  ...
1   NA    1     NA    NA    NA    3     1     NA    ...
2   2     NA    NA    NA    2     NA    NA    2     ...
3   NA    NA    2     NA    NA    2     1     NA    ...

The number of columns within each "chunk" also varies. Is there any way to automatically coalesce each chunk based on its shared suffix, other than doing it manually, which is what I have been painstakingly doing with coalesce(!!! select(., contains("XXX")))? That is, the result should resemble:

ID  034  099  123  ...
1   1    3    1    ...
2   2    2    2    ...
3   2    2    1    ...

I'm not sure how to begin doing something like this, so any suggestions would be very helpful.

Recommended answer

We reshape the data into 'long' format with pivot_longer, then group by 'ID' and loop across the other columns, applying na.omit to remove the NA elements. This assumes there is exactly one non-NA element per column within each group.

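library(dplyr)
library(tidyr)

df1 %>%
    # ".value" turns the digits captured by the pattern (the suffix) into column names
    pivot_longer(cols = -ID, names_to = ".value",
                 names_pattern = "[A-Z](\\d+)") %>%
    group_by(ID) %>%
    # drop the NAs in each suffix column within each ID
    summarise(across(everything(), na.omit), .groups = 'drop')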

-output

# A tibble: 3 x 4
     ID `034` `099` `123`
  <int> <int> <int> <int>
1     1     1     3     1
2     2     2     2     2
3     3     2     2     1
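Since the na.omit approach relies on there being exactly one non-NA value per suffix within each ID, it may be worth checking that assumption before reshaping. The following is a minimal base R sketch, not part of the original answer; the names value_cols, suffix_of and non_na_counts are made up for illustration, and it assumes one row per ID and column names made of uppercase letters followed by the suffix.

```r
df1 <- structure(list(ID = 1:3, A034 = c(NA, 2L, NA), B034 = c(1L, NA, 
NA), C034 = c(NA, NA, 2L), D034 = c(NA, NA, NA), A099 = c(NA, 
2L, NA), B099 = c(3L, NA, 2L), A123 = c(1L, NA, 1L), B123 = c(NA, 
2L, NA)), class = "data.frame", row.names = c(NA, -3L))

# Columns to check: everything except the ID column
value_cols <- setdiff(names(df1), "ID")

# Suffix = column name with its leading letter(s) removed, e.g. "A034" -> "034"
suffix_of <- sub("^[A-Z]+", "", value_cols)

# For every suffix, count the non-NA values in each row (one row per ID here)
non_na_counts <- sapply(split(value_cols, suffix_of), function(cols) {
  rowSums(!is.na(df1[cols]))
})
rownames(non_na_counts) <- df1$ID

non_na_counts
```

If every entry of non_na_counts is 1, the assumption holds for that data. Entries of 0 or more than 1 suggest that the complete.cases variant, or returning a list, is the safer choice.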


Or, to be safe, use complete.cases to create a logical vector marking the non-NA elements and extract the first one. This assumes we need only a single non-NA value per group; if the number of non-NA values differs between groups, we may need to return a list instead.

df1 %>%
    pivot_longer(cols = -ID, names_to = ".value",
                 names_pattern = "[A-Z](\\d+)") %>%
    group_by(ID) %>%
    # keep the first non-NA value of each suffix column within each ID
    summarise(across(everything(), ~ .[complete.cases(.)][1]))
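For comparison, the manual coalesce-by-suffix idea from the question can also be automated without reshaping. The sketch below is a base R alternative, not the answer's method: first_non_na is a hypothetical helper standing in for coalesce, and the code assumes column names consist of uppercase letters followed by the suffix.

```r
df1 <- structure(list(ID = 1:3, A034 = c(NA, 2L, NA), B034 = c(1L, NA, 
NA), C034 = c(NA, NA, 2L), D034 = c(NA, NA, NA), A099 = c(NA, 
2L, NA), B099 = c(3L, NA, 2L), A123 = c(1L, NA, 1L), B123 = c(NA, 
2L, NA)), class = "data.frame", row.names = c(NA, -3L))

# Hypothetical helper: first non-NA value across the vectors, position by position
first_non_na <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

# Work out which suffix each non-ID column belongs to, e.g. "A034" -> "034"
value_cols <- setdiff(names(df1), "ID")
suffix_of <- sub("^[A-Z]+", "", value_cols)

# Start from the ID column and add one coalesced column per suffix
result <- df1["ID"]
for (s in unique(suffix_of)) {
  cols <- value_cols[suffix_of == s]
  result[[s]] <- do.call(first_non_na, unname(as.list(df1[cols])))
}

result
```

Like the complete.cases variant, this keeps the first non-NA value (in column order) when a chunk has more than one, and leaves NA when a chunk has none.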

data

df1 <- structure(list(ID = 1:3, A034 = c(NA, 2L, NA), B034 = c(1L, NA, 
NA), C034 = c(NA, NA, 2L), D034 = c(NA, NA, NA), A099 = c(NA, 
2L, NA), B099 = c(3L, NA, 2L), A123 = c(1L, NA, 1L), B123 = c(NA, 
2L, NA)), class = "data.frame", row.names = c(NA, -3L))
