Coalescing multiple chunks of columns with the same suffix in names (R)

Published: 2021/9/7 19:38:28	Tags: r dplyr tidyverse

Question

I have a dataset with various "chunks" of columns with different prefixes, but the same suffix:

ID	A034	B034	C034	D034	A099	B099	A123	B123	...
1	NA	1	NA	NA	NA	3	1	NA	...
2	2	NA	NA	NA	2	NA	NA	2	...
3	NA	NA	2	NA	NA	2	1	NA	...

The number of columns within each "chunk" also varies. Is there any way (other than manually, which is what I have been painstakingly doing with coalesce(!!! select(., contains("XXX")))) to automatically coalesce by chunk based on the shared suffix? That is, the result should resemble

ID	034	099	123	...
1	1	3	1	...
2	2	2	2	...
3	2	2	1	...

I'm not sure how to begin doing something like this, so any suggestions would be very helpful.
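
For reference, the manual approach mentioned in the question can be sketched roughly as follows. This is only an illustration under assumptions: the data frame df1 is rebuilt here from the sample table, D034 is declared as integer NA so that all columns share a type, and the result name manual is made up for the example.

```r
library(dplyr)

# Sample data in the same shape as the table in the question
# (assumption: D034 is stored as integer NA so every column has the same type)
df1 <- data.frame(
  ID   = 1:3,
  A034 = c(NA, 2L, NA), B034 = c(1L, NA, NA),
  C034 = c(NA, NA, 2L), D034 = rep(NA_integer_, 3),
  A099 = c(NA, 2L, NA), B099 = c(3L, NA, 2L),
  A123 = c(1L, NA, 1L), B123 = c(NA, 2L, NA)
)

# One coalesce() call per suffix: this is the repetition the question wants to avoid
manual <- df1 %>%
  transmute(
    ID,
    `034` = coalesce(!!!select(., contains("034"))),
    `099` = coalesce(!!!select(., contains("099"))),
    `123` = coalesce(!!!select(., contains("123")))
  )

print(manual)
```

Every new suffix needs another hand-written line, which is why an approach driven by the column names is preferable once there are many chunks.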

Recommended answer

We reshape the data into 'long' format with pivot_longer, using the numeric suffix as the new column name. Then we group by 'ID', loop across the other columns, and apply na.omit to remove the NA elements (we assume there is only one non-NA element per column within each group).

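library(dplyr)
library(tidyr)
df1 %>%
    # ".value" turns each captured suffix (034, 099, 123) into an output column
    pivot_longer(cols = -ID, names_to = ".value", 
           names_pattern = "[A-Z](\\d+)") %>% 
    group_by(ID) %>%
    # drop the NAs, leaving the single non-NA value per column within each ID
    summarise(across(everything(), na.omit), .groups = 'drop')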

-output

# A tibble: 3 x 4
     ID `034` `099` `123`
  <int> <int> <int> <int>
1     1     1     3     1
2     2     2     2     2
3     3     2     2     1
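
To see why the summarise step is needed, it can help to look at the intermediate result of pivot_longer on its own. The sketch below rebuilds df1 with the same values as the answer's data (D034 is declared as integer NA here, an assumption made for type consistency) and stores the reshaped table in a name, long, chosen only for illustration.

```r
library(dplyr)
library(tidyr)

# Same values as the answer's df1 (assumption: D034 stored as integer NA)
df1 <- data.frame(
  ID   = 1:3,
  A034 = c(NA, 2L, NA), B034 = c(1L, NA, NA),
  C034 = c(NA, NA, 2L), D034 = rep(NA_integer_, 3),
  A099 = c(NA, 2L, NA), B099 = c(3L, NA, 2L),
  A123 = c(1L, NA, 1L), B123 = c(NA, 2L, NA)
)

# names_pattern captures only the digits; the letter prefix is discarded.
# ".value" makes each distinct capture its own column, so columns that
# share a suffix are stacked on top of each other within every ID.
long <- df1 %>%
  pivot_longer(cols = -ID, names_to = ".value",
               names_pattern = "[A-Z](\\d+)")

print(long)
```

Each ID should now span several rows, most of them NA, with one non-NA value per suffix column. Collapsing those rows per ID is the job of group_by and summarise in the answer.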

Or, to be safe, use complete.cases to create a logical vector marking the non-NA elements and extract the first one. This assumes we need only a single non-NA value per column and group; if the number of non-NA values differs between groups, we may need to return a list instead.

df1 %>%
     pivot_longer(cols = -ID, names_to = ".value",
          names_pattern = "[A-Z](\\d+)") %>%
     group_by(ID) %>% 
     # take the first non-NA value; yields NA if a column is all NA for an ID
     summarise(across(everything(),  ~ .[complete.cases(.)][1]))
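
The list-returning variant mentioned above is not spelled out in the answer. A minimal sketch of what it might look like, assuming a hypothetical data frame df2 in which ID 1 has two non-NA values in the 034 chunk (the names df2 and as_lists are made up for this example):

```r
library(dplyr)
library(tidyr)

# Hypothetical variant of the data: ID 1 has two non-NA values for suffix 034
df2 <- data.frame(
  ID   = 1:3,
  A034 = c(5L, 2L, NA), B034 = c(1L, NA, NA), C034 = c(NA, NA, 2L),
  A099 = c(NA, 2L, NA), B099 = c(3L, NA, 2L)
)

as_lists <- df2 %>%
  pivot_longer(cols = -ID, names_to = ".value",
               names_pattern = "[A-Z](\\d+)") %>%
  group_by(ID) %>%
  # Wrapping in list() gives exactly one (list) element per group and column,
  # regardless of how many non-NA values the group contains
  summarise(across(everything(), ~ list(.x[!is.na(.x)])), .groups = "drop")

print(as_lists)
```

The suffix columns become list-columns, so every non-NA value is kept rather than only the first. Whether that is preferable to taking the first value depends on what duplicates mean in the data.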

data

df1 <- structure(list(ID = 1:3, A034 = c(NA, 2L, NA), B034 = c(1L, NA, 
NA), C034 = c(NA, NA, 2L), D034 = c(NA, NA, NA), A099 = c(NA, 
2L, NA), B099 = c(3L, NA, 2L), A123 = c(1L, NA, 1L), B123 = c(NA, 
2L, NA)), class = "data.frame", row.names = c(NA, -3L))
