Coalescing multiple chunks of columns with the same suffix in names (R)
Problem description
I have a dataset with various "chunks" of columns that have different prefixes but the same suffix:
ID | A034 | B034 | C034 | D034 | A099 | B099 | A123 | B123 | ... |
---|---|---|---|---|---|---|---|---|---|
1 | NA | 1 | NA | NA | NA | 3 | 1 | NA | ... |
2 | 2 | NA | NA | NA | 2 | NA | NA | 2 | ... |
3 | NA | NA | 2 | NA | NA | 2 | 1 | NA | ... |
The number of columns within each "chunk" also varies. Is there any way (other than manually, which is what I have been painstakingly doing with coalesce(!!! select(., contains("XXX")))) to automatically coalesce by chunk based on the shared suffix? That is, the result should resemble:
ID | 034 | 099 | 123 | ... |
---|---|---|---|---|
1 | 1 | 3 | 1 | ... |
2 | 2 | 2 | 2 | ... |
3 | 2 | 2 | 1 | ... |
I'm not sure how to begin doing something like this, so any suggestions would be very helpful.
Recommended answer
We reshape the data into 'long' format with pivot_longer, then group by 'ID' and loop across the other columns, applying na.omit to remove the NA elements (this assumes there is only one non-NA value per column within each group).
library(dplyr)
library(tidyr)
df1 %>%
    pivot_longer(cols = -ID, names_to = ".value",
                 names_pattern = "[A-Z](\\d+)") %>%
    group_by(ID) %>%
    summarise(across(everything(), na.omit), .groups = 'drop')
Output
# A tibble: 3 x 4
ID `034` `099` `123`
<int> <int> <int> <int>
1 1 1 3 1
2 2 2 2 2
3 3 2 2 1
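To see why na.omit is enough here, it can help to stop after the reshape and inspect the intermediate table. The snippet below is only a sketch: it rebuilds the example data inline, and the name long is introduced purely for illustration. With the `.value` sentinel, each suffix becomes one column and each ID should get as many rows as its widest chunk, most of them NA.

```r
library(tidyr)

# Same example data as in the Data section; D034 is entirely NA.
df1 <- data.frame(
  ID   = 1:3,
  A034 = c(NA, 2L, NA), B034 = c(1L, NA, NA),
  C034 = c(NA, NA, 2L), D034 = c(NA, NA, NA),
  A099 = c(NA, 2L, NA), B099 = c(3L, NA, 2L),
  A123 = c(1L, NA, 1L), B123 = c(NA, 2L, NA)
)

# Stop after the reshape to look at the intermediate "long" table.
# `long` is a hypothetical name used only in this sketch.
long <- pivot_longer(df1, cols = -ID, names_to = ".value",
                     names_pattern = "[A-Z](\\d+)")

names(long)                               # ID plus one column per suffix
colSums(!is.na(long[long$ID == 1, -1]))   # non-NA count per suffix for ID 1
```

If any of those counts were 0 or greater than 1 for some ID, na.omit would return zero or several values for that group, which is the case the assumption above rules out.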
Or, to be safe, use complete.cases to create a logical vector marking the non-NA elements and extract the first one (assuming we need only a single non-NA value; if the number of non-NA values differs between columns, we may need to return a list instead).
df1 %>%
    pivot_longer(cols = -ID, names_to = ".value",
                 names_pattern = "[A-Z](\\d+)") %>%
    group_by(ID) %>%
    summarise(across(everything(), ~ .[complete.cases(.)][1]))
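For readers who would rather stay close to the coalesce() call from the question, a rough alternative is to group the column names by suffix and coalesce each group in turn. This is a sketch rather than part of the original answer: it assumes every prefix is a single uppercase letter, it types D034 as integer NA so that all coalesce() inputs share one type, and the names value_cols, suffix, chunks, coalesced and result are made up for illustration.

```r
library(dplyr)

# Example data rebuilt inline; D034 is given as integer NA (an assumption
# of this sketch) so every input to coalesce() has the same type.
df1 <- data.frame(
  ID   = 1:3,
  A034 = c(NA, 2L, NA), B034 = c(1L, NA, NA),
  C034 = c(NA, NA, 2L), D034 = rep(NA_integer_, 3),
  A099 = c(NA, 2L, NA), B099 = c(3L, NA, 2L),
  A123 = c(1L, NA, 1L), B123 = c(NA, 2L, NA)
)

# Strip the one-letter prefix to get each column's suffix.
value_cols <- setdiff(names(df1), "ID")
suffix     <- sub("^[A-Z]", "", value_cols)

# Named list: one character vector of column names per suffix.
chunks <- split(value_cols, suffix)

# Coalesce the columns of each chunk, taking the first non-NA per row.
coalesced <- lapply(chunks, function(cols) {
  do.call(coalesce, unname(as.list(df1[cols])))
})

# Attach the coalesced columns to the ID column.
result <- df1["ID"]
result[names(coalesced)] <- coalesced
result
```

Unlike the na.omit version, coalesce() takes the first non-NA value from left to right, so a row with several non-NA values in one chunk does not raise an error; whether that is desirable depends on the data.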
Data
df1 <- structure(list(ID = 1:3, A034 = c(NA, 2L, NA), B034 = c(1L, NA,
NA), C034 = c(NA, NA, 2L), D034 = c(NA, NA, NA), A099 = c(NA,
2L, NA), B099 = c(3L, NA, 2L), A123 = c(1L, NA, 1L), B123 = c(NA,
2L, NA)), class = "data.frame", row.names = c(NA, -3L))