How to combine repeated rows with missing fields in R
Question
I have a dataset with multiple repeated entries. The entries are largely the same, but some of them have missing data, and the missing data could be in any field except the ID. For example:
A tibble: 5 x 4
ID name age fsm
<chr> <chr> <dbl> <dbl>
1 0001 Peter 13 NA
2 0001 NA 13 1
3 0002 Jane 13 1
4 0002 Jane NA 1
5 0003 Billy 12 0
I need to combine the rows, i.e. fill each NA with the value given in another row that has the same ID:
ID name age fsm
<chr> <chr> <dbl> <dbl>
1 0001 Peter 13 1
2 0002 Jane 13 1
3 0003 Billy 12 0
The example data above as dput output:
structure(list(ID = c("0001", "0001", "0002", "0002", "0003"),
name = c("Peter", NA, "Jane", "Jane", "Billy"), age = c(13,
13, 13, NA, 12), fsm = c(NA, 1, 1, 1, 0)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(ID = structure(list(), class = c("collector_character",
"collector")), name = structure(list(), class = c("collector_character",
"collector")), age = structure(list(), class = c("collector_double",
"collector")), fsm = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Recommended answer
Updated: special thanks to @akrun, who never hesitates to offer insight into our solutions and to share his knowledge and experience with us.
I hope this is what you are looking for:
library(dplyr)
df %>%
group_by(ID) %>%
summarise(across(everything(), ~ first(na.omit(.x))))
# A tibble: 3 x 4
ID name age fsm
<chr> <chr> <dbl> <dbl>
1 0001 Peter 13 1
2 0002 Jane 13 1
3 0003 Billy 12 0
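To make the one-liner above easy to try, here is a minimal, self-contained sketch. It rebuilds the question's data with tibble() rather than the dput output, and it wraps the anonymous function in a helper called first_non_na. That helper name is made up for illustration and is not part of dplyr.

```r
library(dplyr)

# The question's example data, rebuilt by hand as a plain tibble.
df <- tibble(
  ID   = c("0001", "0001", "0002", "0002", "0003"),
  name = c("Peter", NA, "Jane", "Jane", "Billy"),
  age  = c(13, 13, 13, NA, 12),
  fsm  = c(NA, 1, 1, 1, 0)
)

# Hypothetical helper (not a dplyr function): drop the NAs in a vector,
# then keep the first value that is left.
first_non_na <- function(x) first(na.omit(x))

# One row per ID; every non-grouping column is reduced with the helper.
result <- df %>%
  group_by(ID) %>%
  summarise(across(everything(), first_non_na))

print(result)
```

Within each ID, na.omit() removes the missing entries of a column and first() keeps the first remaining value, which is why the two partial rows for 0001 and 0002 collapse into one complete row each.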
This solution also works. It is a bit more verbose, but it can be quite useful and handy in this kind of situation:
library(dplyr)
library(tidyr)
library(purrr)
df %>%
nest(data = -c(ID)) %>%
# Two map functions are nested here: the outer map() iterates over the
# elements of the nested list column, and the inner map_dfc() iterates
# over the columns of each underlying tibble.
mutate(data = map(data, ~ map_dfc(., na.omit))) %>%
unnest(cols = c(data)) %>%
group_by(ID) %>%
summarise(across(everything(), first))
# A tibble: 3 x 4
ID name age fsm
<chr> <chr> <dbl> <dbl>
1 0001 Peter 13 1
2 0002 Jane 13 1
3 0003 Billy 12 0
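For comparison, here is a sketch of a different technique that avoids nesting altogether. It is not from the answer above: it uses tidyr's fill() to copy known values down and then up within each ID, and then keeps the first row of each group. The data is rebuilt with tibble() so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# The question's example data, rebuilt by hand as a plain tibble.
df <- tibble(
  ID   = c("0001", "0001", "0002", "0002", "0003"),
  name = c("Peter", NA, "Jane", "Jane", "Billy"),
  age  = c(13, 13, 13, NA, 12),
  fsm  = c(NA, 1, 1, 1, 0)
)

filled <- df %>%
  group_by(ID) %>%
  # Within each ID, fill NAs from the row above, then from the row below.
  fill(name, age, fsm, .direction = "downup") %>%
  # After filling, every row of a group is complete; keep the first one.
  slice(1) %>%
  ungroup()

print(filled)
```

This variant assumes, like the question, that rows sharing an ID never disagree on a non-missing value. If they could disagree, the kept value would depend on row order.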