How to combine repeated rows with missing fields in R


Problem description

I have a dataset with multiple repeated entries. The entries are largely the same, but some have data missing, and the missing data could be in any field except ID. For example:

# A tibble: 5 x 4
  ID    name    age   fsm
  <chr> <chr> <dbl> <dbl>
1 0001  Peter    13    NA
2 0001  NA       13     1
3 0002  Jane     13     1
4 0002  Jane     NA     1
5 0003  Billy    12     0

I need to combine the rows, i.e. fill each NA with the given value from another row that has the same ID, so that I end up with one row per ID:

  ID    name    age   fsm
  <chr> <chr> <dbl> <dbl>
1 0001  Peter    13     1
2 0002  Jane     13     1
3 0003  Billy    12     0

Here is the example data above as dput() output:

df <- structure(list(ID = c("0001", "0001", "0002", "0002", "0003"), 
name = c("Peter", NA, "Jane", "Jane", "Billy"), age = c(13, 
13, 13, NA, 12), fsm = c(NA, 1, 1, 1, 0)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(ID = structure(list(), class = c("collector_character", 
"collector")), name = structure(list(), class = c("collector_character", 
"collector")), age = structure(list(), class = c("collector_double", 
"collector")), fsm = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))
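If you only want to try the answers below, a shorter way to build an equivalent data frame is tibble::tibble(). This is a sketch: it drops the readr column-spec attributes that the dput() output carries, which, as far as I can tell, neither answer relies on.

```r
library(tibble)

# Same values as the dput() above, without the readr spec attributes
df <- tibble(
  ID   = c("0001", "0001", "0002", "0002", "0003"),
  name = c("Peter", NA, "Jane", "Jane", "Billy"),
  age  = c(13, 13, 13, NA, 12),
  fsm  = c(NA, 1, 1, 1, 0)
)
df
```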

Recommended answer

Update: special thanks to @akrun, who never hesitates to offer insight into our solutions and share his knowledge and experience with us.

I hope this is what you are looking for:

library(dplyr)

df %>%
  group_by(ID) %>%                                       # one group per ID
  summarise(across(everything(), ~ first(na.omit(.x))))  # first non-NA value of each column

# A tibble: 3 x 4
  ID    name    age   fsm
  <chr> <chr> <dbl> <dbl>
1 0001  Peter    13     1
2 0002  Jane     13     1
3 0003  Billy    12     0
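To see what the summary function does, here it is applied to single columns of one group. na.omit() drops the missing values and dplyr::first() takes the first one left. When a column is entirely NA within a group, na.omit() leaves an empty vector and first() falls back to NA, so that group keeps a missing value instead of raising an error. The helper name first_non_na is purely illustrative, and the last line assumes dplyr >= 1.1.0, where first() gained an na_rm argument.

```r
library(dplyr)

# Illustrative helper: the same expression the answer passes to across()
first_non_na <- function(x) first(na.omit(x))

first_non_na(c(NA, 1))               # the fsm values for ID "0001"
first_non_na(c("Peter", NA))         # the name values for ID "0001"
first_non_na(c(NA_real_, NA_real_))  # a column that is all NA in its group

# Assumes dplyr >= 1.1.0: first() can skip NAs itself
first(c(NA, 1), na_rm = TRUE)
```

With a recent dplyr, the answer could therefore also be written as summarise(across(everything(), ~ first(.x, na_rm = TRUE))).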

This solution also works. It may seem a bit verbose, but it is quite useful and handy in situations like this:

library(dplyr)
library(tidyr)
library(purrr)

df %>%
  nest(data = -c(ID)) %>%                          # one nested tibble of the other columns per ID
  # The outer map() iterates over the nested tibbles; the inner map_dfc()
  # iterates over each tibble's columns, dropping NAs with na.omit()
  mutate(data = map(data, ~ map_dfc(., na.omit))) %>%
  unnest(cols = c(data)) %>%
  group_by(ID) %>%
  summarise(across(everything(), first))           # first remaining value of each column

# A tibble: 3 x 4
  ID    name    age   fsm
  <chr> <chr> <dbl> <dbl>
1 0001  Peter    13     1
2 0002  Jane     13     1
3 0003  Billy    12     0
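The nested approach leans on column recycling. As a sketch, here is what the inner map_dfc(., na.omit) step produces for the rows of ID "0001" (the tibble grp is typed in by hand for illustration): name and fsm shrink to a single value and are recycled to match age, which still has two values, and the later group_by()/summarise(first) step collapses the duplicate rows. As far as I can tell, because only length-1 columns are recycled, this step can break for an ID where two columns keep different numbers of non-NA values (say 2 and 3), or where a column is entirely NA; the first solution does not have that restriction.

```r
library(purrr)
library(tibble)

# Hand-typed copy of the nested data for ID "0001" (illustrative)
grp <- tibble(name = c("Peter", NA), age = c(13, 13), fsm = c(NA, 1))

# Drop NAs column by column, then bind the columns back together.
# map_dfc() needs dplyr installed, since it binds with dplyr::bind_cols().
step <- map_dfc(grp, na.omit)
step
```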
