Concatenating strings / rows using dplyr, group_by with mutate() or summarize() & str_c() or paste() & collapse, but maintain NA & all strings


Problem description

When concatenating strings with dplyr, using group_by() with mutate() or summarize() together with paste() and collapse, NA values are coerced to the string "NA".

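When using str_c() instead of paste(), any strings concatenated with an NA are lost (?str_c: "whenever a missing value is combined with another string the result will always be missing"). When a group contains such a mix of NA and non-NA values, how can I drop only the NA values from the concatenation and keep the non-NA ones?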

See the example below:

library(dplyr)
library(stringr)
ID <- c(1,1,2,2,3,4)
string <- c(' asfdas ', 'sdf', NA,'sadf', 'NA', NA)
df <- data.frame(ID, string)
#   ID   string
# 1  1  asfdas 
# 2  1      sdf
# 3  2     <NA> # ID 2 has both NA and non-NA values
# 4  2     sadf #
# 5  3       NA
# 6  4     <NA>

Both

df%>%
 group_by(ID)%>%
 summarize(string = paste(string, collapse = "; "))%>%
 distinct_all()

df_conca <-df%>%
 group_by(ID)%>%
 dplyr::mutate(string = paste(string, collapse = "; "))%>%
 distinct_all()

result in

     ID string               
1     1 " asfdas ; sdf"
2     2 "NA; sadf"           
3     3 "NA"
4     4 "NA" # NA coerced to "NA"

That is, NA is coerced to the string "NA", while

df %>%
  group_by(ID)%>%
  summarize(string = str_c(string, collapse = "; "))

results in:

     ID string               
1     1 " asfdas ; sdf"
2     2 NA     
3     3 "NA" 
4     4 NA 

That is, following the str_c rule, NA combined with a string gives NA.

However, I would like to keep true NA values (e.g. ID 4), and keep only the strings where a group mixes NA and strings (e.g. ID 2), so that:

     ID string             
1     1 " asfdas ; sdf"
2     2 "sadf"           
3     3 "NA"
4     4 NA 

Ideally, I would like to stay within the dplyr workflow.


This question is an extension of "Concatenating strings / rows using dplyr, group_by & collapse or summarize, but maintain NA values".

Recommended answer

Use pivot_wider with unite:

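library(dplyr)
library(tidyr)
library(data.table)  # provides rowid()
df %>% 
   # number the rows within each ID (1, 2, ...)
   mutate(rn = rowid(ID)) %>%
   # spread each ID's strings into the columns `1` and `2`
   pivot_wider(names_from = rn, values_from = string) %>% 
   # join the two columns, dropping NA values
   unite(string, `1`, `2`, na.rm = TRUE, sep = " ; ")%>%
   # groups that held only NA end up as "", so turn them back into NA
   mutate(string = na_if(string, ""))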

Output:

# A tibble: 4 x 2
     ID string          
  <dbl> <chr>           
1     1 " asfdas  ; sdf"
2     2 "sadf"          
3     3 "NA"            
4     4  <NA>         
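
The call above names the columns `1` and `2` explicitly, so it fits groups of at most two rows. As a minimal sketch, assuming groups may be larger, the same idea can be written with unite() applied to every column except ID. The third row added for ID 2 ("qwe") is hypothetical data for illustration, and row_number() within group_by() stands in for data.table's rowid(); neither comes from the original answer.

```r
library(dplyr)
library(tidyr)

# Question data plus one hypothetical extra row ("qwe") for ID 2
df <- data.frame(
  ID = c(1, 1, 2, 2, 2, 3, 4),
  string = c(" asfdas ", "sdf", NA, "sadf", "qwe", "NA", NA),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(ID) %>%
  mutate(rn = row_number()) %>%   # position of each row within its ID
  ungroup() %>%
  pivot_wider(names_from = rn, values_from = string) %>%
  # unite every column except ID, dropping NA values first
  unite(string, -ID, na.rm = TRUE, sep = " ; ") %>%
  # an all-NA group is united to "", so convert it back to NA
  mutate(string = na_if(string, ""))

print(res)
```

Selecting with -ID means the number of wide columns no longer has to be known in advance.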

Alternatively, coalesce can be used:

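df %>%
    group_by(ID) %>%
    # str_c() on the whole group is NA if any value is NA; coalesce() then
    # falls back to collapsing only the non-NA values, which is "" for an
    # all-NA group, and na_if() turns that "" back into NA
    summarise(string = na_if(coalesce(str_c(string, collapse = " ; "),
     str_c(string[complete.cases(string)], collapse = " ; ")), ""))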

Output:

# A tibble: 4 x 2
     ID string          
  <dbl> <chr>           
1     1 " asfdas  ; sdf"
2     2 "sadf"          
3     3 "NA"            
4     4  <NA>          
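
The same rule can also be spelled out directly: drop the NA values, then return NA only when nothing is left. The sketch below is one possible way to do that, not part of the original answer; collapse_keep_na() is a hypothetical helper name introduced here for illustration.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): collapse the
# non-missing values of x, or return NA when every value is missing.
collapse_keep_na <- function(x, sep = " ; ") {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
}

# Data from the question
df <- data.frame(
  ID = c(1, 1, 2, 2, 3, 4),
  string = c(" asfdas ", "sdf", NA, "sadf", "NA", NA),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(ID) %>%
  summarise(string = collapse_keep_na(string))

print(res)
```

Because the helper only uses base R, it stays inside the group_by() and summarise() workflow without needing stringr or tidyr, and the literal string "NA" for ID 3 is left untouched since is.na() only matches true missing values.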
