Combine duplicates, do not publish blanks, dplyr::distinct
Problem description
I am trying to use dplyr's distinct to combine rows, remove duplicates, and remove blanks as well. Here is my data frame:

unique_id  school  subject  grade  sex
1          great   Math     88
1          great   English  78
1          great   History  98     male
2          spring  Math     65
2          spring  English  72     female
2          spring  History  84
When I run (thank you Akrun):
(r2 <- df %>%
  group_by(unique_id) %>%
  summarise_each(funs(toString(unique(.)))))
I get:
unique_id  school  subject                    grade     sex
1          great   Math, English, History     88,78,98  , male
2          spring  English, English, History  65,72,84  , female
I don't want blanks or commas to be included in the last variable, sex. Instead, I'd like it to look as follows:

unique_id  school  subject                    grade     sex
1          great   Math, English, History     88,78,98  male
2          spring  English, English, History  65,72,84  female

I tried adding NA on import and then removing it after condensing, but that didn't work. Any ideas on how to condense rows but keep only the values in each row and ignore blanks? Thank you.
Solution
Perhaps the reason you are having problems is that you are using empty strings when you should be using NAs. This is what I would assume is the idiomatic code.
library(dplyr)

df <- data.frame(unique_id = c(rep(1,3), rep(2,3)),
                 school = c(rep('great',3), rep('spring',3)),
                 subject = rep(c("Math", "English", "History"), 2),
                 grade = c(88,78,98,65,72,84),
                 sex = c(NA, NA, "male", NA, "female", NA))

r2 <- df %>%
  group_by(unique_id) %>%
  summarise_each(funs(toString(unique(.))))
which returns
# A tibble: 2 x 5
  unique_id school subject                grade      sex
      <dbl> <chr>  <chr>                  <chr>      <chr>
1         1 great  Math, English, History 88, 78, 98 NA, male
2         2 spring Math, English, History 65, 72, 84 NA, female
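The code above builds the data frame with NAs directly. If the real data arrives with empty strings instead, a minimal sketch of the conversion might look like the following. It assumes the blanks are exactly "" in a character column; df_blank and df_na are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical input in which missing values were read in as empty strings
df_blank <- data.frame(
  unique_id = c(1, 1, 1, 2, 2, 2),
  sex = c("", "", "male", "", "female", ""),
  stringsAsFactors = FALSE
)

# na_if() replaces every value equal to "" with NA and leaves the rest alone
df_na <- df_blank %>%
  mutate(sex = na_if(sex, ""))
```

When reading from a file, passing na.strings = "" to read.csv may achieve the same thing at import time, provided the blanks in the file really are empty fields rather than whitespace.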
You can always
r2$sex <- sapply(stringr::str_split(r2$sex, ", "),"[",2)
afterwards if you really want to remove those NAs, but I see them as informative.
You can write your own function to supply to summarize_each, which will allow you to take care of NAs in any column. Note that you only need to do this because unique, rightfully so, does not have an na.rm argument.

rm_na_unique <- function(vec){
  unique(vec[!is.na(vec)])
}

r2 <- df %>%
  group_by(unique_id) %>%
  summarise_each(funs(toString(rm_na_unique(.))))
This gives you the same result as the str_split approach, with the NAs removed:
# A tibble: 2 x 5
  unique_id school subject                grade      sex
      <dbl> <chr>  <chr>                  <chr>      <chr>
1         1 great  Math, English, History 88, 78, 98 male
2         2 spring Math, English, History 65, 72, 84 female
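Note that summarise_each() and funs() are deprecated in recent dplyr releases. As a sketch only, and assuming dplyr 1.0.0 or later is installed, the same NA-dropping summary could be written with summarise() and across(). The data frame and the rm_na_unique helper are repeated from the answer so the block runs on its own.

```r
library(dplyr)

df <- data.frame(
  unique_id = c(rep(1, 3), rep(2, 3)),
  school = c(rep("great", 3), rep("spring", 3)),
  subject = rep(c("Math", "English", "History"), 2),
  grade = c(88, 78, 98, 65, 72, 84),
  sex = c(NA, NA, "male", NA, "female", NA),
  stringsAsFactors = FALSE
)

# Drop NAs first, then keep the distinct remaining values
rm_na_unique <- function(vec) {
  unique(vec[!is.na(vec)])
}

# across(everything(), ...) applies the function to every non-grouping column
r2 <- df %>%
  group_by(unique_id) %>%
  summarise(across(everything(), ~ toString(rm_na_unique(.x))))
```

With this data, r2$sex comes out as "male" and "female", with no leading "NA, ".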