Combine duplicates, do not publish blanks, dplyr::distinct

Question

I'm trying to use dplyr distinct to combine rows, delete duplicates, and delete blanks as well. Here is my data frame:

unique_id   school  subject  grade  sex
    1       great   Math      88    
    1       great   English   78    
    1       great   History   98    male
    2       spring  Math      65    
    2       spring  English   72    female
    2       spring  History   84    

When I run (thank you Akrun):

(r2 <- df %>%
  group_by(unique_id) %>% 
  summarise_each(funs(toString(unique(.)))))

I get:

unique_id   school  subject                     grade       sex
    1       great   Math, English, History      88,78,98     , male 
    2       spring  English, English, History   65,72,84     , female

I don't want blanks or commas included in the last variable, sex. Instead, I'd like it to look as follows:

unique_id   school  subject                     grade       sex
    1       great   Math, English, History      88,78,98     male   
    2       spring  English, English, History   65,72,84     female

I tried adding NA on import and removing it after condensing, but that didn't work. Any ideas how to condense rows while keeping only the values that are present and ignoring blanks? Thank you.

Solution

Perhaps the reason you are having problems is that you are using empty strings where you should be using NAs. Here is what I assume is the idiomatic code:

library(dplyr)

df <- data.frame(unique_id = c(rep(1, 3), rep(2, 3)),
                 school    = c(rep('great', 3), rep('spring', 3)),
                 subject   = rep(c("Math", "English", "History"), 2),
                 grade     = c(88, 78, 98, 65, 72, 84),
                 sex       = c(NA, NA, "male", NA, "female", NA))

r2 <- df %>%
  group_by(unique_id) %>%
  summarise_each(funs(toString(unique(.))))

which returns

# A tibble: 2 x 5
  unique_id school                subject      grade        sex
      <dbl>  <chr>                  <chr>      <chr>      <chr>
1         1  great Math, English, History 88, 78, 98   NA, male
2         2 spring Math, English, History 65, 72, 84 NA, female
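The example above types the NAs in by hand. If the data come from a CSV where the sex field is simply left empty, a minimal sketch (assuming a comma-separated file; `csv_text`, `df_csv` and `df_loaded` are hypothetical names) is to have `read.csv()` treat empty fields as missing through its `na.strings` argument, or to convert an already loaded column with `dplyr::na_if()`:

```r
library(dplyr)

# Hypothetical CSV mirroring the question: the sex field is left empty
csv_text <- "unique_id,school,subject,grade,sex
1,great,Math,88,
1,great,English,78,
1,great,History,98,male
2,spring,Math,65,
2,spring,English,72,female
2,spring,History,84,"

# na.strings = "" makes empty fields come in as NA rather than ""
df_csv <- read.csv(text = csv_text, na.strings = c("", "NA"),
                   stringsAsFactors = FALSE)

# For a data frame that is already loaded, na_if() converts "" to NA
df_loaded <- data.frame(sex = c("", "", "male"), stringsAsFactors = FALSE) %>%
  mutate(sex = na_if(sex, ""))

df_csv$sex
df_loaded$sex
```

Either way, the grouped `toString(unique(.))` call then sees real `NA`s instead of empty strings.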

You can always

r2$sex <- sapply(stringr::str_split(r2$sex, ", "), "[", 2)

afterwards if you really want to remove those NAs, but I see them as informative.
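The `sapply(..., "[", 2)` one-liner keeps the second piece of each string, so it relies on `NA` coming first in every group: for a string like `"female, NA"` it would keep `"NA"`, and for a group with no `NA` at all it returns `NA`. A base-R sketch that drops every `"NA"` token regardless of position (the helper `drop_na_tokens` is hypothetical, and it would also drop a genuine value spelled `"NA"`):

```r
# Hypothetical collapsed strings, as toString(unique(.)) would produce them
sex_collapsed <- c("NA, male", "female, NA", "male")

# Split on ", ", drop every "NA" token wherever it sits, and re-join the rest
drop_na_tokens <- function(x) {
  pieces <- strsplit(x, ", ", fixed = TRUE)
  vapply(pieces, function(p) toString(p[p != "NA"]), character(1))
}

drop_na_tokens(sex_collapsed)
# returns c("male", "female", "male")
```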

You can write your own function to pass to summarise_each, which lets you take care of NAs in any column. Note that you only need to do this because unique() (rightly) has no na.rm argument.

rm_na_unique <- function(vec){
  unique(vec[!is.na(vec)])
}

r2 <- df %>%
  group_by(unique_id) %>%
  summarise_each(funs(toString(rm_na_unique(.))))

This gives the result you asked for:

# A tibble: 2 x 5
  unique_id school                subject      grade    sex
      <dbl>  <chr>                  <chr>      <chr>  <chr>
1         1  great Math, English, History 88, 78, 98   male
2         2 spring Math, English, History 65, 72, 84 female
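`summarise_each()` and `funs()` are deprecated in current dplyr in favour of `across()`. A minimal sketch of the same idea with `across()` (assuming dplyr 1.0.0 or later) that also treats empty strings as blanks, so it works on data like the question's without converting to `NA` first; `df_blank` and `collapse_present` are hypothetical names:

```r
library(dplyr)

# Hypothetical input mirroring the question: blanks are empty strings, not NA
df_blank <- data.frame(
  unique_id = c(1, 1, 1, 2, 2, 2),
  school    = rep(c("great", "spring"), each = 3),
  subject   = rep(c("Math", "English", "History"), 2),
  grade     = c(88, 78, 98, 65, 72, 84),
  sex       = c("", "", "male", "", "female", ""),
  stringsAsFactors = FALSE
)

# Collapse the distinct values that are neither NA nor blank into one string
collapse_present <- function(vec) {
  keep <- !is.na(vec) & trimws(as.character(vec)) != ""
  toString(unique(vec[keep]))
}

# across() applies the helper to every non-grouping column
r3 <- df_blank %>%
  group_by(unique_id) %>%
  summarise(across(everything(), collapse_present))

r3
```

Because `across()` skips grouping columns, `unique_id` stays as the group key instead of being collapsed into a string.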
