dplyr: Difference between unique and distinct

Views: 66. Published: 2020/10/15 19:45:05. Tags: r, data.table, dplyr

Problem description

The number of resulting rows differs depending on whether I use distinct() or unique(). The data set I am working with is huge, so I cannot post it here, but I hope the code is clear enough to follow.

## Using distinct
dt2a <- select(dt, mutation.genome.position, mutation.cds,
               primary.site, sample.name, mutation.id) %>%
  group_by(mutation.genome.position, mutation.cds, primary.site) %>%
  mutate(occ = nrow(.)) %>%
  select(-sample.name) %>%
  distinct()
dim(dt2a)
# [1] 2316382       5

## Using unique instead
dt2b <- select(dt, mutation.genome.position, mutation.cds,
               primary.site, sample.name, mutation.id) %>%
  group_by(mutation.genome.position, mutation.cds, primary.site) %>%
  mutate(occ = nrow(.)) %>%
  select(-sample.name) %>%
  unique()
dim(dt2b)
# [1] 2837982       5

This is the file I am working with:

sftp://sftp-cancer.sanger.ac.uk/files/grch38/cosmic/v72/CosmicMutantExport.tsv.gz

library(data.table)
dt <- fread(fl)  # fl: presumably the local path to the downloaded file

Recommended answer

This appears to be a result of the group_by. Consider this case:

library(dplyr)

# Two groups; each group contains one duplicated value of v
dt <- data.frame(g = rep(c("a", "b"), each = 3),
                 v = c(2, 2, 5, 2, 7, 7))

dt %>% group_by(g) %>% unique()
# Source: local data frame [4 x 2]
# Groups: g
# 
#   g v
# 1 a 2
# 2 a 5
# 3 b 2
# 4 b 7

dt %>% group_by(g) %>% distinct()
# Source: local data frame [2 x 2]
# Groups: g
# 
#   g v
# 1 a 2
# 2 b 2

dt %>% group_by(g) %>% distinct(v)
# Source: local data frame [4 x 2]
# Groups: g
# 
#   g v
# 1 a 2
# 2 a 5
# 3 b 2
# 4 b 7

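When you use distinct() without indicating which variables to make distinct, it appears to use the grouping variable. In the question's pipeline, that would mean distinct() keeps one row per (mutation.genome.position, mutation.cds, primary.site) group, whereas unique() compares all five columns, including mutation.id, which would explain why it returns more rows.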
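If the goal is for distinct() to agree with unique(), one way around the grouping effect is to drop the groups first, or to name every column that should define a duplicate. The snippet below is a minimal sketch on the toy data from this answer, not a tested fix for the COSMIC file. The variable names by_ungroup, by_columns and by_unique are made up for illustration. The behaviour of distinct() on grouped data appears to have changed between dplyr versions, so the two-row result shown earlier may not reproduce on a current install; the two variants below should not depend on that.

```r
library(dplyr)

# Same toy data as in the answer
dt <- data.frame(g = rep(c("a", "b"), each = 3),
                 v = c(2, 2, 5, 2, 7, 7))

grouped <- dt %>% group_by(g)

# Option 1: drop the grouping first, so distinct() compares whole rows
by_ungroup <- grouped %>% ungroup() %>% distinct()

# Option 2: keep the grouping, but name every column that defines a duplicate
by_columns <- grouped %>% distinct(g, v)

# Base R reference: unique() on a plain data frame compares whole rows
by_unique <- unique(dt)

nrow(by_ungroup)  # 4
nrow(by_columns)  # 4
nrow(by_unique)   # 4
```

Applied to the question's pipeline, this would amount to inserting ungroup() between select(-sample.name) and distinct().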
