Randomly remove duplicated rows using dplyr()

Views: 64 Published: 2020/10/26 4:33:31 Tags: r dplyr

Question

As a follow-up to this question: Remove duplicated rows using dplyr, I have the following:

How do I randomly remove duplicated rows using dplyr() (or other methods)?

My current command is:

data.uniques <- distinct(data, KEYVARIABLE, .keep_all = TRUE)

But it returns the first occurrence of each KEYVARIABLE. I want that behaviour to be random: it should return any one of the 1 to n occurrences of that KEYVARIABLE.

For example:

KEYVARIABLE BMI
1 24.2
2 25.3
2 23.2
3 18.9
4 19
4 20.1
5 23.0

Currently my command returns:

KEYVARIABLE BMI
1 24.2
2 25.3
3 18.9
4 19
5 23.0

I want it to randomly return one of the n duplicated rows, for instance:

KEYVARIABLE BMI
1 24.2
2 23.2
3 18.9
4 19
5 23.0

Recommended answer

One option is to group by 'KEYVARIABLE', sample the sequence of rows within each group to pick one row index, and subset the dataset with it:

library(data.table)
# .N is the group size; sample(.N)[1] picks one random row index within each KEYVARIABLE group
setDT(df1)[, .SD[sample(.N)[1]], KEYVARIABLE]

Or using dplyr:

library(dplyr)
df1 %>% 
   group_by(KEYVARIABLE) %>%
   sample_n(1)
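
In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(), so the same idea can be sketched as below. This is a minimal sketch, not part of the original answer: the data frame is rebuilt from the example in the question, the seed value is arbitrary and only there to make the draw repeatable, and the names picked and picked2 are made up for illustration. The second variant shuffles all rows first so that distinct(), which keeps the first row it meets per key, ends up keeping a random one.

```r
library(dplyr)

# Example data copied from the question
df1 <- data.frame(
  KEYVARIABLE = c(1, 2, 2, 3, 4, 4, 5),
  BMI = c(24.2, 25.3, 23.2, 18.9, 19, 20.1, 23.0)
)

set.seed(42)  # arbitrary seed (assumption), only so the random choice is reproducible

# Variant 1: keep one randomly chosen row per KEYVARIABLE
picked <- df1 %>%
  group_by(KEYVARIABLE) %>%
  slice_sample(n = 1) %>%
  ungroup()

# Variant 2: shuffle every row, then let distinct() keep the first row per key
picked2 <- df1 %>%
  slice_sample(prop = 1) %>%
  distinct(KEYVARIABLE, .keep_all = TRUE) %>%
  arrange(KEYVARIABLE)
```

Both variants return one row per KEYVARIABLE. Which of the duplicated rows survives depends on the seed, so drop set.seed() if a different draw is wanted on every run.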
