R - Identify and remove duplicate rows based on two columns

Views: 51 Published: 2021/9/1 18:36:51 Tags: r duplicates subset unique

Question

I have some data that looks like this:

Course_ID   Text_ID
33          17
33          17
58          17
5           22
8           22
42          25
42          25
17          26
17          26
35          39
51          39

Not having a background in programming, I'm finding it tricky to articulate my question, but here goes: I only want to keep rows where Course_ID varies but where Text_ID is the same. So for example, the final data would look something like this:

Course_ID   Text_ID
5           22
8           22
35          39
51          39

As you can see, Text_ID 22 and 39 are the only ones that have different Course_ID values. I suspect subsetting the data would be the way to go, but as I said, I'm quite a novice at this kind of thing and would really appreciate any advice on how to approach this.

Recommended answer

Keep only those Text_ID groups in which no Course_ID value is repeated.
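The snippets in this answer refer to a data frame called df, which the question does not show being created. A minimal way to recreate the question's table, assuming integer columns (as the printed output suggests), might look like this:

```r
# Rebuild the example table from the question.
# The object name `df` is taken from the answer's code; the integer
# column types are an assumption based on the <int> headers in the output.
df <- data.frame(
  Course_ID = c(33L, 33L, 58L, 5L, 8L, 42L, 42L, 17L, 17L, 35L, 51L),
  Text_ID   = c(17L, 17L, 17L, 22L, 22L, 25L, 25L, 26L, 26L, 39L, 39L)
)

# Quick structural check: 11 rows, two integer columns
str(df)
```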

In dplyr you can write this as:

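library(dplyr)
# Keep a Text_ID group only if all of its Course_ID values are distinct,
# i.e. the number of distinct values equals the number of rows in the group
df %>% group_by(Text_ID) %>% filter(n_distinct(Course_ID) == n()) %>% ungroup()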

#  Course_ID Text_ID
#      <int>   <int>
#1         5      22
#2         8      22
#3        35      39
#4        51      39

And with data.table:

library(data.table)
# Same filter per Text_ID group; note that setDT() converts df by reference
setDT(df)[, .SD[uniqueN(Course_ID) == .N], by = Text_ID]
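If neither package is available, the same idea can be sketched in base R. This is not part of the original answer: it swaps in ave() with anyDuplicated() as a stand-in for the grouped filter, and the names no_repeats, group_size, result and result_strict are illustrative only.

```r
# Example data from the question (column types assumed to be integer)
df <- data.frame(
  Course_ID = c(33L, 33L, 58L, 5L, 8L, 42L, 42L, 17L, 17L, 35L, 51L),
  Text_ID   = c(17L, 17L, 17L, 22L, 22L, 25L, 25L, 26L, 26L, 39L, 39L)
)

# anyDuplicated() returns 0 when a vector has no repeated values.
# ave() applies it within each Text_ID group and recycles the single
# result across every row of that group.
no_repeats <- ave(df$Course_ID, df$Text_ID, FUN = anyDuplicated) == 0

# Rows belonging to groups whose Course_ID values are all distinct
result <- df[no_repeats, ]

# Optional stricter variant: also require at least two rows per group,
# because a Text_ID that appears only once trivially has "no repeats".
group_size <- ave(df$Course_ID, df$Text_ID, FUN = length)
result_strict <- df[no_repeats & group_size > 1, ]

print(result)
```

On the question's data both variants keep the rows for Text_ID 22 and 39, matching the desired output. They differ only when some Text_ID occurs in a single row: the "no repeats" condition (like the dplyr and data.table versions) would keep that row, while the stricter variant would drop it.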
