R - Identify and remove duplicate rows based on two columns


Question

I have some data that looks like this:

Course_ID   Text_ID
33          17
33          17
58          17
5           22
8           22
42          25
42          25
17          26
17          26
35          39
51          39

Not having a background in programming, I'm finding it tricky to articulate my question, but here goes: I only want to keep rows where Course_ID varies but where Text_ID is the same. So for example, the final data would look something like this:

Course_ID   Text_ID
5           22
8           22
35          39
51          39

As you can see, Text_ID 22 and 39 are the only ones that have different Course_ID values. I suspect subsetting the data would be the way to go, but as I said, I'm quite a novice at this kind of thing and would really appreciate any advice on how to approach this.

Recommended answer

Select those Text_ID groups in which no Course_ID is repeated.

In dplyr you can write this as:

library(dplyr)
# Keep only the Text_ID groups in which every row has a distinct Course_ID
df %>% group_by(Text_ID) %>% filter(n_distinct(Course_ID) == n()) %>% ungroup()

#  Course_ID Text_ID
#      <int>   <int>
#1         5      22
#2         8      22
#3        35      39
#4        51      39
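
To see why comparing the number of distinct Course_ID values with the group size selects these rows, it can help to tabulate both counts per Text_ID. The sketch below does this in base R so that it runs without extra packages. The data frame is rebuilt from the sample data in the question, and the names rows_per_text, distinct_per_text and counts are illustrative assumptions, not part of the original answer.

```r
# Sample data copied from the question
df <- data.frame(
  Course_ID = c(33L, 33L, 58L, 5L, 8L, 42L, 42L, 17L, 17L, 35L, 51L),
  Text_ID   = c(17L, 17L, 17L, 22L, 22L, 25L, 25L, 26L, 26L, 39L, 39L)
)

# Number of rows in each Text_ID group
rows_per_text <- tapply(df$Course_ID, df$Text_ID, length)

# Number of distinct Course_ID values in each Text_ID group
distinct_per_text <- tapply(df$Course_ID, df$Text_ID,
                            function(x) length(unique(x)))

# One row per Text_ID; a group is kept only when both counts agree
counts <- data.frame(
  Text_ID          = as.integer(names(rows_per_text)),
  rows             = as.vector(rows_per_text),
  distinct_courses = as.vector(distinct_per_text)
)
counts$kept <- counts$rows == counts$distinct_courses

print(counts)
```

Text_ID 17 has three rows but only two distinct Course_ID values (33 appears twice), so the whole group is dropped even though its Course_ID values are not all identical. Only Text_ID 22 and 39 pass the check, which matches the output the question asks for.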

With data.table:

library(data.table)
# setDT() converts df to a data.table by reference; .N is the row count of each group
setDT(df)[, .SD[uniqueN(Course_ID) == .N], by = Text_ID]
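
The same grouping logic can also be sketched in base R, which may be useful if neither package is installed. This is not from the original answer: it swaps in ave() to compute the per-group counts, and the names n_rows, n_uniq and result are assumptions chosen for illustration.

```r
# Sample data copied from the question
df <- data.frame(
  Course_ID = c(33L, 33L, 58L, 5L, 8L, 42L, 42L, 17L, 17L, 35L, 51L),
  Text_ID   = c(17L, 17L, 17L, 22L, 22L, 25L, 25L, 26L, 26L, 39L, 39L)
)

# ave() returns, for every row, a value computed over that row's Text_ID group
n_rows <- ave(df$Course_ID, df$Text_ID, FUN = length)
n_uniq <- ave(df$Course_ID, df$Text_ID,
              FUN = function(x) length(unique(x)))

# Keep rows whose group has as many distinct Course_ID values as rows
result <- df[n_rows == n_uniq, ]
rownames(result) <- NULL  # renumber rows from 1

print(result)
```

With the sample data this keeps the four rows for Text_ID 22 and 39, the same rows as the dplyr and data.table versions.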
