Remove duplicates based on specific criteria


Question

I have a dataset that looks something like this:

df <- structure(list(
  Claim.Num = c(500L, 500L, 600L, 600L, 700L, 700L, 100L, 200L, 300L),
  Amount = c(NA, 1000L, NA, 564L, 0L, 200L, NA, 0L, NA),
  Company = structure(c(NA, 1L, NA, 4L, 2L, 3L, NA, 3L, NA),
    .Label = c("ATT", "Boeing", "Petco", "T Mobile"), class = "factor")),
  .Names = c("Claim.Num", "Amount", "Company"),
  class = "data.frame", row.names = c(NA, -9L))

I want to remove rows with duplicated Claim.Num values, but when choosing which duplicate to drop, I want to drop the one that matches the following criteria: df$Company == 'NA' | df$Amount == 0

In other words, remove records 1, 3, and 5.

I've gotten as far as this:

df <- df[!duplicated(df$Claim.Num[which(df$Amount == 0 | df$Company == 'NA')]),]

The code runs without errors, but doesn't actually remove duplicate rows based on the required criteria. I think that's because I'm telling it to remove duplicates only among the Claim.Num values that match those criteria, rather than to remove any duplicated Claim.Num while treating certain Amounts and Companies preferentially for removal. Please note that I can't simply filter the dataset on the specified values, because other records with 0 or NA values need to be kept (e.g. records 8 and 9 shouldn't be excluded, because their Claim.Num values are not duplicated).
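
A minimal sketch of why the attempt leaves the data unchanged, assuming the comparison was meant to use `==` and using base R only. The names `cond`, `idx`, `keep`, and `attempt` are introduced here for illustration and do not appear in the original post.

```r
# Same data as in the question, rebuilt with data.frame() for readability
df <- data.frame(
  Claim.Num = c(500L, 500L, 600L, 600L, 700L, 700L, 100L, 200L, 300L),
  Amount    = c(NA, 1000L, NA, 564L, 0L, 200L, NA, 0L, NA),
  Company   = factor(c(NA, "ATT", NA, "T Mobile", "Boeing", "Petco", NA, "Petco", NA))
)

# Comparing against the string 'NA' never matches a missing value:
# for rows where Company is missing, the comparison itself returns NA.
cond <- df$Amount == 0 | df$Company == 'NA'

# which() silently drops NA, so only the rows with Amount == 0 remain.
idx <- which(cond)

# duplicated() now sees a short vector of Claim.Num values with no repeats.
keep <- !duplicated(df$Claim.Num[idx])

# The short logical vector is recycled across all rows, so nothing is removed.
attempt <- df[keep, ]
```

Here `idx` contains only rows 5 and 8, `keep` is all `TRUE`, and `attempt` still has all 9 rows. Missing values are better tested with `is.na(df$Company)` than with a comparison to `'NA'`.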

Recommended answer

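If you order your data frame first, you can make sure duplicated() keeps the rows you want. duplicated() marks every occurrence after the first as a duplicate, so sorting the rows that match the removal criteria to the end means they are the ones that get dropped: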

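# Sort key: 1 for rows with a missing Company or a zero Amount, 0 otherwise,
# so the preferred rows come first within the data frame.
df.tmp <- with(df, df[order(ifelse(is.na(Company) | Amount == 0, 1, 0)), ])
# Keep only the first occurrence of each Claim.Num.
df.tmp[!duplicated(df.tmp$Claim.Num), ]
#   Claim.Num Amount  Company
# 2       500   1000      ATT
# 4       600    564 T Mobile
# 6       700    200    Petco
# 7       100     NA     <NA>
# 8       200      0    Petco
# 9       300     NA     <NA>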
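
As an alternative that avoids reordering the data, the same result can be sketched with a logical mask. This is not the answer's method but a variation under stated assumptions: the names `low_priority`, `is_dup`, and `result` are hypothetical, and the approach drops every matching row in a duplicated group, so a group in which all rows match the criteria would disappear entirely (the ordering approach above would keep one of them).

```r
# Same data as in the question, rebuilt with data.frame() for readability
df <- data.frame(
  Claim.Num = c(500L, 500L, 600L, 600L, 700L, 700L, 100L, 200L, 300L),
  Amount    = c(NA, 1000L, NA, 564L, 0L, 200L, NA, 0L, NA),
  Company   = factor(c(NA, "ATT", NA, "T Mobile", "Boeing", "Petco", NA, "Petco", NA))
)

# Rows that should be dropped first: missing Company or an Amount of exactly 0.
# The !is.na() guard keeps the mask free of NA values.
low_priority <- is.na(df$Company) | (!is.na(df$Amount) & df$Amount == 0)

# Rows whose Claim.Num occurs more than once, counting the first occurrence too
is_dup <- duplicated(df$Claim.Num) | duplicated(df$Claim.Num, fromLast = TRUE)

# Drop a row only if it is both part of a duplicated group and low priority
result <- df[!(is_dup & low_priority), ]
```

On the sample data this removes records 1, 3, and 5 and keeps records 8 and 9, since their Claim.Num values are not duplicated.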
