Removing duplicate rows with ddply

Question

I have a dataframe df containing two factor variables (Var and Year) as well as one (in reality several) column with values.

df <- structure(list(Var = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), Year = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 3L, 1L, 2L, 3L), .Label = c("2000", "2001", 
"2002"), class = "factor"), Val = structure(c(1L, 2L, 2L, 4L, 
1L, 3L, 3L, 5L, 6L, 6L), .Label = c("2", "3", "4", "5", "8", 
"9"), class = "factor")), .Names = c("Var", "Year", "Val"), row.names = c(NA, 
-10L), class = "data.frame")

> df
   Var Year Val
1    A 2000   2
2    A 2001   3
3    A 2002   3
4    B 2000   5
5    B 2001   2
6    B 2002   4
7    B 2002   4
8    C 2000   8
9    C 2001   9
10   C 2002   9

Now I'd like to find rows with the same value for Val for each Var and Year and only keep one of those. So in this example I would like row 7 to be removed.

I've tried to find a solution with plyr using something like df_new <- ddply(df, .(Var, Year), summarise, !duplicate(Val)) but obviously that is not a function accepted by ddply.

I found this similar question but the plyr solution by Arun only gives me a dataframe with 0 rows and 0 columns and I do not understand the answer well enough to modify it according to my needs.

Any hints on how to go about that?

Recommended answer

You can just use the unique() function instead of !duplicate(Val). Note that the base R function is called duplicated(), not duplicate().

df_new <- ddply(df, .(Var, Year), summarise, Val=unique(Val))
# or
df_new <- ddply(df, .(Var, Year), function(x) x[!duplicated(x$Val),])
# or if you only have these 3 columns:
df_new <- ddply(df, .(Var, Year), unique)
# with dplyr (using %>%, which replaced the older %.% operator)
df %>% group_by(Var, Year) %>% filter(!duplicated(Val))
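
For comparison, here is a minimal base R sketch that does not need plyr or dplyr. It is not part of the original answer. It assumes a duplicate means a row that repeats the same Var, Year and Val combination, and the key_cols vector is a hypothetical helper name introduced only for this illustration. The example data is rebuilt with plain character columns rather than the factors from the question, which does not affect the result here.

```r
# Rebuild the example data from the question (character columns for brevity).
df <- data.frame(
  Var  = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"),
  Year = c("2000", "2001", "2002", "2000", "2001", "2002", "2002",
           "2000", "2001", "2002"),
  Val  = c("2", "3", "3", "5", "2", "4", "4", "8", "9", "9")
)

# duplicated() on a data frame flags every row that repeats an earlier row
# across the selected columns; negating it keeps the first occurrence only.
key_cols <- c("Var", "Year", "Val")  # hypothetical name, for illustration
df_new <- df[!duplicated(df[key_cols]), ]
```

With further value columns, key_cols decides which columns define a duplicate, while all other columns are carried along from the first matching row.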

hth