Removing duplicate rows with ddply
Question
I have a data frame df containing two factor variables (Var and Year) as well as one (in reality several) column with values.
df <- structure(list(Var = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), Year = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 3L, 1L, 2L, 3L), .Label = c("2000", "2001",
"2002"), class = "factor"), Val = structure(c(1L, 2L, 2L, 4L,
1L, 3L, 3L, 5L, 6L, 6L), .Label = c("2", "3", "4", "5", "8",
"9"), class = "factor")), .Names = c("Var", "Year", "Val"), row.names = c(NA,
-10L), class = "data.frame")
> df
   Var Year Val
1    A 2000   2
2    A 2001   3
3    A 2002   3
4    B 2000   5
5    B 2001   2
6    B 2002   4
7    B 2002   4
8    C 2000   8
9    C 2001   9
10   C 2002   9
Now I'd like to find rows with the same value of Val within each combination of Var and Year and keep only one of them. So in this example I would like row 7 to be removed.
I've tried to find a solution with plyr using something like
df_new <- ddply(df, .(Var, Year), summarise, !duplicate(Val))
but obviously that is not a function accepted by ddply.
I found this similar question, but the plyr solution by Arun only gives me a data frame with 0 rows and 0 columns, and I do not understand the answer well enough to modify it according to my needs.
Any hints on how to go about that?
Recommended answer
You can just use the unique() function instead of !duplicate(Val) (the base R function for flagging repeats is duplicated(), not duplicate()):
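library(plyr)
df_new <- ddply(df, .(Var, Year), summarise, Val = unique(Val))
# or, keeping every column of the first row for each distinct Val
df_new <- ddply(df, .(Var, Year), function(x) x[!duplicated(x$Val), ])
# or if you only have these 3 columns:
df_new <- ddply(df, .(Var, Year), unique)
# with dplyr (%>% replaces the older, deprecated %.% operator)
library(dplyr)
df %>% group_by(Var, Year) %>% filter(!duplicated(Val))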
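For comparison, the same result can probably be had without any package. The following is only a minimal base R sketch, not part of the original answer: it rebuilds the example data with data.frame() (an assumption made for self-containment; the values are the same as in df above) and relies on duplicated() applied to the three key columns. The helper name dup is made up for illustration.

```r
# Rebuild the example data; stringsAsFactors = TRUE keeps the columns as factors
df <- data.frame(
  Var  = rep(c("A", "B", "C"), times = c(3, 4, 3)),
  Year = c("2000", "2001", "2002", "2000", "2001", "2002", "2002",
           "2000", "2001", "2002"),
  Val  = c("2", "3", "3", "5", "2", "4", "4", "8", "9", "9"),
  stringsAsFactors = TRUE
)

# duplicated() on a data frame flags rows that repeat an earlier row
# across the selected columns (here: same Var, Year and Val)
dup <- duplicated(df[, c("Var", "Year", "Val")])

# Keep only the first occurrence of each Var/Year/Val combination
df_new <- df[!dup, ]

which(dup)    # 7
nrow(df_new)  # 9
```

With several value columns, only the columns listed inside duplicated() decide what counts as a repeat; all other columns are carried along from the first matching row.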
hth