R - How to delete two quasi-identical rows of a data frame?
Problem description
I have a data frame and I need to clean it (remove duplicates) according to two variables, but the values of both variables are "quasi-identical" across rows. That is, one row may contain a - or ' or s or : or a space that the other row does not have.
I tried unique(), but that function only detects exact duplicates. Suppose that we have this data.frame:
Id<-c("RoLu1976","Rolu1976","AlBl1989","ThSa1996")
Art<-c("Econometric Policy Evaluation: A Critique","Econometric Policy Evaluations A Critique", "Rules after discretion", "Expectations and the Nonneutrality of Lucas")
Id.1<-c("FiKy1989","EdPr1986","BeBe1983","JoSt1989")
Art.1<-c("Notes on the Lucas Critique","Notes on the Lucas Critique","The Inconsistency of Optimal Plans","The Inconsistency of Optimal Plans")
N <- data.frame(Id, Art, Id.1, Art.1, stringsAsFactors = FALSE)  # keep text columns as character (the default since R 4.0.0)
The quasi-identical values are in the variable Art: the first two observations differ only by an s and a :. How can I detect and delete these kinds of near-duplicate values?
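For reference, here is a small check, built on the data above with stringsAsFactors = FALSE added so the columns are plain character vectors, showing that exact-match deduplication keeps all four rows: no two values of Art are byte-for-byte identical, whereas Art.1 really does contain exact duplicates.

```r
# Rebuild the example data from the question
Id    <- c("RoLu1976", "Rolu1976", "AlBl1989", "ThSa1996")
Art   <- c("Econometric Policy Evaluation: A Critique",
           "Econometric Policy Evaluations A Critique",
           "Rules after discretion",
           "Expectations and the Nonneutrality of Lucas")
Id.1  <- c("FiKy1989", "EdPr1986", "BeBe1983", "JoSt1989")
Art.1 <- c("Notes on the Lucas Critique", "Notes on the Lucas Critique",
           "The Inconsistency of Optimal Plans", "The Inconsistency of Optimal Plans")
N <- data.frame(Id, Art, Id.1, Art.1, stringsAsFactors = FALSE)

# Exact-match deduplication removes nothing: every row is kept
nrow(unique(N))            # 4
sum(duplicated(N$Art))     # 0
# Only the column with truly identical strings reports duplicates
sum(duplicated(N$Art.1))   # 2
```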
Based on your data, I used agrep() to match similar strings:
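yy <- character(0)
for (i in seq_along(N$Art)) {
  # all titles in N$Art that approximately match the title in row i (this includes the title itself)
  temp <- agrep(N[i, "Art"], N$Art, value = TRUE)
  # use the first matching variant, so near-duplicates end up with one shared spelling
  y <- if (any(N[i, "Art"] == temp)) temp[1] else N[i, "Art"]
  yy <- c(yy, y)
}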
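Because agrep() always matches a title against itself, temp always contains N[i, "Art"], so the fallback branch never fires and the loop amounts to "replace each title with its first approximate match". A more compact sketch of the same idea uses vapply() on the Art vector from the question. Two things are worth keeping in mind. First, agrep()'s default max.distance = 0.1 is a fraction of the pattern length, so it may need tuning for other data. Second, agrep() looks for approximate matches anywhere inside each element, so a short title can match within a longer one. The spelling that is kept is simply whichever variant appears first.

```r
Art <- c("Econometric Policy Evaluation: A Critique",
         "Econometric Policy Evaluations A Critique",
         "Rules after discretion",
         "Expectations and the Nonneutrality of Lucas")

# For each title, take the first element of Art that approximately matches it
# (agrep's default max.distance = 0.1, i.e. about 10% of the pattern length)
canon <- vapply(Art,
                function(a) agrep(a, Art, value = TRUE)[1],
                character(1), USE.NAMES = FALSE)

canon[1:2]   # both are "Econometric Policy Evaluation: A Critique"
```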
Then I replaced N$Art with yy, which lets you use duplicated() or unique():
N$Art <- yy
N.2 <- N[!duplicated(N$Art), ]  # keep the first row for each (now identical) title
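A different technique, not the one used in the answer above: because the question lists the exact kinds of differences that occur ("-", "'", "s", ":", spaces, plus letter case in Id, as in RoLu1976 vs Rolu1976), you could build a normalized key and deduplicate on that instead of on fuzzy matches. This is a deterministic sketch only. norm_key() is a hypothetical helper, and its rules (lowercase, drop - ' :, drop an s that ends a word, drop whitespace) are assumptions chosen to fit the example data. Applying it to both Id and Art also covers the "two variables" part of the question.

```r
N <- data.frame(
  Id    = c("RoLu1976", "Rolu1976", "AlBl1989", "ThSa1996"),
  Art   = c("Econometric Policy Evaluation: A Critique",
            "Econometric Policy Evaluations A Critique",
            "Rules after discretion",
            "Expectations and the Nonneutrality of Lucas"),
  Id.1  = c("FiKy1989", "EdPr1986", "BeBe1983", "JoSt1989"),
  Art.1 = c("Notes on the Lucas Critique", "Notes on the Lucas Critique",
            "The Inconsistency of Optimal Plans", "The Inconsistency of Optimal Plans"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (assumed rules): map quasi-identical strings to one key
norm_key <- function(x) {
  x <- tolower(x)                        # "RoLu1976" and "Rolu1976" now agree
  x <- gsub("[-':]", "", x)              # drop hyphens, apostrophes and colons
  x <- gsub("s\\b", "", x, perl = TRUE)  # drop an "s" that ends a word
  gsub("[[:space:]]+", "", x)            # drop all whitespace
}

# Keep the first row of each group sharing both the normalized Id and Art
keys <- data.frame(id = norm_key(N$Id), art = norm_key(N$Art))
N.2 <- N[!duplicated(keys), ]
rownames(N.2)   # "1" "3" "4"
```

Dropping a word-final s is crude and can merge strings that are genuinely different, so it is worth inspecting the groups that share a key before deleting rows.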
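This concludes the article "R - How to delete two quasi-identical rows of a data frame?". We hope the recommended answer is helpful, and thank you for supporting IT屋!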