R - How to delete two quasi-identical rows of a data frame?


Question

I have a data frame, and I need to deduplicate it according to two variables, but the values of both variables are only "quasi-identical" across rows. That means one row may contain a "-", "'", "s", ":" or a space that another row lacks. I used unique(), but this function only works with identical values. Suppose we have this data.frame:

Id<-c("RoLu1976","Rolu1976","AlBl1989","ThSa1996")
Art<-c("Econometric Policy Evaluation: A Critique","Econometric Policy Evaluations A Critique", "Rules after discretion", "Expectations and the Nonneutrality of Lucas")
Id.1<-c("FiKy1989","EdPr1986","BeBe1983","JoSt1989")
Art.1<-c("Notes on the Lucas Critique","Notes on the Lucas Critique","The Inconsistency of Optimal Plans","The Inconsistency of Optimal Plans")
N<-data.frame(Id,Art,Id.1,Art.1)

The quasi-identical values are in the variable Art in the first two observations, which differ only by an "s" and a ":". How can I filter and delete these kinds of values?
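A small check can make the problem concrete. The sketch below reuses only the Art titles from the question and shows that exact matching sees four distinct values, while the base R function adist() (package utils) reports that the first two titles are just one edit apart.

```r
# The four titles from the question's Art column
Art <- c("Econometric Policy Evaluation: A Critique",
         "Econometric Policy Evaluations A Critique",
         "Rules after discretion",
         "Expectations and the Nonneutrality of Lucas")

# Exact matching: the first two titles differ by a single character,
# so unique() and duplicated() treat them as distinct values
length(unique(Art))          # 4
duplicated(Art)              # FALSE FALSE FALSE FALSE

# adist() returns the generalized Levenshtein distance;
# turning ":" into "s" is a single substitution
adist(Art[1], Art[2])[1, 1]  # 1
```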

Solution

Based on your data, I used agrep to match similar strings:

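yy = NULL
for (i in seq_along(N$Art)) {
    # all titles within agrep's default max.distance (0.1) of row i's title
    temp = agrep(N[i, "Art"], N$Art, value = TRUE)
    # use the first matching title as the canonical spelling for row i
    y = if (any(N[i, "Art"] == temp)) temp[1] else as.character(N[i, "Art"])
    yy = c(yy, y)
}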
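To see what agrep returns inside that loop, the sketch below runs it on the first title alone. Per ?agrep, a max.distance below 1 is read as a fraction of the pattern length and rounded up, so the default 0.1 allows about ceiling(0.1 * 41) = 5 edits for this 41-character title; max.distance = 0 falls back to exact (sub)string matching. The variable names loose and strict are just for illustration.

```r
Art <- c("Econometric Policy Evaluation: A Critique",
         "Econometric Policy Evaluations A Critique",
         "Rules after discretion",
         "Expectations and the Nonneutrality of Lucas")

# Default tolerance: both spellings of the Lucas critique title match
loose <- agrep(Art[1], Art, value = TRUE)
loose

# No tolerance: only the title itself matches
strict <- agrep(Art[1], Art, max.distance = 0, value = TRUE)
strict
```

If the default threshold merges titles that should stay separate (or misses some variants), tuning max.distance is the first knob to try.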

Then replace N$Art with yy, which lets you use duplicated()/unique():

N$Art = yy
N.2 = N[!duplicated(N$Art), ]
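The loop above only normalizes Art, while the question asks to deduplicate on two variables. A minimal sketch, assuming the same agrep step is acceptable for Art.1, wraps it in a helper and then calls duplicated() on both columns together. The function name collapse_near_dupes is hypothetical. Like the original loop, it maps each title to the first approximate match in column order, which is simple but not transitive: a chain of near-matches A~B~C may not all end up with one spelling.

```r
Id    <- c("RoLu1976", "Rolu1976", "AlBl1989", "ThSa1996")
Art   <- c("Econometric Policy Evaluation: A Critique",
           "Econometric Policy Evaluations A Critique",
           "Rules after discretion",
           "Expectations and the Nonneutrality of Lucas")
Id.1  <- c("FiKy1989", "EdPr1986", "BeBe1983", "JoSt1989")
Art.1 <- c("Notes on the Lucas Critique", "Notes on the Lucas Critique",
           "The Inconsistency of Optimal Plans", "The Inconsistency of Optimal Plans")
N <- data.frame(Id, Art, Id.1, Art.1, stringsAsFactors = FALSE)

# Hypothetical helper: replace each string by the first element of x that
# agrep() considers an approximate match, so near-duplicates share one spelling
collapse_near_dupes <- function(x, max.distance = 0.1) {
  x <- as.character(x)
  vapply(x,
         function(s) agrep(s, x, max.distance = max.distance, value = TRUE)[1],
         character(1), USE.NAMES = FALSE)
}

N$Art   <- collapse_near_dupes(N$Art)
N$Art.1 <- collapse_near_dupes(N$Art.1)

# Keep the first row of each (Art, Art.1) combination
N.2 <- N[!duplicated(N[, c("Art", "Art.1")]), ]
N.2$Id  # "RoLu1976" "AlBl1989" "ThSa1996"
```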
