Deduping column pairs in R


Question

I have a data frame containing 7 columns and would like to deduplicate records that have the same info in the first two columns, even if the two values are in reverse order.

Here is a snippet of my df:

 zip1  zip2       APP       PCR       SCR       APJ       PJR
1 01701 01701 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
2 01701 01702 0.9887567 0.9898379 0.9811615 0.9993856 0.9842659
3 01701 01703 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
4 01701 01704 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
5 01704 01701 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000

I know how to use unique, but the twist here is that I'd like to treat instances where zip1 = a and zip2 = b the same as instances where zip1 = b and zip2 = a, so I'd essentially want only one record for those two instances. In the snippet above, for example, I'd only want row 4 and not row 5. Any advice?

Thanks,

Recommended answer

First create a new vector that identifies rows by their zip pair but does not distinguish between the two orderings of a pair:

zipUp <- paste(pmin(df$zip1, df$zip2), pmax(df$zip1, df$zip2))

Now find the duplicates in that vector and drop the corresponding rows from the original data frame:

dups <- duplicated(zipUp)

newdf <- df[!dups, ]
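To see the two steps working together, here is a minimal, self-contained sketch. The data frame is a hypothetical reconstruction of the question's snippet (only the APP score column is kept for brevity), and the zip codes are assumed to be stored as character strings. The `as.character()` calls are an added precaution rather than part of the original answer, since `pmin` and `pmax` do not give meaningful results on factor columns.

```r
# Hypothetical sample data modelled on the snippet in the question
df <- data.frame(
  zip1 = c("01701", "01701", "01701", "01701", "01704"),
  zip2 = c("01701", "01702", "01703", "01704", "01701"),
  APP  = c(1, 0.9887567, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Assumption: convert to character in case the columns were read in as factors
z1 <- as.character(df$zip1)
z2 <- as.character(df$zip2)

# Order-independent key: smaller zip first, larger zip second
zipUp <- paste(pmin(z1, z2), pmax(z1, z2))

# duplicated() flags the second and later occurrences of each key
dups <- duplicated(zipUp)
newdf <- df[!dups, ]

print(newdf)
```

Because `duplicated()` marks only the later occurrences, the first row seen for each pair is the one that survives; here row 4 (01701, 01704) is kept and row 5 (01704, 01701) is dropped.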






I am assuming that the first two columns do not contain NA. If they do, you will need to adjust the pmin and pmax calls (for example by passing na.rm = TRUE) so that the non-NA value is kept for each pair.
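One possible reading of that adjustment is to pass `na.rm = TRUE` to both calls, as sketched below. The data is hypothetical and the choice of `na.rm = TRUE` is an assumption about what the answer intends, not something it spells out.

```r
# Hypothetical data where one zip of a pair may be missing
df_na <- data.frame(
  zip1 = c("01701", NA,      "01702"),
  zip2 = c(NA,      "01701", "01703"),
  stringsAsFactors = FALSE
)

# With na.rm = TRUE, pmin/pmax return the non-NA value when only one is missing
lo <- pmin(df_na$zip1, df_na$zip2, na.rm = TRUE)
hi <- pmax(df_na$zip1, df_na$zip2, na.rm = TRUE)

zipUp_na <- paste(lo, hi)
newdf_na <- df_na[!duplicated(zipUp_na), ]

print(newdf_na)
```

Note a caveat with this approach: a row with one missing zip gets the same key as a row where both zips equal the non-missing value, so (01701, NA) would be treated as a duplicate of (01701, 01701). If that matters for your data, handle the NA rows separately before deduplicating.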
