Determining different rows between two data sets in R

Question

I have two data files in tab-separated CSV format. The files have the following layout:

EP Code    EP Name    Address    Region    ...
101654    Alpha     York Street    Northwest    ...
103628    Beta    5th Avenue    South    ...

EP codes are unique. I want to compare the two files by EP code, find the rows that differ, and write them to a new file.

For example, file1.csv has 800 rows and file2.csv has 850 rows. file2 could contain all of file1 plus 50 extra rows, or it could be file1 minus 10 rows plus 60 new ones. I want to determine the differences between the two data sets. I'm not interested in the rows common to both.

How can I do this in R?

Recommended answer

There are many ways to do this, including setdiff, intersect, the %in% operator, and is.element. The idea is to find the EP codes that appear in both files and exclude the matching rows using !:

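# Rows of file1 whose EP code does not appear in file2
diff1 <- file1[!(file1$ep.code %in% file2$ep.code), ]

# Rows of file2 whose EP code does not appear in file1
diff2 <- file2[!(file2$ep.code %in% file1$ep.code), ]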
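The question also asks for the differing rows to be written to a new file. The following is a minimal end-to-end sketch of one way that might look in base R. The toy data frames, the column names ep.code and ep.name, the extra source column, and the output name differences.csv are all assumptions made for illustration; they are not part of the original question or answer.

```r
# Toy stand-ins for file1.csv and file2.csv (hypothetical values).
# With real files, read.delim("file1.csv") would load tab-separated data,
# and the column names below would need to match what is read in.
file1 <- data.frame(
  ep.code = c(101654, 103628, 104000),
  ep.name = c("Alpha", "Beta", "Gamma"),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  ep.code = c(101654, 103628, 105000, 106000),
  ep.name = c("Alpha", "Beta", "Delta", "Epsilon"),
  stringsAsFactors = FALSE
)

# Rows unique to each file, compared on EP code only.
only_in_1 <- file1[!(file1$ep.code %in% file2$ep.code), ]
only_in_2 <- file2[!(file2$ep.code %in% file1$ep.code), ]

# Tag each row with its origin (rep() also handles the zero-row case)
# and stack the two sets into one data frame.
only_in_1$source <- rep("file1", nrow(only_in_1))
only_in_2$source <- rep("file2", nrow(only_in_2))
differences <- rbind(only_in_1, only_in_2)

# Write the result as a tab-separated file in a temporary directory.
out_path <- file.path(tempdir(), "differences.csv")
write.table(differences, out_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

Because the comparison uses only the EP code, a row whose code exists in both files but whose other fields differ is treated as common and is not reported.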
