How to compare each row of data frame 1 with each row of data frame 2?

Views: 98. Posted: 2017/3/12 12:04:20. Tags: r, dataframe, data.table

Question

I have two data frames that look like this:

x = data.frame(Name     = c("200003", "200260", "400826", "400863", "500710"),
               Chr      = c("chr1", "chr1", "chr2", "chr3", "chr3"),
               Position = c(11880, 14415, 13000, 15000, 18000))
y = data.frame(name  = c("geneA", "geneB", "geneC", "geneD", "geneE"),
               chrom = c("chr1", "chr1", "chr2", "chr2", "chr3"),
               Start = c(11873, 11878, 12000, 14361, 14361),
               End   = c(14409, 14419, 14409, 16765, 19759))

> x
    Name  Chr Position
1 200003 chr1    11880
2 200260 chr1    14415
3 400826 chr2    13000
4 400863 chr3    15000
5 500710 chr3    18000

> y
   name chrom   Start   End
1 geneA  chr1   11873 14409
2 geneB  chr1   11878 14419
3 geneC  chr2   12000 14409
4 geneD  chr2   14361 16765
5 geneE  chr3   14361 19759

I would like to compare x and y and return a data frame or list that pairs each Name in x with the names in y that have the same chrom as Chr and whose (Start, End) interval includes the Position. For example:

200003  geneA
200003  geneB
200260  geneB
400826  geneC
400863  geneE
500710  geneE

Edit: I was able to get the result using the following code:

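# Join every x row to every y row on the same chromosome
z = merge(x, y, by.x = 'Chr', by.y = 'chrom')
# Keep only rows whose Position lies inside [Start, End]; logical indexing
# avoids -which(), which drops every row when nothing is FALSE
z = z[z$Position >= z$Start & z$Position <= z$End, ]
output = cbind(as.character(z$Name), as.character(z$name))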

In reality, x and y are large datasets and merge takes a while to run. Is there a better way to do this?

Recommended answer

@BondedDust seems to have removed his solution. The only issue with it was that the key also needs to include chrom.

Here's an approach using foverlaps from data.table. First we'll convert the data.frames to data.tables:

require(data.table)
setDT(x)
setDT(y)

Then, since foverlaps works with interval ranges, we'll add a dummy column to x so that each Position becomes a zero-width interval:

x[, Position2 := Position]

Now, for each row of x, we'd like to know whether Chr, Position, Position2 falls entirely within any of y's chrom, Start, End. We'll set the key on y as follows:

setkey(y, chrom, Start, End)
foverlaps(x, y, by.x=c("Chr", "Position", "Position2"))[, list(Name, name)]
#      Name  name
# 1: 200003 geneA
# 2: 200003 geneB
# 3: 200260 geneB
# 4: 400826 geneC
# 5: 400863 geneE
# 6: 500710 geneE
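As an alternative that is not part of the original answer, the same containment test can probably be written as a data.table non-equi join, which needs neither the dummy Position2 column nor a key on y. The sketch below rebuilds the example data as data.tables, assumes data.table 1.9.8 or later (where non-equi joins in `on` are available), and stores the result in `res`, a name chosen here for illustration.

```r
library(data.table)

x <- data.table(
  Name     = c("200003", "200260", "400826", "400863", "500710"),
  Chr      = c("chr1", "chr1", "chr2", "chr3", "chr3"),
  Position = c(11880, 14415, 13000, 15000, 18000)
)
y <- data.table(
  name  = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr3"),
  Start = c(11873, 11878, 12000, 14361, 14361),
  End   = c(14409, 14419, 14409, 16765, 19759)
)

# For each row of x, look up the rows of y on the same chromosome whose
# [Start, End] interval contains Position. In `on`, the left-hand side of
# each condition is a column of y and the right-hand side a column of x.
# nomatch = 0L drops rows of x that fall in no interval.
res <- y[x,
         on = .(chrom == Chr, Start <= Position, End >= Position),
         nomatch = 0L,
         .(Name, name)]
print(res)
```

Whether this is faster than foverlaps on large inputs depends on the data, so it is worth timing both on a realistic sample before choosing one.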

The columns in your data.frames are inconsistently named and cased: "chrom" vs. "Chr", for example. It might be easier to work with consistent names.
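One way to make the names consistent is data.table's setnames, which renames columns by reference. This is only a sketch: the new names "Gene" and "Chr" are hypothetical choices, not something the original answer prescribes, and the table here is a two-row excerpt of y.

```r
library(data.table)

y <- data.table(
  name  = c("geneA", "geneB"),
  chrom = c("chr1", "chr1"),
  Start = c(11873, 11878),
  End   = c(14409, 14419)
)

# Rename in place so y uses the same chromosome column name as x.
# "Gene" is a hypothetical name picked to avoid the Name/name clash.
setnames(y, old = c("name", "chrom"), new = c("Gene", "Chr"))
print(names(y))
```

After renaming, any later setkey or by.x arguments have to refer to the new column names.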
