Merging files based on column coordinates of two files in Python

Views: 418 | Published: 2020/5/9 0:51:33 | Tags: python, merge, pandas

Problem description

I have a file called snp.txt that looks like this:

chrom   chromStart  chromEnd    name        strand  observed
chr1    259         260         rs72477211  +       A/G         single
chr1    433         433         rs56289060  +       -/C         insertion
chr1    491         492         rs55998931  +       C/T         single
chr1    518         519         rs62636508  +       C/G         single
chr1    582         583         rs58108140  +       A/G         single

I have a second file, gene.txt:

chrom   chromStart  chromEnd    tf_title    tf_score
chr1    200         270         NFKB1       123
chr1    420         440         IRF4        234
chr1    488         550         BCL3        231
chr1    513         579         TCF12       12
chr1    582         583         BAD170      89

The final output I want is output.txt:

chrom   chromStart  chromEnd    name        strand  observed    tf_title    tf_score
chr1    259         260         rs72477211  +       A/G         NFKB1       123
chr1    433         433         rs56289060  +       -/C         IRF4        234
chr1    491         492         rs55998931  +       C/T         BCL3        231
chr1    518         519         rs62636508  +       C/G         TCF12       12
chr1    582         583         rs58108140  +       A/G         BAD170      89

The key thing I want to do is look at gene.txt and check whether the rsnumber in the name column of snp.txt falls within a region of gene.txt, where a region is defined by chrom, chromStart and chromEnd.

For example:

In the first row of snp.txt, the rsid rs72477211 is on chr1 between positions 259 and 260.

Now in gene.txt, NFKB1 is also on chr1, but between positions 200 and 270. This means that rsid rs72477211 is located in the NFKB1 region, so this is noted in output.txt.
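The containment rule described above can be sketched with pandas. This is only an illustration, not an answer from the original thread: it assumes both files are already loaded as DataFrames (the sample rows are typed in by hand here, and the unlabeled seventh column of snp.txt is left out), and the `_region` suffix is a name chosen for this example. Note that plain containment also places rs62636508 (518-519) inside BCL3 (488-550), whereas the desired output lists only TCF12 for it. The question does not say how such overlaps should be resolved, so the sketch keeps both matches.

```python
import pandas as pd

# Sample rows copied from snp.txt and gene.txt in the question.
snp = pd.DataFrame({
    "chrom": ["chr1"] * 5,
    "chromStart": [259, 433, 491, 518, 582],
    "chromEnd": [260, 433, 492, 519, 583],
    "name": ["rs72477211", "rs56289060", "rs55998931",
             "rs62636508", "rs58108140"],
    "strand": ["+"] * 5,
    "observed": ["A/G", "-/C", "C/T", "C/G", "A/G"],
})
gene = pd.DataFrame({
    "chrom": ["chr1"] * 5,
    "chromStart": [200, 420, 488, 513, 582],
    "chromEnd": [270, 440, 550, 579, 583],
    "tf_title": ["NFKB1", "IRF4", "BCL3", "TCF12", "BAD170"],
    "tf_score": [123, 234, 231, 12, 89],
})

# Pair every SNP with every region on the same chromosome.
# The region's coordinates get the (assumed) "_region" suffix.
pairs = snp.merge(gene, on="chrom", suffixes=("", "_region"))

# Keep only the pairs where the SNP lies entirely inside the region.
inside = (pairs["chromStart"] >= pairs["chromStart_region"]) & (
    pairs["chromEnd"] <= pairs["chromEnd_region"]
)
result = (
    pairs.loc[inside]
    .drop(columns=["chromStart_region", "chromEnd_region"])
    .reset_index(drop=True)
)
print(result)
```

The intermediate `pairs` frame holds one row per SNP/region combination on each chromosome, so this form is only practical for small inputs.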

I am unable to do this using the pandas merge function, and I'm not sure where to even start. The files are extremely large, so a loop would be highly inefficient. Can someone please help? Thanks!
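One possible way to avoid a row-by-row Python loop on large inputs is to compare SNPs against regions in blocks with NumPy broadcasting, one chromosome at a time. The sketch below is an assumption-laden illustration rather than a recommended answer: `match_in_blocks` and `block_size` are hypothetical names, both tables are assumed to fit in memory as DataFrames with the column names shown in the question, and gene.txt is assumed to be non-empty. It performs the same number of comparisons as a full pairwise check, only vectorized, and each block allocates a boolean matrix of `block_size` times the number of regions on that chromosome.

```python
import numpy as np
import pandas as pd


def match_in_blocks(snp, gene, block_size=50_000):
    """Attach tf_title/tf_score to every SNP lying inside a region.

    Hypothetical helper: processes one chromosome at a time and at most
    `block_size` SNPs per comparison to bound memory use.
    """
    pieces = []
    for chrom, regions in gene.groupby("chrom"):
        r_start = regions["chromStart"].to_numpy()
        r_end = regions["chromEnd"].to_numpy()
        snps = snp[snp["chrom"] == chrom]
        for lo in range(0, len(snps), block_size):
            block = snps.iloc[lo:lo + block_size]
            s_start = block["chromStart"].to_numpy()[:, None]
            s_end = block["chromEnd"].to_numpy()[:, None]
            # Boolean matrix: rows are SNPs in this block, columns are regions.
            inside = (s_start >= r_start) & (s_end <= r_end)
            snp_idx, region_idx = np.nonzero(inside)
            left = block.iloc[snp_idx].reset_index(drop=True)
            right = (
                regions.iloc[region_idx][["tf_title", "tf_score"]]
                .reset_index(drop=True)
            )
            pieces.append(pd.concat([left, right], axis=1))
    # Assumes at least one chromosome was processed (gene is non-empty).
    return pd.concat(pieces, ignore_index=True)


# First three rows of each sample file from the question.
snp = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "chromStart": [259, 433, 491],
    "chromEnd": [260, 433, 492],
    "name": ["rs72477211", "rs56289060", "rs55998931"],
})
gene = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "chromStart": [200, 420, 488],
    "chromEnd": [270, 440, 550],
    "tf_title": ["NFKB1", "IRF4", "BCL3"],
    "tf_score": [123, 234, 231],
})

# A tiny block size is used here only to exercise the chunking.
out = match_in_blocks(snp, gene, block_size=2)
print(out)
```

A SNP that falls inside several overlapping regions appears once per matching region, and a SNP that falls inside none is dropped from the result.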
