Filter overlapping entries in a BED file


Question

I have a BED file that looks like this:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -
1   187576  187587  chr1:187375-187577  0   -
1   187580  187590  chr1:187379-187577  0   -

My aim is to extract only those rows whose intervals do not overlap any other row. For some time I have been trying bedtools merge, following the documentation. I wanted to use its options to count the entries that make up each merged fragment and later keep only those with a count of 1, but here is the problem: I don't know how to keep the strand, the score (this should always be 0) and the name (this might be reconstructed from the first 3 columns). Does anyone know how to put these things together?

The output should look exactly like the input BED above, but contain only the rows that do not overlap anything else:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -
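For reference, here is a bedtools-free sketch of the same filter using only sort and awk, mainly to pin down what "does not overlap anything else" means. It assumes a tab-separated BED file and treats BED intervals as half-open, so [s1, e1) and [s2, e2) overlap only when s1 < e2 and s2 < e1; rows that merely touch (one ends where the next starts) count as non-overlapping here. The function name keep_isolated is hypothetical, and rows come back in sorted order rather than input order.

```shell
# Hypothetical helper: print the rows of a tab-separated BED file ($1)
# that overlap no other row. Output is sorted by chromosome, then start.
keep_isolated() {
  sort -t "$(printf '\t')" -k1,1 -k2,2n "$1" |
  awk -F'\t' '
    {
      s = $2 + 0; e = $3 + 0
      if (have && $1 == prev_chrom) {
        # The buffered row overlaps the current one.
        if (s < prev_end) prev_overlap = 1
        # The current row overlaps some earlier row on this chromosome.
        cur_overlap = (s < max_end)
        if (e > max_end) max_end = e
      } else {
        cur_overlap = 0
        max_end = e
      }
      # Later rows start at or after s, so the buffered row cannot gain
      # new overlaps: emit it now if it has none.
      if (have && !prev_overlap) print prev_line
      prev_line = $0; prev_chrom = $1; prev_end = e
      prev_overlap = cur_overlap; have = 1
    }
    END { if (have && !prev_overlap) print prev_line }
  '
}
```

Usage: keep_isolated IN.bed > OUT.bed. After sorting, a single pass is enough because a row can only overlap later rows if the very next row on the same chromosome starts before it ends, and it overlaps earlier rows exactly when it starts before the largest end seen so far.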

Recommended answer

OK, I worked this out:

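1) Merge overlapping intervals in the original input, counting how many rows make up each merged interval (bedtools merge expects the input to be sorted by chromosome, then start)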

bedtools merge -i IN.bed -c 1 -o count > counted
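To see what step 1 produces, the awk function below is a hypothetical stand-in for bedtools merge -c 1 -o count, written only for illustration (it is not bedtools itself). On sorted input it clusters rows whose intervals overlap or are book-ended, which matches merge's documented default (-d 0), and prints chromosome, start, end and the number of rows in each cluster. For the example input it gives 183113-183114 (1 row), 187286-187287 (1 row) and 187576-187590 (2 rows); the name, score and strand columns are gone, which is why step 3 goes back to the original file.

```shell
# Hypothetical stand-in for `bedtools merge -c 1 -o count`.
# Expects a tab-separated BED file ($1) sorted by chromosome, then start.
merge_count() {
  awk -F'\t' -v OFS='\t' '
    # Start a new cluster on a new chromosome or when the row starts past
    # the current cluster end (start == end is book-ended and is merged,
    # as with -d 0).
    n == 0 || $1 != chrom || $2 + 0 > end {
      if (n) print chrom, start, end, n
      chrom = $1; start = $2 + 0; end = $3 + 0; n = 1
      next
    }
    {
      # Extend the current cluster and count the row.
      if ($3 + 0 > end) end = $3 + 0
      n++
    }
    END { if (n) print chrom, start, end, n }
  ' "$1"
}
```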

2) Keep only the merged intervals built from a single row, i.e. the rows that do not overlap anything else

awk '/\t1$/{print}' counted > filtered

3) Intersect the original input with the filtered intervals and report (-wa) the original rows that overlap them; this brings back the name, score and strand columns

bedtools intersect -a IN.bed -b filtered -wa > OUT.bed
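Put together, the three steps can be wrapped in one shell function. This is a sketch assuming bedtools is on PATH and the input is tab-separated BED6; the function names are hypothetical, and the explicit sort is added because bedtools merge expects sorted input. The second function is an alternative not taken from the answer: intersecting the file with itself using -c counts, for each row, how many rows it overlaps (itself included), so a count of 1 means it overlaps nothing else. One difference to be aware of: merge joins book-ended intervals by default, so rows that only touch are dropped by the three-step version, whereas intersect requires at least 1 bp of overlap and keeps them.

```shell
# Sketch of the answer's three steps as one function (hypothetical name).
# Assumes bedtools is on PATH and $1 is a tab-separated BED6 file.
isolated_rows() {
  sorted=$(mktemp) && counted=$(mktemp) && filtered=$(mktemp) || return 1
  # bedtools merge expects input sorted by chromosome, then start.
  sort -t "$(printf '\t')" -k1,1 -k2,2n "$1" > "$sorted" &&
  # 1) Merge overlapping (and book-ended) rows, counting rows per interval.
  bedtools merge -i "$sorted" -c 1 -o count > "$counted" &&
  # 2) Keep merged intervals that contain exactly one row (column 4 = count).
  awk -F'\t' '$4 == 1' "$counted" > "$filtered" &&
  # 3) Report the original six-column rows that fall in those intervals.
  bedtools intersect -a "$sorted" -b "$filtered" -wa
  status=$?
  rm -f "$sorted" "$counted" "$filtered"
  return "$status"
}

# Alternative (not from the answer): count each row's overlaps against the
# whole file; the self-hit makes the count 1 for rows that overlap nothing else.
isolated_rows_selfcount() {
  bedtools intersect -a "$1" -b "$1" -c | awk -F'\t' '$NF == 1' | cut -f1-6
}
```

Usage: isolated_rows IN.bed > OUT.bed. The self-intersection version needs no temporary files, but it compares every row against the whole file.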
