Filter overlapping entries in a BED file

Published: 2020/9/21 3:16:09 | Tags: bash, shell, bioinformatics, genome

Question

I have a BED file that looks like this:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -
1   187576  187587  chr1:187375-187577  0   -
1   187580  187590  chr1:187379-187577  0   -

My aim is to extract only those rows whose intervals do not overlap with any other row. For some time I have been trying bedtools merge according to the documentation. I wanted to use its flags to count the entries that contributed to each "merged" fragment and later keep only those with a count of 1, but here comes the problem: I don't know how to keep the information about the strand, the score (this should always be 0) and the name (this might be reconstructed from the first 3 columns). Does anyone know how to put these things together?

The output should look exactly like the input BED above, but contain only the rows that do not overlap with anything else:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -
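If the file is already sorted by chromosome and then start, the same result can also be obtained without bedtools in a single awk pass. This is a minimal sketch and not part of the original answer; keep_isolated is a hypothetical helper name. It tracks the running end of the current cluster of overlapping rows and prints a row only if it turned out to be the sole member of its cluster. Like bedtools merge with its default distance of 0, it treats book-ended rows (one row's end equal to the next row's start) as touching, so those are dropped as well.

```shell
#!/usr/bin/env bash
# Sketch: print BED rows that neither overlap nor touch any other row.
# Assumes input sorted by chromosome, then numeric start (sort -k1,1 -k2,2n).
# Columns are split on whitespace, so fields must not contain spaces.
# keep_isolated is a hypothetical helper, not part of bedtools.
keep_isolated() {
  awk '
    {
      if (NR > 1 && $1 == chrom && $2 + 0 <= cend) {
        # Row starts at or before the cluster end: it joins the cluster,
        # so neither it nor the buffered row is isolated.
        isolated = 0
        if ($3 + 0 > cend) cend = $3 + 0
      } else {
        # New cluster: flush the buffered row if it was alone in its cluster.
        if (NR > 1 && isolated) print buf
        isolated = 1
        chrom = $1
        cend = $3 + 0
      }
      buf = $0  # hold the row until we know whether the next one overlaps it
    }
    END { if (NR > 0 && isolated) print buf }
  ' "$@"
}

# Demo on the rows from the question (tab-separated).
printf '1\t183113\t183114\tchr1:183113-183240\t0\t+\n1\t187286\t187287\tchr1:187128-187287\t0\t-\n1\t187576\t187587\tchr1:187375-187577\t0\t-\n1\t187580\t187590\tchr1:187379-187577\t0\t-\n' | keep_isolated
```

Because each row is printed unchanged, the name, score and strand columns survive as they are, which is what the question asks for.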

Answer

OK, I worked this out:

1) Count the overlaps in the original input

bedtools merge -i IN.bed -c 1 -o count > counted
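Step 1 relies on bedtools merge collapsing overlapping (and, by default, book-ended) intervals into one and, with -c 1 -o count, appending how many input rows went into each merged interval. bedtools merge expects its input sorted by chromosome and then start (for example with sort -k1,1 -k2,2n IN.bed). To illustrate what ends up in counted, here is a rough awk emulation; merge_count is a hypothetical name, and the sketch only mimics this one invocation on sorted input, not bedtools' other options.

```shell
#!/usr/bin/env bash
# Hypothetical emulation of "bedtools merge -c 1 -o count" for sorted BED input.
# Prints chrom, merged start, merged end, and the number of rows merged.
merge_count() {
  awk '
    BEGIN { OFS = "\t" }
    {
      s = $2 + 0; e = $3 + 0
      if (n > 0 && $1 == chrom && s <= cend) {
        # Overlapping or book-ended: extend the current merged interval.
        if (e > cend) cend = e
        n++
      } else {
        # Emit the previous merged interval and start a new one.
        if (n > 0) print chrom, cstart, cend, n
        chrom = $1; cstart = s; cend = e; n = 1
      }
    }
    END { if (n > 0) print chrom, cstart, cend, n }
  ' "$@"
}

# Demo on the rows from the question (tab-separated).
printf '1\t183113\t183114\tchr1:183113-183240\t0\t+\n1\t187286\t187287\tchr1:187128-187287\t0\t-\n1\t187576\t187587\tchr1:187375-187577\t0\t-\n1\t187580\t187590\tchr1:187379-187577\t0\t-\n' | merge_count
```

For the example input this emulation prints three tab-separated lines, 1 183113 183114 1, 1 187286 187287 1 and 1 187576 187590 2; only the last merged interval has a count above 1, and the name, score and strand columns are gone, which is why step 3 is needed.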

2) Keep only the merged intervals with a count of 1, i.e. those that do not overlap with anything

awk '$4 == 1' counted > filtered
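For the four example rows, step 1 should give three merged intervals, and step 2 keeps the two that were built from a single row. The sketch below types that merged table in by hand (assuming the tab-separated chrom, start, end, count layout that merge with -c 1 -o count writes) and filters on the fourth column. Here awk '$4 == 1' selects the same lines as the regex /\t1$/, because the count is the last column.

```shell
#!/usr/bin/env bash
# Step 2 in isolation: keep merged intervals built from exactly one input row.
# The table below is what step 1 should produce for the four example rows
# (chrom, start, end, count), typed in by hand for illustration.
counted=$(printf '1\t183113\t183114\t1\n1\t187286\t187287\t1\n1\t187576\t187590\t2\n')

# Compare the 4th (count) column numerically instead of matching a trailing tab and 1.
filtered=$(printf '%s\n' "$counted" | awk '$4 == 1')
printf '%s\n' "$filtered"
```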

3) Intersect the filtered intervals with the original input and keep (-wa) the original rows that overlap them, so all original columns are preserved

bedtools intersect -a IN.bed -b filtered -wa > OUT.bed
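Because every interval that survives step 2 came from a single input row, its chrom, start and end are identical to that row's. Step 3 can therefore also be sketched as an exact join on the first three columns instead of an interval intersection. This is an alternative to the answer's bedtools intersect, not part of it; join_on_coords is a hypothetical name, and the sketch assumes whitespace-separated columns with no spaces inside fields.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to step 3: keep rows of the original BED whose
# chrom/start/end appear in the filtered (count == 1) merge output.
# Usage: join_on_coords FILTERED ORIGINAL
join_on_coords() {
  awk '
    # First file: remember the coordinates of every singleton interval.
    NR == FNR { keep[$1, $2, $3]; next }
    # Second file: print original rows whose coordinates were remembered.
    ($1, $2, $3) in keep
  ' "$1" "$2"
}

# Demo with the example data written to a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '1\t183113\t183114\t1\n1\t187286\t187287\t1\n' > "$tmp/filtered"
printf '1\t183113\t183114\tchr1:183113-183240\t0\t+\n1\t187286\t187287\tchr1:187128-187287\t0\t-\n1\t187576\t187587\tchr1:187375-187577\t0\t-\n1\t187580\t187590\tchr1:187379-187577\t0\t-\n' > "$tmp/IN.bed"
join_on_coords "$tmp/filtered" "$tmp/IN.bed"
```

The trade-off is that this only works because the filtered intervals are unmodified copies of input coordinates; the bedtools intersect step in the answer does not depend on that.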
