Filter overlapping entries in a BED file
Question
I have a bed file that looks like this:
1 183113 183114 chr1:183113-183240 0 +
1 187286 187287 chr1:187128-187287 0 -
1 187576 187587 chr1:187375-187577 0 -
1 187580 187590 chr1:187379-187577 0 -
My aim is to extract only those rows whose intervals do not overlap any other row. For some time I have been trying bedtools merge, following the documentation. I wanted to use its options to count the entries that contributed to each merged fragment and then keep only the fragments with a count of 1, but here is the problem: I don't know how to keep the strand, the score (which should always be 0) and the name (which might be reconstructed from the first 3 columns). Does anyone know how to put these things together?
The output should look exactly like the input BED above, but contain only the rows that do not overlap anything else:
1 183113 183114 chr1:183113-183240 0 +
1 187286 187287 chr1:187128-187287 0 -
Recommended answer
OK, I worked this out:
1) Count the overlaps in the original input (bedtools merge expects the input sorted by chromosome, then by start)
bedtools merge -i IN.bed -c 1 -o count > counted
2) Keep only the merged intervals that were built from a single entry, i.e. those whose count is 1
awk '/\t1$/{print}' counted > filtered
3) Intersect the filtered intervals with the original input, keeping only the original rows that survived the filtering
bedtools intersect -a IN.bed -b filtered -wa > OUT.bed
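For comparison, the same idea can be sketched without bedtools, using only sort and awk. This is not the method from the answer above, just a rough alternative under a few assumptions: columns 1 to 3 are chromosome, start and end, the intervals are half-open as in BED, and only a strict overlap counts. bedtools merge by default also merges book-ended features, so changing `<` to `<=` in the comparison should bring the sketch closer to that behaviour. The function name non_overlapping and the file name sample.bed are made up for the illustration, and the rows come out in sorted order rather than in the original order.

```shell
#!/bin/sh
# Sketch: print BED rows that overlap no other row, using sort and awk only.
# non_overlapping is a hypothetical helper name, not a bedtools command.
non_overlapping() {
    # Sort by chromosome, then numerically by start, so that
    # overlapping rows end up next to each other.
    sort -k1,1 -k2,2n "$1" |
    awk '
        # Print the stored row only if its cluster holds a single entry.
        function emit() { if (n == 1) print first }
        {
            if (n > 0 && ($1 "") == chrom && $2 + 0 < maxend) {
                # Row starts before the current cluster ends: it joins the cluster.
                n++
                if ($3 + 0 > maxend) maxend = $3 + 0
            } else {
                # No overlap: close the previous cluster and start a new one.
                emit()
                chrom = $1 ""      # compare chromosome names as strings
                maxend = $3 + 0    # compare coordinates as numbers
                first = $0
                n = 1
            }
        }
        END { emit() }
    '
}

# The four rows from the question, written to a scratch file (assumed name).
printf '1\t183113\t183114\tchr1:183113-183240\t0\t+\n'  > sample.bed
printf '1\t187286\t187287\tchr1:187128-187287\t0\t-\n' >> sample.bed
printf '1\t187576\t187587\tchr1:187375-187577\t0\t-\n' >> sample.bed
printf '1\t187580\t187590\tchr1:187379-187577\t0\t-\n' >> sample.bed

non_overlapping sample.bed
```

On the sample rows this should print only the first two lines, since the last two overlap each other. All six columns are carried through untouched, so strand, score and name do not need to be reconstructed.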