Select bases between 100-200 and print them along with the header
Problem description
I have a multi-FASTA file from which I need to extract the bases in the range 100-200, together with their corresponding headers. I know that 'cut -c 100-200' can extract the bases, but it does not keep the headers. Is there a way to do this in Perl or bash?
Sample file:
>8YS68_00009_00025
GAGTTTGATCCTGGCTCAGAGCGAACGCTGGCGGCAGGCTTAACACATGCAAGTCGAGCGGGCGTAGCAATACGTCAGCGGCAGACGGGTGAGTAACGCGTGGGAACATACCTTTTGGTTCGGAACAACACAGGGAAACTTGTGCTAATACCGGATAAGCTACGGGAAGATT
>8YS68_00009_00027
GAGTTTGATCATGGCTCAGAGCGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCCGTAGCAATACGGAGCGGCAGACGGGTGAGTAACGCGTGGGAACGTACCTTTCGGTTCGGAATAACTCAGGGAAACTTGAGCTAATACCGAATACGTCCGTAAGGAGAAAGATTTATCGCCGAAAGATCGGCCCGCGTAAGATTAGCTAGTTGGTGAGGTAAGGCTCACCAAGCGACGATCGTTAGCTTGTC
>8YS68_00012_00035
GAGTTTGATCATGGCTCAGAACGAACGTTGGCGGCGTGGATTAGGCATGCAAGTCGAACGAATCCCATCTGGGTAACTGGGTGGGGGAAGTGGCGAAAGGGGCAGTAATGCGTGGGTAACCTACCTGGGGACCGGGATAGCCTCCTAACGGATGGGTAATACCGGATACGACCTTCGGAGGCATCTCCTGAAGG
Desired output:
seq id ------ATCGATCGATCG-----
seq id ------ATCGATCGATCG-----
seq id ------ATCGATCGATCG-----
That is, I want to extract exactly the bases at positions 100-200 of each sequence, along with its header. If a sequence is shorter than 100 bp, ignore it.
Recommended answer
After reviewing the suggestions and working on this problem for some time, I found a solution in Perl. Here is the key loop I wrote that does the job.
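use strict;
use warnings;

# IN and OUT are filehandles opened elsewhere in the script.
# Each sequence is assumed to sit on a single line below its header.
my $head;
while (my $line = <IN>) {
    chomp $line;                 # drop the newline so it is not counted as a base
    if ($line =~ m/^>/) {
        $head = $line;           # remember the most recent header
    }
    elsif (length($line) >= 200) {
        # substr offsets are 0-based, so this takes bases 101-200 (100 bases);
        # use substr($line, 99, 101) for bases 100-200 inclusive.
        my $subseq = substr($line, 100, 100);
        print OUT "$head\n$subseq\n";
    }
}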
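Since the question also asks about bash, here is a minimal sketch of the same idea using awk from a shell function. It is not part of the original answer. The function name extract_range and its argument order are hypothetical, chosen only for illustration. The sketch assumes headers start with ">", joins sequences that are wrapped over several lines, uses 1-based inclusive positions (as 'cut -c' does), and skips any record that does not reach the end position.

```shell
# Hypothetical helper: extract_range START END FILE
# Prints each header followed by bases START..END (1-based, inclusive).
# Records whose sequence is shorter than END bases are skipped (assumption).
extract_range() {
    awk -v s="$1" -v e="$2" '
        # Print the record collected so far, if it is long enough.
        function emit() {
            if (head != "" && length(seq) >= e)
                print head "\n" substr(seq, s, e - s + 1)
        }
        /^>/ { emit(); head = $0; seq = ""; next }   # a new record starts here
             { gsub(/[ \t\r]/, ""); seq = seq $0 }   # join wrapped sequence lines
        END  { emit() }                              # do not forget the last record
    ' "$3"
}
```

With placeholder file names, it could be called as: extract_range 100 200 input.fasta > output.fasta. Because awk's substr is 1-based, no offset adjustment is needed, unlike Perl's 0-based substr.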
Thank you all for the help and support.