Sequence length of fasta file

Views: 148 | Posted: 2016/7/28 16:35:38 | Tags: bash, awk

Question

I have the following fasta file:

>header1
CGCTCTCTCCATCTCTCTACCCTCTCCCTCTCTCTCGGATAGCTAGCTCTTCTTCCTCCT
TCCTCCGTTTGGATCAGACGAGAGGGTATGTAGTGGTGCACCACGAGTTGGTGAAGC
>header2
GGT
>header3
TTATGAT
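
To try the commands in this thread, the sample above can be saved as `file.fa` (the file name the awk commands use) with a heredoc. This is only a convenience sketch; the quoted `'EOF'` delimiter keeps the shell from expanding anything inside the data.

```shell
# Write the sample records from the question to file.fa.
# The quoted delimiter stops the shell from interpreting the content.
cat > file.fa <<'EOF'
>header1
CGCTCTCTCCATCTCTCTACCCTCTCCCTCTCTCTCGGATAGCTAGCTCTTCTTCCTCCT
TCCTCCGTTTGGATCAGACGAGAGGGTATGTAGTGGTGCACCACGAGTTGGTGAAGC
>header2
GGT
>header3
TTATGAT
EOF
```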

Desired output:

>header1
117
>header2
3
>header3
7
# 3 sequences, total length 127.

This is my code:

awk '/^>/ {print; next; } { seqlen = length($0); print seqlen}' file.fa

The output I get with this code is:

>header1
60
57
>header2
3
>header3
7

So... I need a "little" modification to handle sequences that span multiple lines. I also need a way to get the total number of sequences and the total length. Any suggestion is welcome, in bash or awk please. I know this is easy to do in Perl/BioPerl, and in fact I already have a script that does it that way.

Thanks

Answer

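An awk/gawk solution can be composed of three stages: an action run on every header line, an action that accumulates the length of each sequence line, and an END block that prints the length of the last record.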

Every time a header is found, these actions should be performed:

Print the previous seqlen, if it exists.
Print the tag (the header line itself).
Reset seqlen to 0.

Commented code:

awk '/^>/ {                 # header pattern detected
        if (seqlen) {
            # print the previous seqlen, if it exists
            print seqlen
        }

        # print the tag
        print

        # reset the sequence length for the new record
        seqlen = 0

        # skip further processing of this line
        next
    }

    # accumulate the sequence length
    {
        seqlen += length($0)
    }

    # print the remaining seqlen of the last record, if it exists
    END { if (seqlen) { print seqlen } }' file.fa
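
One caveat: `if (seqlen)` cannot tell "no previous header yet" apart from "the previous record had length 0", so a header with an empty sequence gets no length line. A minimal sketch of a variant that tracks this with a separate flag follows; `fasta_lengths` and `inrec` are hypothetical names introduced here, not part of the answer.

```shell
# Hypothetical variant: the flag inrec records whether a header has been
# seen, so a record with an empty sequence still reports a length of 0.
fasta_lengths() {
    awk '
        /^>/ {
            if (inrec) print seqlen   # previous record, even when its length is 0
            print                     # the header line itself
            seqlen = 0
            inrec = 1
            next
        }
        { seqlen += length($0) }      # add up every sequence line of the record
        END { if (inrec) print seqlen }
    ' "$@"
}
```

For example, `printf '>a\n>b\nACGT\n' | fasta_lengths` prints `>a`, `0`, `>b`, `4` on separate lines, whereas the commented program above prints no length for `>a`.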

A one-liner:

awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen += length($0)}END{print seqlen}' file.fa
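
A different technique (not the one used in this answer) is to make `>` the record separator, so each FASTA record arrives as a single awk record with one line per field. The sketch below assumes `>` never appears inside a header or sequence line; `fasta_lengths_rs` is a hypothetical name.

```shell
# Alternative technique: with RS=">" each FASTA record (header plus all of
# its sequence lines) is one awk record, and FS="\n" makes each line a field.
# Assumes ">" never occurs inside a header or sequence line.
fasta_lengths_rs() {
    awk 'BEGIN { RS = ">"; FS = "\n" }
         NR > 1 {                          # record 1 is the empty text before the first ">"
             len = 0
             for (i = 2; i <= NF; i++) {   # field 1 is the header, the rest are sequence lines
                 len += length($i)
             }
             print ">" $1
             print len
         }' "$@"
}
```

This avoids carrying `seqlen` across lines and needs no END block, at the cost of relying on `>` appearing only at the start of headers.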

For the totals:

awk '/^>/ { if (seqlen) {
              print seqlen
              }
            print

            seqtotal += seqlen
            seqlen = 0
            seq += 1
            next
            }
    {
    seqlen += length($0)
    }
    END { print seqlen
          print "# " seq " sequences, total length " (seqtotal + seqlen) "."
    }' file.fa
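
The totals can be cross-checked without awk by counting header lines with `grep -c` and counting the bytes left after removing headers and line breaks. A minimal sketch, assuming every header line starts with `>`; `fasta_totals` is a hypothetical name.

```shell
# Cross-check of the totals with grep, tr and wc (a different technique
# from the awk answer). Assumes every header line starts with ">".
fasta_totals() {
    # number of records = number of header lines
    nseq=$(grep -c '^>' "$1")
    # drop header lines and line breaks, then count the remaining bytes
    total=$(grep -v '^>' "$1" | tr -d '\n' | wc -c | tr -d ' ')
    echo "# $nseq sequences, total length $total."
}
```

On the sample from the question, `fasta_totals file.fa` prints `# 3 sequences, total length 127.`. The final `tr -d ' '` removes the padding that some `wc` implementations add to their output.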
