Converting FASTQ to FASTA with sed/awk

Question

I have data that always comes in blocks of four lines, in the following format (called FASTQ):

@SRR018006.2016 GA2:6:1:20:650 length=36
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN
+SRR018006.2016 GA2:6:1:20:650 length=36
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!+!
@SRR018006.19405469 GA2:6:100:1793:611 length=36
ACCCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+SRR018006.19405469 GA2:6:100:1793:611 length=36
7);;).;);;/;*.2>/@@7;@77<..;)58)5/>/

Is there a simple sed/awk/bash way to convert it into this format (called FASTA)?

>SRR018006.2016 GA2:6:1:20:650 length=36
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN
>SRR018006.19405469 GA2:6:100:1793:611 length=36
ACCCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

In principle, we want to extract the first two lines of each block of four and replace the leading @ with >.

Recommended answer

This is an old question, and many different solutions have been offered. The accepted answer uses sed but has a glaring problem: it also replaces @ with > when the @ sign appears as the first character of a quality line. So I feel compelled to offer a simple sed-based solution that actually works:

sed -n '1~4s/^@/>/p;2~4p' 
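
To see what the command does, here is a minimal sketch that runs it on a tiny made-up file. The file names sample.fastq and sample.fasta and the two reads are hypothetical; the second read's quality line deliberately starts with @ to exercise the case described above.

```shell
# Build a two-read FASTQ file; the quality line of read2 starts with '@'.
# (sample.fastq and its contents are made up for illustration.)
printf '%s\n' \
  '@read1 length=4' \
  'ACGT' \
  '+read1 length=4' \
  '!!!!' \
  '@read2 length=4' \
  'GGCC' \
  '+read2 length=4' \
  '@;;;' > sample.fastq

# Lines 1, 5, 9, ... are headers: replace the leading '@' with '>' and print.
# Lines 2, 6, 10, ... are sequences: print them unchanged.
# Nothing else is printed because of -n. (first~step addresses need GNU sed.)
sed -n '1~4s/^@/>/p;2~4p' sample.fastq > sample.fasta

cat sample.fasta
```

Because lines are selected by position rather than by their first character, the quality line that begins with @ is never touched.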

The only assumption is that each read occupies exactly four lines in the FASTQ file, which in my experience is a pretty safe one. Note that the first~step address form (1~4, 2~4) is a GNU sed extension.
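
Where GNU sed is not available, or when you want to test the four-line assumption before relying on it, the same selection by line number can be sketched in awk. This is an illustrative alternative, not part of the original answer. The file names reads.fastq and reads.fasta are hypothetical, and the check is only a rough one: it verifies that the line count is a multiple of four and that every first and third line of a block starts with @ and + respectively.

```shell
# Hypothetical input: two well-formed reads, one with a quality line starting with '@'.
printf '%s\n' '@r1' 'ACGT' '+' '!!!!' '@r2' 'GGCC' '+r2' '@;;;' > reads.fastq

# Rough sanity check of the 4-lines-per-read assumption.
# Exit status is 0 only if every block has '@' on line 1 and '+' on line 3,
# and the total number of lines is a multiple of 4.
if awk 'NR % 4 == 1 && !/^@/  { bad = 1 }
        NR % 4 == 3 && !/^\+/ { bad = 1 }
        END { exit (bad || NR % 4 != 0) }' reads.fastq
then
  # Same logic as the sed command: rewrite header lines, pass sequence lines through.
  awk 'NR % 4 == 1 { sub(/^@/, ">"); print }
       NR % 4 == 2 { print }' reads.fastq > reads.fasta
else
  echo 'reads.fastq does not look like 4-line FASTQ' >&2
fi

cat reads.fasta
```

A file that fails the check (for example, one with sequences wrapped over several lines) would need a real FASTQ parser rather than a line-counting one-liner.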

The fastq_to_fasta tool in the FASTX-Toolkit also works. (It's worth mentioning that you need to specify the -Q33 option to accommodate the now-common Phred+33 quality encoding, which is funny, since it throws away the quality data anyway!)
