Using a .fasta file to compute relative content of sequences


Problem description

So, being the 'noob' that I am, having only recently been introduced to programming via Perl, I'm still getting used to all of this. I have a .fasta file which I have to use, although I'm unsure whether I'm able to open it or whether I have to work with it 'blindly', so to speak.

Anyway, the file I have contains DNA sequences for three genes, written in .fasta format.

Apparently it's something like this:

>label
sequence
>label
sequence
>label
sequence
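A FASTA file is ordinary plain text: each record starts with a `>` header line holding the label, and the sequence that follows may be wrapped over several lines. Below is a minimal Perl sketch of splitting such text into label/sequence pairs. It assumes the whole file has already been read into a string, and `parse_fasta` is a hypothetical helper name, not a library function:

```perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA text (already read into a string) into
# a list of [label, sequence] pairs, preserving file order.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>\s*(.*)$/) {       # header line starts a new record
            push @records, [$1, ''];
        }
        elsif (@records) {                 # sequence lines may wrap; concatenate them
            $line =~ s/\s+//g;             # drop stray whitespace
            $records[-1][1] .= uc $line;   # normalise to upper case
        }
        # lines before the first header are ignored
    }
    return @records;
}

my $example = ">gene1\nATGC\nGGCC\n>gene2\nAATT\n";
for my $rec (parse_fasta($example)) {
    print "$rec->[0]\t$rec->[1]\n";
}
```

For the example string this prints `gene1`, a tab and `ATGCGGCC` on one line, then `gene2`, a tab and `AATT` on the next.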

My goal is to write a script that opens and reads the file, which I've now got the hang of. But I also have to read each sequence, compute the relative amounts of 'G' and 'C' within each sequence, and then write the names of the genes and their respective 'G' and 'C' content to a TAB-delimited file.
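Relative G and C content comes down to counting characters and dividing by the sequence length. Below is a minimal sketch that assumes "relative" means a fraction of the full sequence length. That is an assumption: the assignment might instead want percentages, or only the combined G+C value. `gc_content` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the fraction of G, the fraction of C and the
# combined GC fraction of a DNA sequence. Counting is case-insensitive and
# the denominator is the full sequence length (N or other symbols included).
sub gc_content {
    my ($seq) = @_;
    my $len = length $seq;
    return (0, 0, 0) if $len == 0;         # avoid division by zero
    my $g = ($seq =~ tr/Gg//);             # tr/// with an empty replacement just counts
    my $c = ($seq =~ tr/Cc//);
    return ($g / $len, $c / $len, ($g + $c) / $len);
}

my ($g, $c, $gc) = gc_content('ATGCGGCC');
printf "G=%.3f C=%.3f GC=%.3f\n", $g, $c, $gc;   # prints "G=0.375 C=0.375 GC=0.750"
```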

Would anyone be able to provide some guidance? I'm unsure what a TAB-delimited file is, and I'm still trying to figure out how to open a .fasta file to actually see its content. So far I've worked with .txt files, which I can open easily, but not .fasta files.
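A TAB-delimited file is plain text with one row per line, where the columns are separated by a tab character (`\t` in Perl). A .fasta file is also plain text, so Perl opens it with the same `open` call you would use for a .txt file. Below is a minimal end-to-end sketch under those assumptions. `write_gc_table`, the column names and the four-decimal formatting are illustrative choices, not part of the assignment:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical end-to-end sketch: read a FASTA file (plain text, opened just
# like a .txt file) and write one TAB-delimited row per record:
# label <TAB> G fraction <TAB> C fraction <TAB> GC fraction.
sub write_gc_table {
    my ($in_path, $out_path) = @_;
    open my $in, '<', $in_path or die "Cannot open $in_path: $!";
    my @records;                           # [label, sequence] pairs in file order
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;              # strip Unix or Windows line ending
        if ($line =~ /^>\s*(.*)$/) {       # header line: start a new record
            push @records, [$1, ''];
        }
        elsif (@records) {                 # sequence line: append to current record
            $line =~ s/\s+//g;
            $records[-1][1] .= uc $line;
        }
    }
    close $in;

    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    print {$out} join("\t", 'gene', 'G', 'C', 'GC'), "\n";   # header row
    for my $rec (@records) {
        my ($name, $s) = @$rec;
        my $len = length($s) || 1;         # empty record: report 0 instead of dividing by 0
        my $g   = ($s =~ tr/G//);          # count G (sequence was upper-cased above)
        my $c   = ($s =~ tr/C//);
        printf {$out} "%s\t%.4f\t%.4f\t%.4f\n",
            $name, $g / $len, $c / $len, ($g + $c) / $len;
    }
    close $out or die "Cannot close $out_path: $!";
}

# Usage: perl gc_table.pl genes.fasta gc.tsv   (file names are placeholders)
write_gc_table(@ARGV) if @ARGV == 2;
```

The resulting file can be viewed in any text editor or loaded into a spreadsheet program, which will split the columns on the tabs.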

I apologise for sounding completely bewildered. I'd appreciate your patience. I'm not like you pros out there!!

Solution

I advise you to check the links below:

fasta perl on stackoverflow

BioPerl HowTo

A crash course in perl and dna
