Using a .fasta file to compute relative content of sequences
Problem description
I'm a 'noob' who was introduced to programming via Perl only recently, so I'm still getting used to all of this. I have a .fasta file that I have to use, although I'm unsure whether I'm able to open it or whether I have to work with it 'blindly', so to speak.
Anyway, the file that I have contains DNA sequences for three genes, written in this .fasta format.
Apparently it's something like this:
>label
sequence
>label
sequence
>label
sequence
My goal is to write a script that opens and reads the file, which I have gotten the hang of now. However, I also have to read each sequence, compute the relative amounts of 'G' and 'C' within each sequence, and then write the names of the genes and their respective 'G' and 'C' content to a TAB-delimited file.
Would anyone be able to provide some guidance? I'm unsure what a TAB-delimited file is, and I'm still trying to figure out how to open a .fasta file to actually see the content. So far I've worked with .txt files which I can easily open, but not .fasta.
I apologise for sounding completely bewildered. I'd appreciate your patience. I'm not like you pros out there!!
I advise you to check the links below:
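In the meantime, here is a minimal Perl sketch of the steps described in the question. A .fasta file is ordinary plain text, so it opens in any text editor and can be read with open() exactly like a .txt file. A TAB-delimited file is simply a text file in which the columns on each line are separated by a tab character ("\t"). The sketch makes a few assumptions that are not stated in the question: the whole header line after '>' is used as the gene name, a sequence may span several lines, and "relative content" is taken to mean the combined fraction of G and C among all letters of the sequence. The sub names read_fasta, gc_fraction and main are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from an open filehandle.
# Returns a list of [label, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # drop the newline (also copes with \r\n)
        next if $line eq '';             # skip blank lines
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];     # header line starts a new record
        } elsif (@records) {
            $records[-1][1] .= $line;    # sequence lines are joined together
        }
    }
    return @records;
}

# Combined fraction of G and C in a sequence (assumption: upper and
# lower case both count; an empty sequence gives 0).
sub gc_fraction {
    my ($seq) = @_;
    return 0 unless length $seq;
    my $gc = ($seq =~ tr/GCgc//);        # tr with an empty replacement only counts
    return $gc / length $seq;
}

# Read the FASTA file and write one TAB-separated line per gene.
sub main {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot open $out_path: $!";
    print {$out} "gene\tGC_fraction\n";
    for my $rec (read_fasta($in)) {
        my ($label, $seq) = @$rec;
        printf {$out} "%s\t%.3f\n", $label, gc_fraction($seq);
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
}

# Run only when an input and an output path are given.
main(@ARGV) if @ARGV == 2;
```

It could be run as "perl gc_content.pl genes.fasta gc_content.tsv", where all three file names are hypothetical. If G and C are needed as separate columns, the same counting trick works with tr/Gg// and tr/Cc// individually.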