Parsing a text file with multiple columns


Question

I am attempting to extract each of the 11 columns in the following file:

http://bioinfo.mc.vanderbilt.edu/TSGene/Human_716_TSGs.txt

...into a list of scalars for a beginning-level college bioinformatics project. My attempt (see below) works but is not perfect, because the amount of whitespace varies between columns (see the top of the file for details).

use strict;
use warnings;

open FH, '<', 'tsg.txt' or die $!;
my $data = do {local $/; <FH>};
close FH or die $!;

my($id, $sym, $alias, $xref, $chromo, $band, $name, $gene_t, $desc, $nuc_seq,
   $pro_seq) = $data =~ /(\S+)\s+
                         (\S+)\s+
                         (\S+)\s+
                         (\S+)\s+
                         (\S+)\s+
                         (\S+)\s+

                         (\S+)\s+
                         /xms;

print "GeneID: $id", "\n";
print "Gene_symbol: $sym", "\n";
print "Alias: $alias", "\n";
print "XRef: $xref", "\n";
print "Chromosome: $chromo", "\n";
print "Cytoband: $band", "\n";

print "Full_name: $name", "\n";
#print "Gene_type: $gene_t", "\n";
#print "Description: $desc", "\n";
#print "Nucleotide_sequence: $nuc_seq", "\n";
#print "Protein_sequence: $pro_seq", "\n";

Thank you for your help.

Recommended answer

This file looks like it's tab-separated, so you should be able to split each line on \t and store the fields in an array:

my @columns = split( "\t", $data );  # $data should hold one line here, not the whole slurped file

You can then access each column by its index:

my $id = $columns[0];
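
To make the split-and-index idea concrete, here is a minimal sketch of how one line might be turned into named fields. The helper name parse_tsg_line and the sample line are hypothetical; the column names are taken from the print labels in the question, and the sketch assumes every data line has the 11 tab-separated fields in that order.

```perl
use strict;
use warnings;

# Column names taken from the print labels in the question.
my @FIELDS = qw(GeneID Gene_symbol Alias XRef Chromosome Cytoband
                Full_name Gene_type Description
                Nucleotide_sequence Protein_sequence);

# Hypothetical helper: split one tab-separated line into a hash
# keyed by column name.
sub parse_tsg_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                 # strip an LF or CRLF line ending
    my @columns = split /\t/, $line, -1;  # -1 keeps trailing empty fields
    my %record;
    @record{@FIELDS} = @columns;          # hash slice; missing columns become undef
    return \%record;
}

# Made-up sample line (placeholder values, not real data from the file).
my $sample = join("\t", 'ID1', 'SYM1', 'A1|A2', 'X1', '1', '1p1',
                  'example full name', 'protein-coding', '', 'ACGT', 'MK') . "\n";

my $rec = parse_tsg_line($sample);
print "GeneID: $rec->{GeneID}\n";
print "Full_name: $rec->{Full_name}\n";
```

Splitting on a literal tab rather than on \s+ keeps multi-word values such as the full name in one field, and an empty column stays empty instead of shifting the later columns to the left. For the real file, the same helper could be called once per line inside a while loop over the filehandle, skipping any header line first.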
