How do I average column values from a tab-separated data file, ignoring a header row and the left column?

Question

My task is to compute averages from the following data file, titled Lab1_table.txt:

retrovirus      genome  gag     pol     env
HIV-1           9181    1503    3006    2571
FIV             9474    1353    2993    2571
KoRV            8431    1566    3384    1980
GaLV            8088    1563    3498    2058
PERV            8072    1560    3621    1532

I have to write a script that opens and reads this file, splits each line into an array, computes the average of the numerical values in each column (genome, gag, pol, env), and writes those averages to a new file.

I've been trying my best to figure out how to not take into account the first row, or the first column, but every time I try to execute on the command line I keep coming up with 'explicit package name' errors.

Global symbol @average requires explicit package name at line 23.
Global symbol @average requires explicit package name at line 29.
Execution aborted due to compilation errors.

I understand that this involves @ and $, but even knowing that I've not been able to change the errors.

This is my code, but I emphasise that I'm a beginner having started this just last week:

#!/usr/bin/perl -w
use strict;

my $infile = "Lab1_table.txt"; # This is the file path
open INFILE, $infile or die "Can't open $infile: $!";

my $count = 0;
my $average = ();

while (<INFILE>) {
    chomp;
    my @columns = split /\t/;
    $count++;
    if ( $count == 1 ) {
        $average = @columns;
    }
    else {
        for( my $i = 1; $i < scalar $average; $i++ )  {
            $average[$i] += $columns[$i];
        }
    }
}

for( my $i = 1; $i < scalar $average; $i++ ) {
    print $average[$i]/$count, "\n";
}

I'd appreciate any insight, and I would also greatly appreciate it if you could number the steps and explain what you are doing at each one, if appropriate. I'd like to learn, and it would make more sense to me if I could read through someone's process.

Recommended answer

Here are the points you need to change.
First, use a separate array for the headers, and declare @average as an array rather than a scalar:

my $count = 0;
my @header = ();
my @average = ();
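
The 'explicit package name' errors in the question come from how Perl sigils work: $average[$i] is an element of the array @average, which is a different variable from the scalar $average. Under use strict, an undeclared @average is rejected at compile time. The following is only a minimal sketch of that distinction; the values are made up for illustration.

```perl
use strict;
use warnings;

my $average = 10;          # a scalar named "average"
my @average = (1, 2, 3);   # an array named "average": a separate variable

# $average[1] reads element 1 of the ARRAY @average, not the scalar $average.
my $second = $average[1];  # 2

# Assigning an array to a scalar stores the element count, not the list.
my $size = @average;       # 3

print "scalar=$average second=$second size=$size\n";
```

This is also why $average = @columns in the original script stored the number 5 (the column count) instead of the header names.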

Then change the logic inside the if statement so that the header row is stored in @header:

if ( $count == 1 ) {
    @header = @columns;
}

Next, don't use @average for the loop limit in the else branch; use $i < scalar @columns instead. @average starts out empty, so a loop bounded by its length would never execute.

else {
    for( my $i = 1; $i < scalar @columns; $i++ )  {
        $average[$i] += $columns[$i];
    }
}
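
To see why the loop bound matters, here is a small sketch that counts how many times each version of the loop body runs for the first data row. The counter names are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

my @average = ();                                # nothing accumulated yet
my @columns = ('HIV-1', 9181, 1503, 3006, 2571); # first data row, already split

my $runs_bounded_by_average = 0;
for (my $i = 1; $i < scalar @average; $i++) {
    $runs_bounded_by_average++;                  # never reached: scalar @average is 0
}

my $runs_bounded_by_columns = 0;
for (my $i = 1; $i < scalar @columns; $i++) {
    $average[$i] += $columns[$i];                # += on an undefined element starts from 0
    $runs_bounded_by_columns++;
}

print "$runs_bounded_by_average $runs_bounded_by_columns\n";
```

Starting the index at 1 is what skips the left column, since element 0 holds the retrovirus name.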

Finally, subtract 1 from the counter when dividing. The counter is also incremented for the header line, so the number of data rows is $count - 1.

for( my $i = 1; $i < scalar @average; $i++ ) {
    print $average[$i]/($count-1), "\n";
}
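
As a worked check of the divisor, the sketch below uses the genome column from the question's table. The file has six lines in total (one header plus five data rows), so dividing by $count would understate the average.

```perl
use strict;
use warnings;

my @genome = (9181, 9474, 8431, 8088, 8072);  # genome column from Lab1_table.txt
my $count  = 6;                               # 1 header line + 5 data lines

my $sum = 0;
$sum += $_ for @genome;                       # 43246

my $wrong = $sum / $count;                    # divides by 6, header line included
my $right = $sum / ($count - 1);              # divides by the 5 data rows: 8649.2

print "wrong=$wrong right=$right\n";
```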

Here is the final code. You can also use @header to label the results neatly.

#!/usr/bin/perl -w

use strict;

my $infile = "Lab1_table.txt"; # This is the file path
open INFILE, $infile or die "Can't open $infile: $!"; 

my $count = 0;
my @header = ();
my @average = ();

while (<INFILE>) {
    chomp;
    my @columns = split /\t/;
    $count++;
    if ( $count == 1 ) {
        @header = @columns;
    }
    else {
        for( my $i = 1; $i < scalar @columns; $i++ )  {
            $average[$i] += $columns[$i];
        }
    }
} 

for( my $i = 1; $i < scalar @average; $i++ ) {
    print $average[$i]/($count-1), "\n";
}

There are other ways to write this code but I thought it would be better to just correct your code so that you can easily understand what is wrong with your code. Hope it helps
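
As one possible alternative, the sketch below reads the header separately, labels each average with its column name, and uses lexical filehandles with three-argument open. It is not the answer author's code: column_averages is a hypothetical helper, and the table is held in an in-memory string (assumed to be tab-separated like the real file) so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reads tab-separated lines from a filehandle and
# returns (\@names, \@averages) for every column except the first.
sub column_averages {
    my ($fh) = @_;

    my $header_line = <$fh>;                 # the first line holds the column names
    chomp $header_line;
    my @header = split /\t/, $header_line;

    my (@sum, $rows);
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;            # skip blank lines
        my @columns = split /\t/, $line;
        $sum[$_] += $columns[$_] for 1 .. $#columns;   # index 0 is the name column
        $rows++;
    }

    my @names    = @header[1 .. $#header];
    my @averages = map { $sum[$_] / $rows } 1 .. $#header;
    return (\@names, \@averages);
}

# Rows from the question, joined with tabs and newlines.
my $data = join "\n",
    join("\t", qw(retrovirus genome gag pol env)),
    join("\t", qw(HIV-1 9181 1503 3006 2571)),
    join("\t", qw(FIV   9474 1353 2993 2571)),
    join("\t", qw(KoRV  8431 1566 3384 1980)),
    join("\t", qw(GaLV  8088 1563 3498 2058)),
    join("\t", qw(PERV  8072 1560 3621 1532));

# Open the string as an in-memory file; a real script would open the path instead.
open my $in, '<', \$data or die "Can't open in-memory data: $!";
my ($names, $averages) = column_averages($in);
close $in;

printf "%s\t%.1f\n", $names->[$_], $averages->[$_] for 0 .. $#$names;
```

To work on the real file, open my $in, '<', 'Lab1_table.txt' would replace the in-memory open. To satisfy the "write to a new file" part of the task, the printf could be directed at a handle opened with '>' on an output path of your choosing.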
