Counting and manipulating occurrences in a text file (Perl)


Problem description

I have a tab separated text file that is like

1J  L  0.5
1J  P  0.4
1J  K  0.2
1J  L  0.3
1B  K  0.7
1B  L  0.2
1B  P  0.3
1B  L  0.6
1B  L  0.3

And I want to manipulate it in order to get the following information:

For each element in the 1st column, count how many repeated elements in the second column there are, and do the average of all numbers in the third column for each element of the second column. The desired output can be another tab separated text file, where "Average" is the average number for that element in the 2nd column:

1st  K#  Average  L#  Average  P# Average 
1J  1  0.2  2  0.4  1  0.4
1B  1  0.7  3  0.37  1  0.3

How should I proceed? I thought about doing a Hash of Arrays with key = 1st column, but I don't think this would be too advantageous.

I also thought about creating multiple arrays named @L, @P, @K to count the occurrences of each of these elements, for each element of the 1st column; and other arrays @Ln, @Pn, @Kn that would get all numbers for each of these. In the end, the sum of each number divided by scalar @L would give me the average number.

But my main problem in these is: how can I do all of this processing for each element of the 1st column?

Edit: another possibility (which I am trying right now) is to build an array of all unique elements of the first column, then grep for each one and do the processing. But there may be easier ways?
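For comparison, the unique-elements-plus-grep idea from the edit above could look roughly like the sketch below. It is only an illustration, not part of the original question: the helper name count_and_avg and the inline sample rows are assumptions made here. It rescans every row for each first-column/second-column pair, so it does more work than a single pass over the file.

```perl
use strict;
use warnings;

# Sample rows from the question, each split into [col1, col2, value].
my @rows = map { [split] } (
    "1J L 0.5", "1J P 0.4", "1J K 0.2", "1J L 0.3",
    "1B K 0.7", "1B L 0.2", "1B P 0.3", "1B L 0.6", "1B L 0.3",
);

# Unique first-column values, in order of first appearance.
my %seen;
my @firsts = grep { !$seen{$_}++ } map { $_->[0] } @rows;

# Hypothetical helper: returns (count, average) for one pair of column values.
sub count_and_avg {
    my ($first, $second) = @_;
    my @values = map { $_->[2] }
                 grep { $_->[0] eq $first && $_->[1] eq $second } @rows;
    return (0, 0) unless @values;    # no matching rows: avoid dividing by zero
    my $sum = 0;
    $sum += $_ for @values;
    return (scalar @values, $sum / @values);
}

for my $first (@firsts) {
    my @cells;
    for my $second (qw(K L P)) {
        my ($n, $avg) = count_and_avg($first, $second);
        push @cells, sprintf("%d\t%.2f", $n, $avg);
    }
    print join("\t", $first, @cells), "\n";
}
```

The early return in count_and_avg also covers the Edit2 case, where a second-column value never occurs for a given first-column value.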

Edit2: it may happen that some elements of the second column do not exist for some elements in the first column - problem: division by 0. E.g.:

1J  L  0.5
1J  P  0.4
1J  K  0.2
1J  L  0.3
1B  K  0.7
1B  L  0.2
1B  L  0.3  <- note that this is not P as in the example above.  
1B  L  0.6
1B  L  0.3

Solution

Here is a way to go:

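use strict;
use warnings;
use feature 'say';    # say() is only available with this feature (or "use v5.10;")

# $result->{col1}{col2} holds the running sum and the number of rows seen
my $result;
while (<DATA>) {
    chomp;
    my @data = split;    # split $_ on whitespace: (col1, col2, value)
    $result->{$data[0]}{$data[1]}{sum} += $data[2];
    $result->{$data[0]}{$data[1]}{nbr}++;
}
say "1st\tK#\tavg\tL#\tavg\tP#\tavg";
foreach my $k (sort keys %$result) {    # sort so the row order is stable
    print "$k\t";
    for my $c (qw(K L P)) {
        # only divide when at least one row was counted for this pair
        if (exists($result->{$k}{$c}{nbr}) && $result->{$k}{$c}{nbr} != 0) {
            printf("%d\t%.2f\t", $result->{$k}{$c}{nbr}, $result->{$k}{$c}{sum} / $result->{$k}{$c}{nbr});
        } else {
            printf("%d\t%.2f\t", 0, 0);
        }
    }
    print "\n";
}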

__DATA__
1J  L  0.5
1J  P  0.4
1J  K  0.2
1J  L  0.3
1B  K  0.7
1B  L  0.2
1B  P  0.3
1B  L  0.6
1B  L  0.3

Output:

1st K#  avg   L#  avg   P#  avg
1B  1   0.70  3   0.37  1   0.30    
1J  1   0.20  2   0.40  1   0.40    
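
The answer above hardcodes the second-column labels K, L and P. If the labels are not known in advance, one possible variation is to collect them while reading, as in the sketch below. The helper name summarize is hypothetical, and the sample lines are the Edit2 data from the question (where 1B has no P), passed inline instead of through __DATA__.

```perl
use strict;
use warnings;

# Hypothetical helper: turns input lines into tab-separated report lines.
sub summarize {
    my @lines = @_;
    my (%stats, %labels);
    for my $line (@lines) {
        my ($first, $second, $value) = split ' ', $line;
        next unless defined $value;          # skip blank or short lines
        $stats{$first}{$second}{sum} += $value;
        $stats{$first}{$second}{nbr}++;
        $labels{$second} = 1;                # remember every label seen
    }
    my @cols = sort keys %labels;
    my @out  = join("\t", '1st', map { ("$_#", 'avg') } @cols);
    for my $first (sort keys %stats) {
        my @cells;
        for my $label (@cols) {
            my $cell = $stats{$first}{$label};    # undef if the pair never occurred
            my $n    = $cell ? $cell->{nbr} : 0;
            push @cells, $n, sprintf('%.2f', $n ? $cell->{sum} / $n : 0);
        }
        push @out, join("\t", $first, @cells);
    }
    return @out;
}

my @report = summarize(
    "1J L 0.5", "1J P 0.4", "1J K 0.2", "1J L 0.3",
    "1B K 0.7", "1B L 0.2", "1B L 0.3", "1B L 0.6", "1B L 0.3",
);
print "$_\n" for @report;
```

To read from a real file, the lines could be passed in as summarize(<$fh>) after opening a filehandle. Missing pairs are reported with a count of 0 and an average of 0.00 rather than triggering a division by zero.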
