awk: count selective combinations only

Problem description

I would like to read the file and count the fields whose value is "TRUE", looking only at the 3rd through 5th fields.

Input.txt

Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234

While reading the headers from the 3rd field to the 5th field (i.e. A, B, C), I want to generate only the unique combinations A, B, C, AB, AC, BC, ABC. Note: AA, BB, CC, BA, etc. are excluded.

If a "TRUE" is counted towards the "AB" combination, it should not be counted again towards "A" or "B", to avoid duplicates.

Example#1

Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234

Op#1

Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,,,,1

Example#2

Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,FALSE,ab1234

Op#2

Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,1,,,

Example#3

Locationx,Desc,A,B,C,Locationy
ab123,Name1,FALSE,TRUE,FALSE,ab1234

Op#3

Desc,A,B,C,AB,AC,BC,ABC
Name1,,1,,,,,

Desired Output:

Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,1,,,2
Name2,,,1,,1,,1
Name3,1,,,2,,,
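
The counting rule above can be sketched in awk: each row contributes to exactly one label, built by concatenating the header names of its TRUE columns. This is only an illustrative sketch, not the accepted answer. The temporary file and the variable names (input, result, name, label, count) are assumptions made for the example, and it prints one "Desc,label,count" line per observed combination rather than the full table.

```shell
#!/bin/sh
# Sample data copied from the question into a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234
EOF

result=$(awk -F, '
# Header row: remember the names of the flag columns (field 3 up to NF-1).
NR == 1 { for (i = 3; i < NF; i++) name[i] = $i; next }
{
    # Build a single label from the TRUE columns, e.g. TRUE,TRUE,FALSE gives AB.
    label = ""
    for (i = 3; i < NF; i++)
        if ($i == "TRUE") label = label name[i]
    # A row counts once, under its combined label only (never also under A or B).
    if (label != "") count[$2 "," label]++
}
END { for (k in count) print k "," count[k] }
' "$input" | LC_ALL=C sort)

rm -f "$input"
printf '%s\n' "$result"
```

For the sample input this prints Name1,AB,1 and Name1,ABC,2, then Name2,ABC,1, Name2,AC,1 and Name2,C,1, then Name3,A,1 and Name3,AB,2, which agrees with the desired output above.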

The actual file looks like this:

Input.txt

Locationx,Desc,INCOMING,OUTGOING,SMS,RECHARGE,DEBIT,DATA,Locationy
ab123,Name1,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,ab1234
ab123,Name2,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,ab1234

I have tried a lot, but nothing has worked. Any suggestions, please?

Edit: Desired Output from Actual Input:

Desc
Name1,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,,,,,,,,,,
Name2,,,,1,1,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Name3,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,1,,,

I don't have access to Perl or Python.

Solution

I have written a Perl script that does this for you. As you can see from its size and comments, it is quite simple to get this done.

#!/usr/bin/perl

use strict;
use warnings;
use autodie;
use Algorithm::Combinatorics qw(combinations);

## change 'file' to the path where your file exists
open my $fh, '<', 'file';

my (%data, @new_labels);

## capture the header line in an array (without the trailing newline)
chomp(my $header_line = <$fh>);
my @fields = split /,/, $header_line;

## keep only the flag columns: drop the first, second and last columns
my @header = @fields[2 .. $#fields - 1];

## generate unique combinations, joined with hyphens
for my $iter (1 .. scalar @header) {
    my $combination = combinations(\@header, $iter);
    while (my $combo = $combination->next) {
        push @new_labels, join '-', @$combo;
    }
}

## iterate through the rest of the file
while (my $line = <$fh>) {
    chomp $line;
    my @line = split /,/, $line;

    ## names of the flag columns that are TRUE in this row
    my @is_true = map { $fields[$_] } grep { $line[$_] eq 'TRUE' } 2 .. $#line - 1;

    ## increment the counter keyed by description, then by combined label
    ## (an all-FALSE row lands under an empty label, which is never printed)
    ++$data{$line[1]}{ join '-', @is_true };
}

## print the new header
print join(',', 'Desc', reverse @new_labels), "\n";

## print the description and counter values
for my $desc (sort keys %data) {
    print join(',', $desc, map { $data{$desc}{$_} // '' } reverse @new_labels), "\n";
}

Output:

Desc,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DEBIT-DATA,INCOMING-SMS-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-SMS-DEBIT-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT,SMS-RECHARGE-DEBIT-DATA,OUTGOING-RECHARGE-DEBIT-DATA,OUTGOING-SMS-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DATA,OUTGOING-SMS-RECHARGE-DEBIT,INCOMING-RECHARGE-DEBIT-DATA,INCOMING-SMS-DEBIT-DATA,INCOMING-SMS-RECHARGE-DATA,INCOMING-SMS-RECHARGE-DEBIT,INCOMING-OUTGOING-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT,INCOMING-OUTGOING-SMS-DATA,INCOMING-OUTGOING-SMS-DEBIT,INCOMING-OUTGOING-SMS-RECHARGE,RECHARGE-DEBIT-DATA,SMS-DEBIT-DATA,SMS-RECHARGE-DATA,SMS-RECHARGE-DEBIT,OUTGOING-DEBIT-DATA,OUTGOING-RECHARGE-DATA,OUTGOING-RECHARGE-DEBIT,OUTGOING-SMS-DATA,OUTGOING-SMS-DEBIT,OUTGOING-SMS-RECHARGE,INCOMING-DEBIT-DATA,INCOMING-RECHARGE-DATA,INCOMING-RECHARGE-DEBIT,INCOMING-SMS-DATA,INCOMING-SMS-DEBIT,INCOMING-SMS-RECHARGE,INCOMING-OUTGOING-DATA,INCOMING-OUTGOING-DEBIT,INCOMING-OUTGOING-RECHARGE,INCOMING-OUTGOING-SMS,DEBIT-DATA,RECHARGE-DATA,RECHARGE-DEBIT,SMS-DATA,SMS-DEBIT,SMS-RECHARGE,OUTGOING-DATA,OUTGOING-DEBIT,OUTGOING-RECHARGE,OUTGOING-SMS,INCOMING-DATA,INCOMING-DEBIT,INCOMING-RECHARGE,INCOMING-SMS,INCOMING-OUTGOING,DATA,DEBIT,RECHARGE,SMS,OUTGOING,INCOMING
Name1,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,,,,,,,,,,
Name2,,,,1,,1,,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Name3,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,1,,,

Note: Please revisit your expected output. It has a few mistakes in it, as you can see from the output generated by the script above.
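
Since the question says Perl is not available, the same idea can also be sketched in plain awk. This is a hedged sketch, not a tested drop-in replacement. It assumes the flag columns are fields 3 through NF-1, enumerates the non-empty subsets of those columns as bitmasks grouped by subset size, and prints descriptions in first-seen order. All variable names (input, table, name, cols, order, count) are made up for the example, and the column order differs from the Perl output, which lists the largest combination first.

```shell
#!/bin/sh
# Small sample from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234
EOF

table=$(awk -F, '
NR == 1 {
    n = NF - 3                                  # number of flag columns
    for (i = 1; i <= n; i++) name[i] = $(i + 2)
    total = 2 ^ n - 1
    # Enumerate every non-empty subset as a bitmask, smallest subsets first.
    for (size = 1; size <= n; size++)
        for (mask = 1; mask <= total; mask++) {
            label = ""; bits = 0; m = mask
            for (i = 1; i <= n; i++) {
                if (m % 2) { label = label (label == "" ? "" : "-") name[i]; bits++ }
                m = int(m / 2)
            }
            if (bits == size) cols[++ncols] = label
        }
    next
}
{
    # Label of this row: hyphen-joined names of its TRUE columns.
    label = ""
    for (i = 1; i <= n; i++)
        if ($(i + 2) == "TRUE") label = label (label == "" ? "" : "-") name[i]
    if (!($2 in seen)) { seen[$2] = 1; order[++ndesc] = $2 }
    if (label != "") count[$2, label]++
}
END {
    line = "Desc"
    for (c = 1; c <= ncols; c++) line = line "," cols[c]
    print line
    for (d = 1; d <= ndesc; d++) {
        line = order[d]
        for (c = 1; c <= ncols; c++) {
            key = order[d] SUBSEP cols[c]
            line = line "," ((key in count) ? count[key] : "")
        }
        print line
    }
}
' "$input")

rm -f "$input"
printf '%s\n' "$table"
```

On the three-column sample this prints the header Desc,A,B,C,A-B,A-C,B-C,A-B-C followed by Name1,,,,1,,,2 and Name2,,,1,,1,,1 and Name3,1,,,2,,, which matches the desired output for that input. With the six flag columns of the actual file the header would have 63 combination columns.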
