基于重复键合并CSV行,并使用Perl Text :: CSV合并唯一值 [英] Merge CSV rows based on duplicate key and combine unique values using Perl Text::CSV

查看:99
本文介绍了基于重复键合并CSV行,并使用Perl Text :: CSV合并唯一值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个大的制表符分隔的文件,具有重复的产品,但颜色和数量不同。我试图合并基于键的数据,以便我最终得到一个产品,组合的颜色和数量由分隔符(在这种情况下,逗号)分隔。

I have a big tab-separated file with duplicate products but with different colours and amounts. I’m trying to merge the data based on the key so that I end up with one product and the combined colours and amounts separated by a delimiter (comma in this case).

我使用 Text :: CSV 模块,以便我有更好的控制,并且因为它允许我输出具有不同分隔符(从分号到管道)的文件。

I'm using the Text::CSV module so that I have better control, and because it allows me to output the file with a different delimiters (from semicolon to pipe).

我的问题是,如何正确合并数据?我不想只是简单地结合颜色和数量,但删除重复的值。所以我想用 Id / 金额 Id / 颜色。但 Id 不是唯一的,那么我该如何做呢?

My question is, how do I merge the data properly? I don't want it simply to combine colours and amounts but remove duplicate values as well. So I was thinking a key/value with the Id/Amount and Id/Colour. But Id isn't unique so how do I do this? Do I create an array or use hashes?

这里是一些示例源数据,其中的选项卡分隔符由分号替换; 。请注意,标记的行没有Color,因此在结果中不会合并空值。

Here is some sample source data, with the tab separators replaced by semicolons ;. Note that the marked row has no Colour so the empty value is not combined in the result.

Cat_id;Cat_name;Id;Name;Amount;Colour;Bla;    
101;Fruits;50020;Strawberry;500;Red;1;
101;Fruits;50020;Strawberry;1000;Red;1;
201;Vegetables;60090;Tomato;50;Green;1;
201;Vegetables;60080;Onion;1;Purple;1;
201;Vegetables;60090;Tomato;100;Red;1;
201;Vegetables;60010;Carrot;100;Purple;1;
201;Vegetables;60050;Broccoli;500;Green;1;
201;Vegetables;60050;Broccoli;1000;Green;1;
201;Vegetables;60090;Tomato;500;Yellow;1;
101;Fruits;50060;Apple;500;Green;1;
101;Fruits;50010;Grape;500;Red;1;
201;Vegetables;60010;Carrot;500;White;1;
201;Vegetables;60050;Broccoli;2000;Green;1;
201;Vegetables;60090;Tomato;1000;Red;1;
101;Fruits;50020;Strawberry;100;Red;1;
101;Fruits;50060;Apple;1000;Red;1;
201;Vegetables;60010;Carrot;250;Yellow;1;
101;Fruits;50010;Grape;100;White;1;
101;Fruits;50030;Banana;500;Yellow;1;
201;Vegetables;60010;Carrot;1000;Yellow;1;
101;Fruits;50030;Banana;1000;Green;1;
101;Fruits;50020;Strawberry;200;Red;1;
101;Fruits;50010;Grape;200;White;1;
201;Vegetables;60010;Carrot;50;Orange;1;
201;Vegetables;60080;Onion;2;White;1;

我想要的结果:

101;Fruits;50010;Grape;100,500,200;Red,White;1;
201;Vegetables;60090;Tomato;50,500,1000,10;Yellow,Green,Red;1;
101;Fruits;50060;Apple;500,1000;Red,Green;1;    
201;Vegetables;60010;Carrot;250,50,500,1000,100;Orange,Yellow,White,Purple;1;
201;Vegetables;60050;Broccoli;1000,500,2000;Green;1;
101;Fruits;50020;Strawberry;100,1000,200,500;Red;1;
101;Fruits;50030;Banana;500,1000;Yellow,Green;1;
201;Vegetables;60080;Onion;2,1;White,Purple;1;

这是我的脚本。它没有完成(而不是工作),因为我不知道如何继续。我不认为这可以正常工作,因为我试图使用相同的键为不同的颜色。

This is my script so far. It's not finished (and not working) because I'm not sure how to continue. I don't think this can work right because I'm trying to use the same key for different colours.

use strict;
use warnings;
use Text::CSV;
use List::MoreUtils 'uniq';

my $inputfile  = shift || die "Give input and output names!\n";
my $outputfile = shift || die "Give output name!\n";

open my $infile,  '<', $inputfile   or die "Sourcefile in use / not found :$!\n";
open my $outfile, '>', $outputfile  or die "Outputfile in use :$!\n";

binmode($outfile, ":encoding(utf8)");

my $csv_in  = Text::CSV->new({binary => 1,sep_char => ";",eol => $/});
my $csv_out = Text::CSV->new({binary => 1,sep_char => "|",always_quote => 1,eol => $/}); #,quote_null => 0 #

my %data;

while (my $elements = $csv_in->getline($infile)){
    my $id = $elements->[2];
    push @{ $data{$id} }, \@elements;       
    }

for my $id ( sort keys %data ){
    my $set = $data{$id};
    my @elements = @{ $set->[0] };
    $elements[4] = join ',', uniq map { $_->[4] } @$set;
    $elements[5] = join ',', uniq map { $_->[5] } @$set;
    $csv_in->combine(@$elements);
    $csv_out->print($outfile, $elements);
    }   

编辑:我使用data :: dumper进行测试

推荐答案

我没有时间写正确的评论,这个程序似乎做你所需要的。它使用 <$ c $的 uniq 函数c> List :: MoreUtils 模块。它不是一个核心模块,因此可能需要安装。我相信在合并字段中显示的金额和颜色是什么顺序?

I don't have time to write a proper commentary, but this program seems to do what you need. It uses the uniq function from the List::MoreUtils modules. It isn't a core module and so may need installing. I trust that it's not important what order the Amounts and Colours appear in the combined fields?

use strict;
use warnings;

use List::MoreUtils 'uniq';

print scalar <DATA>;

my %data;
while (<DATA>) {
  chomp;
  my @fields = split /;/;
  my $id = $fields[2];
  push @{ $data{$id} }, \@fields;
}

for my $id ( sort keys %data ) {
  my $set = $data{$id};
  my @fields = @{ $set->[0] };
  $fields[4] = join ',', uniq map { $_->[4] } @$set;
  $fields[5] = join ',', uniq map { $_->[5] } @$set;
  print join(';', @fields, ''), "\n";
}

__DATA__
Cat_id;Cat_name;Id;Name;Amount;Colour;Bla;    
101;Fruits;50020;Strawberry;500;Red;1;
101;Fruits;50020;Strawberry;1000;Red;1;
201;Vegetables;60090;Tomato;50;Green;1;
201;Vegetables;60080;Onion;1;Purple;1;
201;Vegetables;60090;Tomato;100;Red;1;
201;Vegetables;60010;Carrot;100;Purple;1;
201;Vegetables;60050;Broccoli;500;Green;1;
201;Vegetables;60050;Broccoli;1000;Green;1;
201;Vegetables;60090;Tomato;500;Yellow;1;
101;Fruits;50060;Apple;500;Green;1;
101;Fruits;50010;Grape;500;Red;1;
201;Vegetables;60010;Carrot;500;White;1;
201;Vegetables;60050;Broccoli;2000;Green;1;
201;Vegetables;60090;Tomato;1000;Red;1;
101;Fruits;50020;Strawberry;100;Red;1;
101;Fruits;50060;Apple;1000;Red;1;
201;Vegetables;60010;Carrot;250;Yellow;1;
101;Fruits;50010;Grape;100;White;1;
101;Fruits;50030;Banana;500;Yellow;1;
201;Vegetables;60010;Carrot;1000;Yellow;1;
101;Fruits;50030;Banana;1000;Green;1;
101;Fruits;50020;Strawberry;200;Red;1;
101;Fruits;50010;Grape;200;White;1;
201;Vegetables;60010;Carrot;50;Orange;1;
201;Vegetables;60080;Onion;2;White;1;

输出

Cat_id;Cat_name;Id;Name;Amount;Colour;Bla;    
101;Fruits;50010;Grape;500,100,200;Red,White;1;
101;Fruits;50020;Strawberry;500,1000,100,200;Red;1;
101;Fruits;50030;Banana;500,1000;Yellow,Green;1;
101;Fruits;50060;Apple;500,1000;Green,Red;1;
201;Vegetables;60010;Carrot;100,500,250,1000,50;Purple,White,Yellow,Orange;1;
201;Vegetables;60050;Broccoli;500,1000,2000;Green;1;
201;Vegetables;60080;Onion;1,2;Purple,White;1;
201;Vegetables;60090;Tomato;50,100,500,1000;Green,Red,Yellow;1;

这篇关于基于重复键合并CSV行,并使用Perl Text :: CSV合并唯一值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆