perl + read multiple csv files + manipulate files + provide output_files
Problem description
Apologies if this is a bit long-winded, but I would really appreciate an answer here, as I am having difficulty getting this to work.
Building on this question (http://stackoverflow.com/questions/22466984/script-to-get-the-max-from-column-based-on-other-column-values/22467944?noredirect=1#comment34177528_22467944), I have a script that works on one CSV file (orig.csv) and produces the CSV file I want (format.csv). I want to make it more generic: accept any number of '*.csv' files and produce an 'output_*.csv' for each input file. Can anyone help?
#!/usr/bin/perl
use strict;
use warnings;
open my $orig_fh, '<', 'orig.csv' or die $!;
open my $format_fh, '>', 'format.csv' or die $!;
print $format_fh scalar <$orig_fh>; # Copy header line
my %data;
my @labels;
while (<$orig_fh>) {
    chomp;
    my @fields = split /,/, $_, -1;
    my ($label, $max_val) = @fields[1,12];
    if ( exists $data{$label} ) {
        my $prev_max_val = $data{$label}[12] || 0;
        $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
    }
    else {
        $data{$label} = \@fields;
        push @labels, $label;
    }
}
for my $label (@labels) {
    print $format_fh join(',', @{ $data{$label} }), "\n";
}
I was hoping to use the script below, which I found elsewhere, but I am having great difficulty putting the two together:
#!/usr/bin/perl
use strict;
use warnings;
#If you want to open a new output file for every input file
#Do it in your loop, not here.
#my $outfile = "KAC.pdb";
#open( my $fh, '>>', $outfile );
opendir( DIR, "/data/tmp" ) or die "$!";
my @files = readdir(DIR);
closedir DIR;
foreach my $file (@files) {
    open( FH, "/data/tmp/$file" ) or die "$!";
    my $outfile = "output_$file"; #Add a prefix (anything, doesn't have to say 'output')
    open(my $fh, '>', $outfile);
    while (<FH>) {
        my ($line) = $_;
        chomp($line);
        if ( $line =~ m/KAC 50/ ) {
            print $fh $_;
        }
    }
    close($fh);
}
The script reads all the files in the directory, finds each line containing the string 'KAC 50', and writes that line to an output_$file for that input file. So there will be one output_$file for every input file that is read.
Issues with this script that I have noted and was looking to fix:
- it reads the '.' and '..' entries in the directory and produces an 'output_.' and an 'output_..' file
- it will also do the same with this script file.
I was also trying to make it dynamic, so that the script works in whichever directory it is run in, by adding this code:
use Cwd qw();
my $path = Cwd::cwd();
print "$path\n";
and
opendir( DIR, $path ) or die "$!"; # open the current directory
open( FH, "$path/$file" ) or die "$!"; #open the file
EDIT: I have tried a combined version; the output and the code are below. Advice is much appreciated.
UserName@wabcl13 ~/Perl
$ perl formatfile_QforStackOverflow.pl
Parentheses missing around "my" list at formatfile_QforStackOverflow.pl line 13.
source dir -> /home/UserName/Perl
Can't use string ("/home/UserName/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 28.
Combined code:
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
for my $file (readdir($dh)) {
    next if $file !~ /\.csv$/;
    next if $file =~ /^\Q$output_prefix\E/;
    my $orig_file = "$source_dir/$file";
    my $format_file = "$source_dir/$output_prefix$file";
    # .... old processing code here ...
    ## Start:: This part works on one file edited for this script ##
    #open my $orig_fh, '<', 'orig.csv' or die $!; #line 14 and 15 above already do this!!
    #open my $format_fh, '>', 'format.csv' or die $!;
    #print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changeing
    print $format_file scalar <$orig_file>; # Copy header line
    my %data;
    my @labels;
    #while (<$orig_fh>) { #orig needs changing
    while (<$orig_file>) {
        chomp;
        my @fields = split /,/, $_, -1;
        my ($label, $max_val) = @fields[1,12];
        if ( exists $data{$label} ) {
            my $prev_max_val = $data{$label}[12] || 0;
            $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
        }
        else {
            $data{$label} = \@fields;
            push @labels, $label;
        }
    }
    for my $label (@labels) {
        #print $format_fh join(',', @{ $data{$label} }), "\n"; #orig needs changing
        print $format_file join(',', @{ $data{$label} }), "\n";
    }
    ## END:: This part works on one file edited for this script ##
}
Recommended answer
How do you plan to supply the list of files to process and their preferred output destination? Perhaps just use a fixed directory in which you process all the CSV files, and prefix the name of each result.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $source_dir = '/some/dir/with/csv/files';
my $output_prefix = 'format_';
opendir(my $dh, $source_dir);
for my $file (readdir($dh)) {
    next if $file !~ /\.csv$/;                 # only process CSV files
    next if $file =~ /^\Q$output_prefix\E/;    # skip files this script produced
    my $orig_file = "$source_dir/$file";
    my $format_file = "$source_dir/$output_prefix$file";
    # .... old processing code here ...
}
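The placeholder in the loop is where the single-file logic from the question goes. The "Can't use string ... as a symbol ref" error in the question's combined version comes from using the path strings $orig_file and $format_file as if they were file handles; each path needs its own open call. The "Parentheses missing" message is only a warning, and it seems to be triggered by the form `opendir my $dh, $source_dir;`, so the call is written with parentheses here. What follows is a minimal sketch of how the pieces might fit together, assuming the same column layout as the question (label in field 1, value to compare in field 12). The sub names format_csv_file and format_csv_dir are made up for illustration, and skipping blank lines is an addition that the original script does not have.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use autodie;    # open/opendir/close die with a message on failure

# Hypothetical helper: for each label (field 1), keep the row with the
# largest value in field 12, preserving the order in which labels first appear.
sub format_csv_file {
    my ($orig_file, $format_file) = @_;

    # The paths are plain strings; open real file handles from them.
    open my $orig_fh,   '<', $orig_file;
    open my $format_fh, '>', $format_file;

    my $header = <$orig_fh>;
    print {$format_fh} $header if defined $header;    # copy header line

    my (%data, @labels);
    while (my $line = <$orig_fh>) {
        chomp $line;
        next unless length $line;                     # skip blank lines (assumption)
        my @fields = split /,/, $line, -1;
        my ($label, $max_val) = @fields[1, 12];
        if (exists $data{$label}) {
            my $prev_max_val = $data{$label}[12] || 0;
            $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
        }
        else {
            $data{$label} = \@fields;
            push @labels, $label;
        }
    }

    print {$format_fh} join(',', @{ $data{$_} }), "\n" for @labels;

    close $format_fh;
    close $orig_fh;
    return;
}

# Hypothetical helper: process every *.csv file in $source_dir that does not
# already carry the output prefix, and return the list of processed names.
sub format_csv_dir {
    my ($source_dir, $output_prefix) = @_;

    opendir(my $dh, $source_dir);
    my @files = sort grep { /\.csv$/ && !/^\Q$output_prefix\E/ } readdir $dh;
    closedir $dh;

    for my $file (@files) {
        format_csv_file("$source_dir/$file", "$source_dir/$output_prefix$file");
    }
    return @files;
}

# Run only when a directory is given on the command line.
format_csv_dir($ARGV[0], 'format_') if @ARGV;
```

Saved under a name of your choice, the script can be run as `perl yourscript.pl .` to process the current directory. Because only names ending in .csv are accepted, the '.' and '..' entries and the script file itself are skipped, and the prefix check keeps a second run from reprocessing its own output.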
Alternatively, you could write to a separate output directory instead of prefixing the file names. Either way, this should get you on your way.
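The output-directory variant only changes how the destination path is built. Here is a small sketch using the core modules File::Basename and File::Spec. The sub name output_path_for and both directory names are placeholders, and the output directory is assumed to exist already.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: map an input path to a same-named file in $output_dir.
sub output_path_for {
    my ($orig_file, $output_dir) = @_;
    return File::Spec->catfile($output_dir, basename($orig_file));
}

# Example: show where each CSV in a (placeholder) source directory would go.
my ($source_dir, $output_dir) = ('/some/dir/with/csv/files', '/some/output/dir');
for my $orig_file (glob "$source_dir/*.csv") {
    print "$orig_file -> ", output_path_for($orig_file, $output_dir), "\n";
}
```

With a separate output directory the results never mix with the inputs, so no prefix check is needed when listing the source files.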