Merge files based on a mapping in another file
Problem description
I have written a Perl script that merges two files based on a mapping in a third file; I am not using join because the lines won't always match. The code works, but it emits a warning that doesn't appear to affect the output: Use of uninitialized value in join or string at join.pl line 43, <$fh> line 21.
As I am relatively new to Perl, I have been unable to work out what is causing this. Any help resolving it, or advice about my code, would be greatly appreciated. I have provided example input and output below.
join.pl
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use Tie::File;
use Scalar::Util qw(looks_like_number);
chomp( my $infile = $ARGV[0] );
chomp( my $infile1 = $ARGV[1] );
chomp( my $infile2 = $ARGV[2] );
chomp( my $outfile = $ARGV[3] );
open my $mapfile, '<', $infile or die "Could not open $infile: $!";
open my $file1, '<', $infile1 or die "Could not open $infile1: $!";
open my $file2, '<', $infile2 or die "Could not open $infile2: $!";
tie my @tieFile1, 'Tie::File', $infile1 or die "Could not open $infile1: $!";
tie my @tieFile2, 'Tie::File', $infile2 or die "Could not open $infile2: $!";
open my $output, '>', $outfile or die "Could not open $outfile: $!";
my %map1;
my %map2;
# This loop will read two input files and populate two hashes
# using the coordinates (field 2) and the current line number
while ( my $line1 = <$file1>, my $line2 = <$file2> ) {
    my @row1 = split( "\t", $line1 );
    my @row2 = split( "\t", $line2 );
    # $. holds the line number
    $map1{$row1[1]} = $.;
    $map2{$row2[1]} = $.;
}
close($file1);
close($file2);
while ( my $line = <$mapfile> ) {
    chomp $line;
    my @row = split( "\t", $line );
    my $species1 = $row[1];
    my $reference1 = $map1{$species1};
    my $species2 = $row[3];
    my $reference2 = $map2{$species2};
    my @nomatch = ("NA", "", "NA", "", "", "", "", "NA", "NA");
    # test numeric
    if ( looks_like_number($reference1) && looks_like_number($reference2) ) {
        # do the do using the maps
        print $output join("\t", $tieFile1[$reference1], $tieFile2[$reference2]), "\n";
    }
    elsif ( looks_like_number($reference1) ) {
        print $output join("\t", $tieFile1[$reference1], @nomatch), "\n";
    }
    elsif ( looks_like_number($reference2) ) {
        print $output join("\t", @nomatch, $tieFile2[$reference2]), "\n";
    }
}
close($output);
untie @tieFile1;
untie @tieFile2;
input_1:
Scf_3L 12798910 T 0 41 0 0 NA NA
Scf_3L 12798911 C 0 0 43 0 NA NA
Scf_3L 12798912 A 42 0 0 0 NA NA
Scf_3L 12798913 G 0 0 0 44 NA NA
Scf_3L 12798914 T 0 42 0 0 NA NA
Scf_3L 12798915 G 0 0 0 44 NA NA
Scf_3L 12798916 T 0 42 0 0 NA NA
Scf_3L 12798917 A 41 0 0 0 NA NA
Scf_3L 12798918 G 0 0 0 43 NA NA
Scf_3L 12798919 T 0 43 0 0 NA NA
Scf_3L 12798920 T 0 41 0 0 NA NA
input_2:
3L 12559896 T 0 31 0 0 NA NA
3L 12559897 C 0 0 33 0 NA NA
3L 12559898 A 34 0 0 0 NA NA
3L 12559899 G 0 0 0 33 NA NA
3L 12559900 T 0 34 0 0 NA NA
3L 12559901 G 0 0 0 33 NA NA
3L 12559902 T 0 33 0 0 NA NA
3L 12559903 A 33 0 0 0 NA NA
3L 12559904 G 0 0 0 33 NA NA
3L 12559905 T 0 34 0 0 NA NA
3L 12559906 T 0 33 0 0 NA NA
map:
3L 12798910 T 12559896 T
3L 12798911 C 12559897 C
3L 12798912 A 12559898 A
3L 12798913 G 12559899 G
3L 12798914 T 12559900 T
3L 12798915 G 12559901 G
3L 12798916 T 12559902 T
3L 12798917 A 12559903 A
3L 12798918 G 12559904 G
3L 12798919 T 12559905 T
3L 12798920 T 12559906 T
output:
Scf_3L 12798910 T 0 41 0 0 NA NA 3L 12559896 T 0 31 0 0 NA NA
Scf_3L 12798911 C 0 0 43 0 NA NA 3L 12559897 C 0 0 33 0 NA NA
Scf_3L 12798912 A 42 0 0 0 NA NA 3L 12559898 A 34 0 0 0 NA NA
Scf_3L 12798913 G 0 0 0 44 NA NA 3L 12559899 G 0 0 0 33 NA NA
Scf_3L 12798914 T 0 42 0 0 NA NA 3L 12559900 T 0 34 0 0 NA NA
Scf_3L 12798915 G 0 0 0 44 NA NA 3L 12559901 G 0 0 0 33 NA NA
Scf_3L 12798916 T 0 42 0 0 NA NA 3L 12559902 T 0 33 0 0 NA NA
Scf_3L 12798917 A 41 0 0 0 NA NA 3L 12559903 A 33 0 0 0 NA NA
Scf_3L 12798918 G 0 0 0 43 NA NA 3L 12559904 G 0 0 0 33 NA NA
Scf_3L 12798919 T 0 43 0 0 NA NA 3L 12559905 T 0 34 0 0 NA NA
Scf_3L 12798920 T 0 41 0 0 NA NA 3L 12559906 T 0 33 0 0 NA NA
Recommended answer
The immediate problem is that the indices of the tied arrays start at zero, while the line numbers in $. start at 1. That means you need to subtract one from $. or from the $reference variables before using them as indices. So your resulting data was never correct in the first place, and you might have overlooked that if it weren't for the warning!
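To make the off-by-one concrete, here is a minimal sketch that reads a hypothetical three-line input from an in-memory string (not the question's files) and compares $. with the index of a plain zero-based array, which behaves like a Tie::File array for this purpose. The helper name numbered_lines is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical three-line input held in memory instead of a real file.
my $data = "first\nsecond\nthird\n";

# Returns a list of [ line number taken from $., text ] pairs.
sub numbered_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @pairs;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @pairs, [ $., $line ];    # $. is 1 for the first line read
    }
    close $fh;
    return @pairs;
}

# A zero-based array of the same lines, standing in for a tied array.
my @lines = split /\n/, $data;

for my $pair ( numbered_lines($data) ) {
    my ( $lineno, $text ) = @$pair;
    my $wrong = $lines[$lineno] // 'undef';    # off by one
    my $right = $lines[ $lineno - 1 ];         # corrected index
    printf "line %d (%s): index %d holds %s, index %d holds %s\n",
        $lineno, $text, $lineno, $wrong, $lineno - 1, $right;
}
```

Using $. directly as an index returns the following line each time, and for the last line it reads past the end of the array and gets undef, which is probably what triggered the "uninitialized value" warning in the question.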
I fixed that and also tidied up your code a little. Mostly, I added use autodie so that there's no need to check the status of IO operations (except for Tie::File), changed to list assignments, moved the code to read the files into a subroutine, and added code blocks so that the lexical file handles would be closed automatically.
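As a rough illustration of those two points, the sketch below uses autodie so that open needs no "or die", and relies on scope exit to close lexical file handles. The helper names write_lines and read_lines and the temporary file are assumptions made for this example only; they are not part of the answer's code.

```perl
use strict;
use warnings;
use autodie;                       # open and close now throw on failure
use File::Temp qw(tempfile);

# Writes the given lines to $path; the handle is closed when the block ends.
sub write_lines {
    my ( $path, @lines ) = @_;
    {
        open my $out_fh, '>', $path;    # no "or die" needed under autodie
        print {$out_fh} "$_\n" for @lines;
    }    # $out_fh goes out of scope here, so the file is closed
    return;
}

# Reads every line back; the handle is closed when the subroutine returns.
sub read_lines {
    my ($path) = @_;
    open my $in_fh, '<', $path;
    chomp( my @lines = <$in_fh> );
    return @lines;
}

# Scratch file for the demonstration, removed when the program exits.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
close $tmp_fh;

write_lines( $tmp_path, 'alpha', 'beta' );
print join( ',', read_lines($tmp_path) ), "\n";

# A failed open raises an exception instead of returning false.
my $ok = eval { open my $fh, '<', "$tmp_path.missing"; 1 };
print "open of a missing file ", ( $ok ? 'succeeded' : 'threw an exception' ), "\n";
```

Note that autodie does not cover tie, which is why the Tie::File calls still need an explicit "or die".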
I also used the tied arrays to build the %map hashes instead of opening the files separately, which means their values are already zero-based, as they need to be.
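The idea can be sketched with an ordinary array in place of a tied one: iterate over the indices, pull out the second tab-separated field as the key, and store the index itself. The subroutine name index_by_position is hypothetical, and the sample lines are shortened versions of the question's input_1.

```perl
use strict;
use warnings;

# Builds a hash of { second tab-separated field => zero-based index }.
sub index_by_position {
    my @lines = @_;
    my %index;
    for my $i ( 0 .. $#lines ) {
        # A limit of 3 stops splitting once the second field is isolated.
        my $key = ( split /\t/, $lines[$i], 3 )[1];
        $index{$key} = $i;
    }
    return %index;
}

# Shortened sample lines; a Tie::File array can be indexed the same way.
my @lines = (
    "Scf_3L\t12798910\tT\t0\t41",
    "Scf_3L\t12798911\tC\t0\t0",
    "Scf_3L\t12798912\tA\t42\t0",
);

my %index = index_by_position(@lines);

# The stored value can be used as an array index with no adjustment.
print $lines[ $index{12798911} ], "\n";
```

Because the values come from the array's own indices rather than from $., there is nothing to subtract later.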
Oh, and I removed looks_like_number, because the $reference variables must be either numeric or undef, since that's all we put into the hash. The correct way to check that a value isn't undef is with the defined operator.
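A small sketch of that check follows, with a shortened placeholder list and a hypothetical helper named fields_for. It also shows why a plain truth test would not do: index 0 is a valid, defined value that happens to be false.

```perl
use strict;
use warnings;

# Shortened placeholder fields, for illustration only.
my @nomatch = ( 'NA', '', 'NA' );

# Returns the stored line when $index is defined, otherwise the placeholders.
sub fields_for {
    my ( $lines, $index ) = @_;
    return defined $index ? $lines->[$index] : @nomatch;
}

my @lines = ( "Scf_3L\t12798910\tT", "Scf_3L\t12798911\tC" );
my %index = ( 12798910 => 0, 12798911 => 1 );

# The first key maps to index 0, which is false but defined;
# the second key is missing from the hash, so the lookup yields undef.
for my $key ( 12798910, 99999999 ) {
    print join( "\t", fields_for( \@lines, $index{$key} ) ), "\n";
}
```

Writing the test as "if ($index)" would wrongly treat the first line of each file as unmatched, so defined is the safer choice here.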
#!/usr/bin/perl
use strict;
use warnings 'all';
use autodie;
use Fcntl 'O_RDONLY';
use Tie::File;
my ( $mapfile, $infile1, $infile2, $outfile ) = @ARGV;
{
    tie my @file1, 'Tie::File' => $infile1, mode => O_RDONLY
        or die "Could not open $infile1: $!";
    tie my @file2, 'Tie::File' => $infile2, mode => O_RDONLY
        or die "Could not open $infile2: $!";
    my %map1 = map { (split /\t/, $file1[$_], 3)[1] => $_ } 0 .. $#file1;
    my %map2 = map { (split /\t/, $file2[$_], 3)[1] => $_ } 0 .. $#file2;
    open my $map_fh, '<', $mapfile;
    open my $out_fh, '>', $outfile;
    while ( <$map_fh> ) {
        chomp;
        my @row = split /\t/;
        my ( $species1, $species2 ) = @row[1,3];
        my $reference1 = $map1{$species1};
        my $reference2 = $map2{$species2};
        my @nomatch = ( "NA", "", "NA", "", "", "", "", "NA", "NA" );
        my @fields = (
            ( defined $reference1 ? $file1[$reference1] : @nomatch ),
            ( defined $reference2 ? $file2[$reference2] : @nomatch ),
        );
        print $out_fh join( "\t", @fields ), "\n";
    }
}
output
Scf_3L 12798910 T 0 41 0 0 NA NA NA NA NA NA
Scf_3L 12798911 C 0 0 43 0 NA NA NA NA NA NA
Scf_3L 12798912 A 42 0 0 0 NA NA NA NA NA NA
Scf_3L 12798913 G 0 0 0 44 NA NA NA NA NA NA
Scf_3L 12798914 T 0 42 0 0 NA NA NA NA NA NA
Scf_3L 12798915 G 0 0 0 44 NA NA NA NA NA NA
Scf_3L 12798916 T 0 42 0 0 NA NA NA NA NA NA
Scf_3L 12798917 A 41 0 0 0 NA NA NA NA NA NA
Scf_3L 12798918 G 0 0 0 43 NA NA NA NA NA NA
Scf_3L 12798919 T 0 43 0 0 NA NA NA NA NA NA
Scf_3L 12798920 T 0 41 0 0 NA NA NA NA NA NA