Append a new column to file in perl

Question

I've got the following function inside a perl script:

sub fileSize {
   my $file = shift;
   my $opt = shift;
   open (FILE, $file) or die "Could not open file $file: $!";
   $/ = ">";
   my $junk = <FILE>;
   my $g_size = 0;   
   while ( my $rec = <FILE> ) {
      chomp $rec; 
      my ($name, @seqLines) = split /\n/, $rec;
       my $sec = join('',@seqLines);
      $g_size+=length($sec);
      if ( $opt == 1 ) {
        open TMP, ">>", "tmp" or die "Could not open chr_sizes.log: $!\n";
        print TMP "$name\t", length($sec), "\n";
      }
   }
   if ( $opt == 0 ) {
      PrintLog( "file_size: $g_size", 0 );
   }
   else {
      print TMP "file_size: $g_size\n";
      close TMP;
   }
   $/ = "\n";
   close FILE;
}

Input file format:

>one
AAAAA
>two
BBB
>three
C

I have several input files in that format. The lines beginning with ">" are the same in every file, but the other lines can differ in length. For a single file, the output of the function is:

one 5
two 3
three   1

I want to run the function in a loop, once for each file:

foreach my $file ( @refs ) {
   fileSize( $file, 1 );
}

On the next iteration, say with this file:

>one
AAAAABB
>two
BBBVFVF
>three
CS

I want to obtain this output:

one 5 7
two 3 7
three 1 2

How can I modify the function or the script to get this? As can be seen, my function appends its text to the end of the file.

Thanks!

Answer

I've left out your options and the file IO operations and have concentrated on showing a way to do this from the command line with an array of arrays. I hope it helps. I'll leave wiring it up to your own script and subroutines mostly up to you :-)

Running this one-liner against your first data file:

perl -lne ' $name = s/>//r if /^>/ ; 
   push @strings , [$name, length $_] if !/^>/ ;
   END { print "@{$_ } " for @strings }' datafile1.txt

gives this output:

one 5 
two 3 
three 1 

Substituting the second version of the data file (i.e. the one where record one contains AAAAABB) gives the expected results as well:

one 7 
two 7 
three 2

In your script above, you save to an output file in this format. So, to append columns to each row of your output file, we can munge each of your data files in the same way (with any luck this means things can be turned into a function that works in a foreach loop). If we save the transformed data in an array of arrays (AoA), we can push the length values from each data file onto the corresponding anonymous array and then print out the whole array. Voilà! Now let's hope it works ;-)
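The array-of-arrays idea can be sketched on its own, without any file handling. The sub name append_column is hypothetical (it is not part of the answer's code), and rows are matched purely by position, which assumes every data file lists the same records in the same order:

```perl
use strict;
use warnings;

# Append one column of values to an existing table (an array of arrays).
# Rows are matched by position: value $i goes onto row $i.
sub append_column {
    my ($table, $column) = @_;
    for my $i (0 .. $#$column) {
        push @{ $table->[$i] }, $column->[$i];
    }
    return $table;
}

# Lengths from the first data file, then the column from the second one.
my @table = ( [ 'one', 5 ], [ 'two', 3 ], [ 'three', 1 ] );
append_column( \@table, [ 7, 7, 2 ] );

print "@$_\n" for @table;
```

Each additional data file then becomes one more call to append_column before the final print.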

You might want to install Data::Printer, which can be loaded from the command line as -MDDP to visualize data structures.

  • First - run the above one-liner and redirect the output to a file with > /tmp/output.txt
  • Next - try this longish one-liner that uses DDP and p to show the structure of the arrays we create:

perl -MDDP -lne 'BEGIN{ local @ARGV=shift; 
 @tmp = map { [split] } <>; p @tmp } 
 $name = s/>//r if /^>/ ; 
 push @out , [ $name, length $_ ] if !/^>/ ;
 END{ p @out ; }' /tmp/output.txt datafile2.txt

In the BEGIN block we local-ize @ARGV and shift off the first file (our version of your TMP file); {local @ARGV=shift} is almost a perl idiom for handling multiple input files. We then split each of its lines inside an anonymous array constructor ([]) and map { } the results into the @tmp array, which we display with DDP's p() function.

Once we are out of the BEGIN block, the implicit while (<>){ ... } that we get with perl's -n command line switch takes over and reads the remaining file from @ARGV. For lines starting with >, we strip the leading character and assign the string that follows to the $name variable. As the while loop continues, we push $name and the length of any line that does not start with > (if !/^>/), wrapped as elements of an anonymous array [], onto the @out array. We display that with p() as well, in the END{} block so that it doesn't print inside our implicit while() loop. Phew!!
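Written out as ordinary code, the implicit -n loop might look like the sketch below. The sub name parse_lengths is hypothetical, and the sketch departs from the one-liner in one respect: it adds up the lengths when a record spans several lines (as the question's fileSize does), rather than pushing one entry per sequence line. An in-memory filehandle stands in for the data file.

```perl
use strict;
use warnings;

# Read ">name" records from a filehandle and return [name, length] pairs.
sub parse_lengths {
    my ($fh) = @_;
    my @out;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(.*)/ ) {
            push @out, [ $1, 0 ];           # new record: name, length so far
        }
        elsif (@out) {
            $out[-1][1] += length $line;    # sequence line: add to current record
        }
    }
    return @out;
}

# The second data file from the question, held in a string for the demo.
my $data = ">one\nAAAAABB\n>two\nBBBVFVF\n>three\nCS\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

print "@$_\n" for parse_lengths($fh);
```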

View the AoA as a gist on GitHub.

  • Finally - building on that, now that we have munged things nicely, we can change a few things in our END{...} block (add a for loop to push things around) and put it all together to produce the output we want.

This one-liner:

perl -MDDP -lne 'BEGIN{ local @ARGV=shift; @tmp = map {[split]} <>; }
$name = s/>//r if /^>/ ; push @out, [ $name, length $_ ] if !/^>/ ;
END{ foreach $row (0..$#tmp) { push @{ $tmp[$row] }, $out[$row][-1] } ;
   print "@$_" for @tmp }'  output.txt datafile2.txt 

produces:

one 5 7
two 3 7
three 1 2

We'll have to convert that into a script :-)

The script consists of three rather wordy subroutines that read the log file, parse the data file, and merge the two. We run them in order. The first one checks whether a log already exists; if not, it creates one and then calls exit to skip the remaining parsing and merging steps.

You should be able to wrap them in a loop of some kind that feeds files to the subroutines from an array instead of taking a single file name from the command line. One caution - I'm using IO::All because it's fun and easy!
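One possible shape for such a loop is sketched below. It is not the answer's script: the sub name build_table is hypothetical, it uses plain open instead of IO::All, and it keeps the whole table in memory so the output is written once rather than re-read for every file. It assumes every file lists the same records in the same order.

```perl
use strict;
use warnings;

# Build the whole table in memory: one row per record, one column per file.
sub build_table {
    my @files = @_;
    my @table;
    for my $file (@files) {
        open my $fh, '<', $file or die "Could not open $file: $!";
        my $row = -1;
        while ( my $line = <$fh> ) {
            chomp $line;
            if ( $line =~ /^>(.*)/ ) {
                $row++;
                $table[$row] //= [ $1 ];       # the first file supplies the name
                push @{ $table[$row] }, 0;     # start this file's column at 0
            }
            elsif ( $row >= 0 ) {
                $table[$row][-1] += length $line;
            }
        }
        close $fh;
    }
    return @table;
}

# Usage: perl sizes.pl datafile1.txt datafile2.txt > output.txt
print "@$_\n" for build_table(@ARGV);
```

With the two sample files from the question, each row would hold the name followed by one length per file. The script name sizes.pl in the usage comment is only a placeholder.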

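use 5.14.0 ;
use IO::All;
my @file = io(shift)->slurp ;   # lines of the data file named on the command line
my $log  = "output.txt" ;

readlog();
parsedatafile();
mergetolog();

####### subs #######
# Return the existing log as an array of arrays. If there is no log yet,
# create it from the data file and exit.
sub readlog {
   if (! -R $log) {
     print "creating first log entry\n";
     my @newlog = parsedatafile() ;
     open(my $fh, '>', $log) or die "Could not open $log: $!" ;
     print $fh "@$_ \n" for @newlog ;
     exit;
   }
   else {
     return map { [split] } io($log)->slurp ;
   }
}

# Return one [name, length] pair per sequence line of the data file.
sub parsedatafile {
  my (@out, $name) ;
  for (@file) {                 # <@file> would be treated as a glob, not a loop
    chomp ;
    $name = s/>//r if /^>/;
    push @out, [$name, length $_] if !/^>/ ;
  }
  return @out;
}

# Append the new lengths to the matching log rows and rewrite the log.
sub mergetolog {
  my @tmp = readlog() ;
  my @data = parsedatafile() ;
  foreach my $row (0 .. $#tmp) {
    # push needs a real array; pushing onto a reference fails on perl 5.24+
    push @{ $tmp[$row] }, $data[$row][-1]
  }
  open(my $fh, '>', $log) or die "Could not open $log: $!" ;
  print $fh "@$_ \n" for @tmp ;
}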

The subroutines do all the work here - you can likely find ways to shorten, combine, or improve them. Is this a useful approach for you?

I hope this explanation is clear and useful to someone - corrections and comments welcome. The same thing could probably be done with in-place editing (i.e. with perl -pi -e '...'; note that the switches cannot be bundled as -pie, because -i would take the "e" as a backup suffix), which is left as an exercise to those that follow ...
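For the in-place route, one possible sketch uses the $^I variable, which is what the -i switch sets. The sub name append_in_place is hypothetical, the '.bak' backup suffix is an arbitrary choice, and values are matched to log lines by position:

```perl
use strict;
use warnings;

# Append one value to the end of each line of $log, editing the file in place.
# A backup of the original is kept as "$log.bak".
sub append_in_place {
    my ( $log, @values ) = @_;
    local ( $^I, @ARGV ) = ( '.bak', $log );    # same effect as perl -i.bak
    while ( my $line = <> ) {
        $line =~ s/\s+\z//;                     # drop newline and trailing spaces
        my $value = shift @values;
        # While $^I is set, print writes to the file being edited.
        print defined $value ? "$line $value\n" : "$line\n";
    }
    return;
}

# Usage (assuming output.txt already holds "one 5", "two 3", "three 1"):
#   append_in_place( 'output.txt', 7, 7, 2 );
```

The lengths passed in would come from parsing the next data file first, since the in-place loop itself only reads the log.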
