How can I translate a shell script to Perl?


Problem description

I have a shell script, pretty big one. Now my boss says I must rewrite it in Perl. Is there any way to write a Perl script and use the existing shell code as is in my Perl script. Something similar to Inline::C.

Is there something like Inline::Shell? I had a look at inline module, but it supports only languages.

Solution

I'll answer seriously. I do not know of any program to translate a shell script into Perl, and I doubt any interpreter module would provide the performance benefits. So I'll give an outline of how I would go about it.

Now, you want to reuse your code as much as possible. In that case, I suggest selecting a piece of that code, writing a Perl version of it, and then calling the Perl script from the main script. That will enable you to do the conversion in small steps, verify that each converted part is working, and gradually improve your Perl knowledge.

As you can call outside programs from a Perl script, you can even replace some bigger logic with Perl, and call smaller shell scripts (or other commands) from Perl to do something you don't yet feel comfortable converting. So you'll have a shell script calling a Perl script calling another shell script. And, in fact, I did exactly that with my own very first Perl script.
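As a rough sketch of that mixed setup, here are the two usual ways of calling a command from Perl: system() to run it and check the exit status, and backticks to capture its output. The helper name run_and_capture and the echo command are placeholders of mine, not anything from your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a shell command line and return its output lines.
# Dies if the command exits with a non-zero status.
sub run_and_capture {
    my ($command) = @_;
    my @output = `$command`;          # backticks capture standard output
    die "'$command' failed with status ", $? >> 8, "\n" if $? != 0;
    chomp @output;                    # drop the trailing newlines
    return @output;
}

# system() runs a command without capturing anything; it returns 0 on success.
system("true") == 0 or die "system call failed\n";

my @lines = run_and_capture("echo converted step");
print "$_\n" for @lines;
```

Anything that still lives in a shell script can be called this way until you get around to converting it.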

Of course, it's important to select well what to convert. I'll explain, below, how many patterns common in shell scripts are written in Perl, so that you can identify them inside your script, and create replacements by as much cut&paste as possible.

First, both Perl scripts and shell scripts are code plus functions. That is, anything which is not a function declaration is executed in the order it is encountered. You don't need to declare functions before use, though. That means the general layout of the script can be preserved, though the ability to keep things in memory (like a whole file, or a processed form of it) makes it possible to simplify tasks.

A Perl script, in Unix, starts with something like this:

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;
#other libraries

(rest of the code)

The first line, obviously, points to the command to be used to run the script, just like normal shells do. The following two "use" lines make the language more strict, which should decrease the number of bugs you encounter because you don't know the language well (or plain did something wrong). The third use line imports the "Dumper" function from the "Data::Dumper" module. It's useful for debugging purposes. If you want to know the value of an array or hash table, just print Dumper(whatever).
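For instance, a minimal sketch of that debugging habit could look like the following. The sample data is made up, and passing references (the backslash) is my own choice here: it makes Dumper show each structure as one unit.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order

# Made-up sample data, just to have something to inspect.
my %phone = ("John Doe" => "(555) 1234-4321", "Jane Doe" => "(555) 5555-5555");
my @names = sort keys %phone;

# Passing references (\%phone, \@names) keeps each structure in one piece.
my $dump = Dumper(\%phone, \@names);
print $dump;
```

The hash shows up as $VAR1 and the array as $VAR2, each with its full contents.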

Note also that comments are just like shell's, lines starting with "#".

Now, you can call external programs and pipe to them or from them. For example:

open THIS, "cat $ARGV[0] |";

That will run cat, passing "$ARGV[0]", which would be $1 in a shell script -- the first argument passed to it. The output of that command will be piped into your Perl script through "THIS", a filehandle you can read from, as I'll show later.

You can use "|" at the beginning or end of the string, to indicate the mode "pipe to" or "pipe from", and specify a command to be run. You can also use ">" at the beginning to open a file for writing (truncating it first) or ">>" to append to it, "<" to explicitly indicate opening a file for reading (the default), or "+<" and "+>" for read and write. Notice that the latter, "+>", will truncate the file first.

Another syntax for "open", which will avoid problems with files with such characters in their names, is having the opening mode as a second argument:

open THIS, "-|", "cat $ARGV[0]";

This will do the same thing. The mode "-|" stands for "pipe from" and "|-" stands for "pipe to". The rest of the modes can be used as they were (>, >>, <, +>, +<). While there is more than this to open, it should suffice for most things.
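A small sketch of the "pipe from" mode, assuming a Unix-like system: when the command is given as a list instead of one string, Perl runs it without going through the shell, so odd characters in a file name are not a problem. The helper name read_from_command is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read a command's output through a pipe.
# Passing the command as a list bypasses the shell, so spaces or
# metacharacters in the arguments are passed through untouched.
sub read_from_command {
    my (@command) = @_;
    open my $pipe, "-|", @command
        or die "Could not run $command[0]: $!\n";
    my @lines = <$pipe>;              # one element per line of output
    close $pipe or die "$command[0] exited abnormally\n";
    return @lines;
}

# Typical use: print read_from_command("cat", $ARGV[0]);
```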

But you should avoid calling external programs as much as possible. You could open the file directly, by doing open THIS, "$ARGV[0]";, for example, and have much better performance.

So, which external programs could you cut out? Well, almost everything. But let's stay with the basics: cat, grep, cut, head, tail, uniq, wc, sort.

CAT

Well, there isn't much to be said about this one. Just remember that, if possible, read the file only once and keep it in memory. If the file is huge you won't do that, of course, but there are almost always ways to avoid reading a file more than once.

Anyway, the basic syntax for cat would be:

my $filename = "whatever";
open FILE, "$filename" or die "Could not open $filename!\n";
while(<FILE>) {
  print $_;
}
close FILE;

This opens a file, prints all its contents ("while(<FILE>)" will loop until EOF, assigning each line to "$_"), and closes it again.

If I wanted to direct the output to another file, I could do this:

my $filename = "whatever";
my $anotherfile = "another";
open (FILE, "$filename") || die "Could not open $filename!\n";
open OUT, ">", "$anotherfile" or die "Could not open $anotherfile for writing!\n";
while(<FILE>) {
  print OUT $_;
}
close FILE;

This will print the line to the file indicated by "OUT". You can use STDIN, STDOUT and STDERR in the appropriate places as well, without having to open them first. In fact, "print" defaults to STDOUT, and "die" defaults to "STDERR".

Notice also the "or die ..." and "|| die ...". The operators "or" and "||" mean that the command following them is executed only if the first one returns false (which means empty string, undef, 0, and the like). The die command stops the script with an error message.

The main difference between "or" and "||" is precedence. If "or" were replaced by "||" in the lines without parentheses above, it would not work as expected, because the line would be interpreted as:

open FILE, ("$filename" || die "Could not open $filename!\n");

Which is not at all what is expected. As "or" has a lower precedence, it works. In the line where "||" is used, the parameters to open are enclosed in parentheses, making it possible to use "||".
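Putting the pieces together, here is a hedged sketch of the file-to-file copy written as a function. It departs from the examples above in two ways that are my own choice: it uses lexical filehandles (my $in) instead of bareword ones like FILE, and it includes $! in the messages so you can see why an open failed. The name copy_file is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: copy one file to another, line by line.
sub copy_file {
    my ($source, $target) = @_;
    # "or" binds loosely, so no parentheses are needed around open's arguments.
    open my $in, "<", $source or die "Could not open $source: $!\n";
    open my $out, ">", $target or die "Could not open $target for writing: $!\n";
    while (my $line = <$in>) {
        print {$out} $line;           # print to the output handle
    }
    close $in;
    close $out or die "Could not finish writing $target: $!\n";
}
```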

Also, there is a shortcut which does pretty much what cat does:

while(<>) {
  print $_;
}

That will print all files named on the command line, or anything passed through STDIN if there are none.

GREP

So, how would our "grep" script work? I'll assume "grep -E", because that's easier in Perl than simple grep. Anyway:

my $pattern = $ARGV[0];
shift @ARGV;
while(<>) {
  print $_ if /$pattern/o;
}

The "o" modifier after the pattern instructs Perl to compile that pattern only once, which gains you speed as long as $pattern never changes. Note the style "something if cond": it executes "something" only if the condition is true. Finally, "/$pattern/", alone, is the same as "$_ =~ m/$pattern/", which means match $_ against the regex indicated. If you want plain substring matching instead, with no regex at all (what grep -F does), you could write:

print $_ if index($_, $pattern) >= 0;
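If you'd rather have the matching as a reusable function, one possible sketch follows. The name grep_lines and its $literal flag are hypothetical; qr// is used to compile the pattern once, and \Q...\E to switch off regex metacharacters when plain text matching is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that match a pattern.
# With $literal true the pattern is matched as plain text, like grep -F.
sub grep_lines {
    my ($pattern, $literal, @lines) = @_;
    # qr// compiles the pattern once; \Q...\E escapes regex metacharacters.
    my $regex = $literal ? qr/\Q$pattern\E/ : qr/$pattern/;
    return grep { $_ =~ $regex } @lines;
}

# Typical use: print grep_lines($ARGV[0], 0, <STDIN>);
```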

CUT

Usually, you do better using regex groups to get the exact string than using cut; it is the kind of thing you would do with "sed", for instance. Anyway, here are two ways of reproducing cut:

while(<>) {
  my @array = split ",";
  print $array[3], "\n";
}

That will get you the fourth column of every line, using "," as separator. Note @array and $array[3]. The @ sigil means "array" should be treated as an, well, array. It will receive an array composed of each column in the currently processed line. Next, the $ sigil means array[3] is a scalar value. It will return the column you are asking for.

This is not a good implementation, though, as "split" will scan the whole string. I once reduced a process from 30 minutes to 2 seconds just by not using split -- the lines were rather large, though. Anyway, the following has superior performance if the lines are expected to be big, and the columns you want are among the first ones:

while(<>) {
  my ($column) = /^(?:[^,]*,){3}([^,\n]*)/;   # also works when column 4 is the last one
  print $column, "\n";
}

This leverages regular expressions to get the desired information, and only that.

If you want positional columns, you can use:

while(<>) {
  print substr($_, 5, 10), "\n";
}

Which will print 10 characters starting from the sixth (again, 0 means the first character).
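A third option, sketched here under the assumption of a plain comma separator with no quoting, is to keep split but give it a limit, so it stops scanning once the wanted column has been reached. The helper name cut_field is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return field number $index (counting from 0).
sub cut_field {
    my ($line, $index) = @_;
    chomp $line;
    # The limit stops split after $index + 1 separators; the unsplit rest of
    # the line lands in one final element that is simply ignored.
    my @fields = split /,/, $line, $index + 2;
    return $fields[$index];           # undef if the line has too few fields
}

# Typical use: while (<>) { my $col = cut_field($_, 3); print "$col\n" if defined $col; }
```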

HEAD

This one is pretty simple:

my $printlines = abs(shift);
my $lines = 0;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    $lines = 0;
    $current = $ARGV;
  }
  print "$_" if $lines < $printlines;
  $lines++;
}

Things to note here. I use "ne" to compare strings. Now, $ARGV always holds the name of the file currently being read, so I keep track of it to restart my counting once I'm reading a new file. Also note the more traditional syntax for "if", right along with the post-fixed one.

I also use a simplified syntax to get the number of lines to be printed. When you use "shift" by itself it will assume "shift @ARGV". Also, note that shift, besides modifying @ARGV, will return the element that was shifted out of it.

As with a shell, there is no distinction between a number and a string -- you just use it. Even things like "2"+"2" will work. In fact, Perl is even more lenient, cheerfully treating anything non-number as a 0, so you might want to be careful there.
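If that leniency worries you, one way to be careful is to check the argument before using it as a number. This is only a sketch: looks_like_number comes from the core Scalar::Util module, while the helper name line_count is my own.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: turn a command-line argument into a line count,
# refusing anything that is not a number instead of silently using 0.
sub line_count {
    my ($arg) = @_;
    die "Expected a number of lines\n"
        unless defined $arg && looks_like_number($arg);
    return abs(int($arg));            # "-10" becomes 10, like abs(shift) above
}

# Typical use: my $printlines = line_count(shift);
```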

The head script above is very inefficient, though, as it reads ALL of every file, not only the required lines. Let's improve it, and see a couple of important keywords in the process:

my $printlines = abs(shift);
my @files;
if(scalar(@ARGV) == 0) {
  @files = ("-");
} else {
  @files = @ARGV;
}
for my $file (@files) {
  if ($file eq "-") {
    open FILE, "<&STDIN" or next;   # "-" stands for standard input
  } else {
    next unless -f $file && -r $file;
    open FILE, "<", $file or next;
  }
  my $lines = 0;

  while(<FILE>) {
    last if $lines == $printlines;
    print "$_";
    $lines++;
  }

  close FILE;
}

The keywords "next" and "last" are very useful. First, "next" will tell Perl to go back to the loop condition, getting the next element if applicable. Here we use it to skip a file unless it is truly a file (not a directory) and readable. It will also skip if we couldn't open the file even then.

Then "last" is used to immediately jump out of a loop. We use it to stop reading the file once we have reached the required number of lines. It's true we read one line too many, but having "last" in that position shows clearly that the lines after it won't be executed.

There is also "redo", which will go back to the beginning of the loop, but without reevaluating the condition nor getting the next element.
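To see "next" and "last" side by side in something small, here is a sketch of a variation on head that skips blank lines. The helper name first_lines is hypothetical, and skipping blanks is my addition, not something head does.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $count non-blank lines from a filehandle.
sub first_lines {
    my ($fh, $count) = @_;
    my @kept;
    while (my $line = <$fh>) {
        last if @kept >= $count;      # enough lines: leave the loop at once
        next if $line =~ /^\s*$/;     # blank line: go straight to the next one
        push @kept, $line;
    }
    return @kept;
}

# Typical use: print first_lines(\*STDIN, 10);
```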

TAIL

I'll do a little trick here.

my $skiplines = abs(shift);
my @lines;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    print @lines;
    undef @lines;
    $current = $ARGV;
  }
  push @lines, $_;
  shift @lines if $#lines == $skiplines;
}
print @lines;

Ok, I'm combining "push", which appends a value to an array, with "shift", which takes something from the beginning of an array. If you want a stack, you can use push/pop or shift/unshift. Mix them, and you have a queue. I keep my queue at no more than $skiplines elements by checking $#lines, which gives me the index of the last element in the array. You could also get the number of elements in @lines with scalar(@lines).
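The same queue idea, sketched as a function that works on any filehandle. The name last_lines is hypothetical; the logic is the push/shift combination described above, using the element count instead of $#lines.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last $count lines read from a filehandle.
sub last_lines {
    my ($fh, $count) = @_;
    my @queue;
    while (my $line = <$fh>) {
        push @queue, $line;                 # newest line goes in at the end
        shift @queue if @queue > $count;    # oldest line drops off the front
    }
    return @queue;
}

# Typical use: print last_lines(\*STDIN, 10);
```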

UNIQ

Now, uniq only eliminates repeated consecutive lines, which should be easy with what you have seen so far. So I'll go one step further and eliminate all repeated lines, consecutive or not:

my $current = "";
my %lines;
while(<>) {
  if($ARGV ne $current) {
    undef %lines;
    $current = $ARGV;
  }
  print $_ unless defined($lines{$_});
  $lines{$_} = "";
}

Now here I'm keeping the whole file in memory, inside %lines. The use of the % sigil indicates this is a hash table. I'm using the lines as keys, and storing an empty string as value -- as I have no interest in the values. I check whether the key exists with "defined($lines{$_})", which tests if the value associated with that key is defined or not; the keyword "unless" works just like "if", but with the opposite effect, so a line is only printed if it has NOT been seen before.

Note, too, the syntax $lines{$_} = "" as a way to store something in a hash table. Note the use of {} for hash table, as opposed to [] for arrays.
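For comparison, here is a sketch of both behaviours as functions: one that mimics uniq itself (consecutive repeats only) and one that drops every repeat using a hash, as above. Both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: drop repeated consecutive lines, like uniq itself.
sub uniq_consecutive {
    my @lines = @_;
    my @kept;
    my $previous;
    for my $line (@lines) {
        next if defined $previous && $line eq $previous;
        push @kept, $line;
        $previous = $line;
    }
    return @kept;
}

# Hypothetical helper: drop every repeated line, consecutive or not.
sub uniq_all {
    my %seen;
    # $seen{$_}++ is 0 (false) the first time a line appears, so grep keeps
    # each distinct line exactly once.
    return grep { !$seen{$_}++ } @_;
}
```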

WC

This will actually use a lot of stuff we have seen:

my $current;
my %lines;
my %words;
my %chars;
while(<>) {
  $lines{"$ARGV"}++;
  $chars{"$ARGV"} += length($_);
  $words{"$ARGV"} += scalar(grep {$_ ne ""} split /\s/);
}

for my $file (keys %lines) {
  print "$lines{$file} $words{$file} $chars{$file} $file\n";
}

Three new things. Two are the "+=" operator, which should be obvious, and the "for" expression. Basically, a "for" will assign each element of the array to the variable indicated. The "my" is there to declare the variable, though it's unneeded if declared previously. I could have an @array variable inside those parentheses. The "keys %lines" expression will return, as an array, the keys (the filenames) which exist in the hash table "%lines". The rest should be obvious.

The third thing, which I actually added only when revising the answer, is the "grep". The format here is:

grep { code } array

It will run "code" for each element of the array, passing the element as "$_". Then grep will return all elements for which the code evaluates to "true" (not 0, not "", etc). This avoids counting empty strings resulting from consecutive spaces.

Similar to "grep" there is "map", which I won't demonstrate here. Instead of filtering, it will return an array formed by the results of "code" for each element.
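Since "map" is not demonstrated above, here is a minimal sketch of it, counting the words on each line. The helper name words_per_line is hypothetical, and it uses split ' ' (a single-space string), which already ignores leading and repeated whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: count the words on each line.
sub words_per_line {
    my @lines = @_;
    # map returns one result per input element; split ' ' skips leading
    # whitespace and treats runs of whitespace as one separator.
    return map { my @words = split ' ', $_; scalar @words } @lines;
}

# Typical use: my @counts = words_per_line(<STDIN>);
```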

SORT

Finally, sort. This one is easy too:

my @lines;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    print sort @lines;
    undef @lines;
    $current = $ARGV;
  }
  push @lines, $_;
}
print sort @lines;

Here, "sort" will sort the array. Note that sort can receive a function to define the sorting criteria. For instance, if I wanted to sort numbers I could do this:

my @lines;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    print sort {$a <=> $b} @lines;
    undef @lines;
    $current = $ARGV;
  }
  push @lines, $_;
}
print sort {$a <=> $b} @lines;

Here "$a" and "$b" receive the elements to be compared. "<=>" returns -1, 0 or 1 depending on whether the number is less than, equal to or greater than the other. For strings, "cmp" does the same thing.
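The comparison block can hold more than one criterion. As a sketch, assuming lines of the form "name,number" that have already been chomped, this sorts by the number and falls back to string order on ties. The helper name sort_by_number is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: sort "name,number" lines by the number, then by name.
sub sort_by_number {
    my @lines = @_;
    return sort {
        (split /,/, $a)[1] <=> (split /,/, $b)[1]   # numeric comparison first
            or $a cmp $b                             # ties fall back to string order
    } @lines;
}
```

This is roughly what "sort -t, -k2n" does in the shell.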

HANDLING FILES, DIRECTORIES & OTHER STUFF

As for the rest, basic mathematical expressions should be easy to understand. You can test certain conditions about files this way:

for my $file (@ARGV) {
  print "$file is a file\n" if -f "$file";
  print "$file is a directory\n" if -d "$file";
  print "I can read $file\n" if -r "$file";
  print "I can write to $file\n" if -w "$file";
}

I'm not trying to be exhaustive here; there are many other such tests. I can also do "glob" patterns, like shell's "*" and "?", like this:

for my $file (glob("*")) {
  print $file;
  print "*" if -x "$file" && ! -d "$file";
  print "/" if -d "$file";
  print "\t";
}

If you combine that with "chdir", you can emulate "find" as well:

sub list_dir($$) {
  my ($dir, $prefix) = @_;
  my $newprefix = $prefix;
  if ($prefix eq "") {
    $newprefix = $dir;
  } else {
    $newprefix .= "/$dir";
  }
  chdir $dir;
  for my $file (glob("*")) {
    print "$prefix/" if $prefix ne "";
    print "$dir/$file\n";
    list_dir($file, $newprefix) if -d "$file";
  }
  chdir "..";
}

list_dir(".", "");

Here we see, finally, a function. A function is declared with the syntax:

sub name (params) { code }

Strictly speaking, "(params)" is optional. It is a prototype rather than a list of named parameters: the one I used, "($$)", means I'm receiving two scalar parameters. I could have "@" or "%" in there as well. The array "@_" has all the parameters passed. The line "my ($dir, $prefix) = @_" is just a simple way of assigning the first two elements of that array to the variables $dir and $prefix.

This function does not return anything (it's a procedure, really), but you can have functions which return values just by adding "return something;" to it, and have it return "something".

The rest of it should be pretty obvious.
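As an alternative to the hand-written recursion, Perl ships with the File::Find module, which does the directory walking for you. This is a different technique from the chdir/glob approach above, shown only as a sketch; the helper name list_tree is hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: list everything below a directory, like "find DIR".
sub list_tree {
    my ($top) = @_;
    my @found;
    # find() calls the sub once per entry; $File::Find::name holds its path.
    find(sub { push @found, $File::Find::name }, $top);
    return sort @found;
}

# Typical use: print "$_\n" for list_tree(".");
```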

MIXING EVERYTHING

Now I'll present a more involved example. I'll show some bad code to explain what's wrong with it, and then show better code.

For this first example, I have two files: names.txt, which has names and phone numbers, and systems.txt, with systems and the name of the person responsible for each of them. Here they are:

names.txt

John Doe, (555) 1234-4321
Jane Doe, (555) 5555-5555
The Boss, (666) 5555-5555

systems.txt

Sales, Jane Doe
Inventory, John Doe
Payment, That Guy

I want, then, to print the first file, with the system appended to the name of the person, if that person is responsible for that system. The first version might look like this:

#!/usr/bin/perl

use strict;
use warnings;

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  print $_ . ", $system\n";
}

close FILE;

sub get_system($) {
  my ($name) = @_;
  my $system = "";

  open FILE, "systems.txt";

  while(<FILE>) {
    next unless /\Q$name\E/;   # no /o here: $name differs on every call
    ($system) = /([^,]*)/;
  }

  close FILE;

  return $system;
}

This code won't work, though. Perl will complain that the function was used too early for the prototype to be checked, but that's just a warning. It will give an error on line 8 (the first while loop), complaining about a readline on a closed filehandle. What happened here is that "FILE" is global, so the function get_system is changing it. Let's rewrite it, fixing both things:

#!/usr/bin/perl

use strict;
use warnings;

sub get_system($) {
  my ($name) = @_;
  my $system = "";

  open my $filehandle, "systems.txt";

  while(<$filehandle>) {
    next unless /\Q$name\E/;   # no /o here: $name differs on every call
    ($system) = /([^,]*)/;
  }

  close $filehandle;

  return $system;
}

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  print $_ . ", $system\n";
}

close FILE;

This won't give any error, nor will it work. It prints just the systems, but not the names and phone numbers! What happened? Well, what happened is that we are making a reference to "$_" after calling get_system, but, by reading the file, get_system is overwriting the value of $_!

To avoid that, we'll make $_ local inside get_system. This will give it a local scope, and the original value will then be restored once returned from get_system:

#!/usr/bin/perl

use strict;
use warnings;

sub get_system($) {
  my ($name) = @_;
  my $system = "";
  local $_;

  open my $filehandle, "systems.txt";

  while(<$filehandle>) {
    next unless /\Q$name\E/;   # no /o here: $name differs on every call
    ($system) = /([^,]*)/;
  }

  close $filehandle;

  return $system;
}

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  print $_ . ", $system\n";
}

close FILE;

And that still doesn't work! It prints a newline between the name and the system. Well, Perl reads the line including any newline it might have. There is a neat command which will remove newlines from strings, "chomp", which we'll use to fix this problem. And since not every name has a system, we might, as well, avoid printing the comma when that happens:

#!/usr/bin/perl

use strict;
use warnings;

sub get_system($) {
  my ($name) = @_;
  my $system = "";
  local $_;

  open my $filehandle, "systems.txt";

  while(<$filehandle>) {
    next unless /\Q$name\E/;   # no /o here: $name differs on every call
    ($system) = /([^,]*)/;
  }

  close $filehandle;

  return $system;
}

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  chomp;
  print $_;
  print ", $system" if $system ne "";
  print "\n";
}

close FILE;

That works, but it also happens to be horribly inefficient. We read the whole systems file for every line in the names file. To avoid that, we'll read all data from systems once, and then use that to process names.

Now, sometimes a file is so big you can't read it into memory. When that happens, you should try to read into memory any other file needed to process it, so that you can do everything in a single pass for each file. Anyway, here is the first optimized version of it:

#!/usr/bin/perl

use strict;
use warnings;

our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
  my ($system, $name) = /([^,]*),(.*)/;
  $systems{$name} = $system;
}
close SYSTEMS;

open NAMES, "names.txt";
while(<NAMES>) {
  my ($name) = /^([^,]*),/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined $systems{$name};
  print "\n";
}
close NAMES;

Unfortunately, it doesn't work. No system ever appears! What has happened? Well, let's look into what "%systems" contains, by using Data::Dumper:

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
  my ($system, $name) = /([^,]*),(.*)/;
  $systems{$name} = $system;
}
close SYSTEMS;

print Dumper(%systems);

open NAMES, "names.txt";
while(<NAMES>) {
  my ($name) = /^([^,]*),/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined $systems{$name};
  print "\n";
}
close NAMES;

The output will be something like this:

$VAR1 = ' Jane Doe';
$VAR2 = 'Sales';
$VAR3 = ' That Guy';
$VAR4 = 'Payment';
$VAR5 = ' John Doe';
$VAR6 = 'Inventory';
John Doe, (555) 1234-4321
Jane Doe, (555) 5555-5555
The Boss, (666) 5555-5555

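Those $VAR1/$VAR2/etc. lines are how Dumper displays a hash table when it is passed as a plain list: the odd-numbered entries are the keys, and the even-numbered entries that follow them are the values. (Passing a reference, as in Dumper(\%systems), would print the key/value pairs together instead.) Now we can see that each name in %systems has a preceding space! Silly regex mistake, let's fix it: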

#!/usr/bin/perl

use strict;
use warnings;

our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
  my ($system, $name) = /^\s*([^,]*?)\s*,\s*(.*?)\s*$/;
  $systems{$name} = $system;
}
close SYSTEMS;

open NAMES, "names.txt";
while(<NAMES>) {
  my ($name) = /^\s*([^,]*?)\s*,/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined $systems{$name};
  print "\n";
}
close NAMES;

So, here, we are aggressively removing any spaces from the beginning or end of name and system. There are other ways to form that regex, but that's beside the point. There is still one problem with this script, which you'll have seen if your "names.txt" and/or "systems.txt" files have an empty line at the end. The warnings look like this:

Use of uninitialized value in hash element at ./exemplo3e.pl line 10, <SYSTEMS> line 4.
Use of uninitialized value in hash element at ./exemplo3e.pl line 10, <SYSTEMS> line 4.
John Doe, (555) 1234-4321, Inventory
Jane Doe, (555) 5555-5555, Sales
The Boss, (666) 5555-5555
Use of uninitialized value in hash element at ./exemplo3e.pl line 19, <NAMES> line 4.

What happened here is that nothing went into the "$name" variable when the empty line was processed, because the regex did not match. There are many ways around that, but I chose the following:

#!/usr/bin/perl

use strict;
use warnings;

our %systems;
open SYSTEMS, "systems.txt" or die "Could not open systems.txt!";
while(<SYSTEMS>) {
  my ($system, $name) = /^\s*([^,]+?)\s*,\s*(.+?)\s*$/;
  $systems{$name} = $system if defined $name;
}
close SYSTEMS;

open NAMES, "names.txt" or die "Could not open names.txt!";
while(<NAMES>) {
  my ($name) = /^\s*([^,]+?)\s*,/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined($name) && defined($systems{$name});
  print "\n";
}
close NAMES;

The regular expressions now require at least one character for name and system, and we test to see if "$name" is defined before we use it.
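
As a side note, the same lookup can be sketched with lexical filehandles and the three-argument form of "open" described earlier. This is only a sketch, not a replacement for the script above: the sub names load_systems and annotate_names are made up for illustration, and the two strings stand in for systems.txt and names.txt so that the example runs without any files on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data standing in for systems.txt and names.txt (assumed contents).
my $systems_txt = "Inventory, John Doe\nSales, Jane Doe\nPayment, That Guy\n\n";
my $names_txt   = "John Doe, (555) 1234-4321\nJane Doe, (555) 5555-5555\n"
                . "The Boss, (666) 5555-5555\n\n";

# Build a name => system lookup from any readable filehandle.
sub load_systems {
    my ($fh) = @_;
    my %systems;
    while (my $line = <$fh>) {
        # Skip lines that do not look like "system, name" (blank ones included).
        next unless $line =~ /^\s*([^,]+?)\s*,\s*(.+?)\s*$/;
        $systems{$2} = $1;
    }
    return \%systems;
}

# Return each names line with ", system" appended when the name is known.
sub annotate_names {
    my ($fh, $systems) = @_;
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        my ($name) = $line =~ /^\s*([^,]+?)\s*,/;
        $line .= ", $systems->{$name}"
            if defined $name && defined $systems->{$name};
        push @out, $line;
    }
    return @out;
}

# Opening a reference to a scalar reads from memory. With real files this
# would be: open my $fh, '<', 'systems.txt' or die "...: $!";
open my $sys_fh, '<', \$systems_txt or die "Could not open systems data: $!";
my $systems = load_systems($sys_fh);
close $sys_fh;

open my $names_fh, '<', \$names_txt or die "Could not open names data: $!";
print "$_\n" for annotate_names($names_fh, $systems);
close $names_fh;
```

Keeping the parsing in subs that take a filehandle means the same code can be pointed at a real file, a pipe, or a string, and the lexical handles are closed automatically when they go out of scope.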

CONCLUSION

Well, then, these are the basic tools to translate a shell script. You can do MUCH more with Perl, but that was not your question, and it wouldn't fit here anyway.

Just as a basic overview of some important topics,

  • A Perl script that might be attacked by hackers needs to be run with the -T option (taint mode), so that Perl will complain about any vulnerable input which has not been properly handled.

  • There are libraries, called modules, for database access, XML and related formats, Telnet, HTTP and other protocols. In fact, there are myriad modules, which can be found at CPAN.

  • As mentioned by someone else, if you make use of AWK or SED, you can translate those into Perl with the a2p and s2p tools.

  • Perl can be written in an Object Oriented way.

  • There are multiple versions of Perl. As of this writing, the stable one is 5.8.8 and there is a 5.10.0 available. There is also a Perl 6 in development, but experience has taught everyone not to wait too eagerly for it.

There is a free, good, hands-on, hard & fast book about Perl called Learning Perl The Hard Way. Its style is similar to this very answer. It might be a good place to go from here.

I hope this helped.

DISCLAIMER

I'm NOT trying to teach Perl, and you will need to have at least some reference material. There are guidelines to good Perl habits, such as using "use strict;" and "use warnings;" at the beginning of the script, to make Perl less lenient toward badly written code, or naming STDOUT and STDERR explicitly on the print lines, to indicate the correct output stream.

This is stuff I agree with, but I decided it would detract from the basic goal of showing patterns for common shell script utilities.
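
For reference, the STDOUT/STDERR habit mentioned above looks roughly like this. It is a minimal sketch: format_warning is a hypothetical helper invented for the example, and the sample lines are borrowed from the output shown earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for this sketch: build a diagnostic line.
sub format_warning {
    my ($message) = @_;
    return "warning: $message\n";
}

# Normal results go to STDOUT, named explicitly.
print STDOUT "John Doe, (555) 1234-4321, Inventory\n";

# Diagnostics go to STDERR, so "script.pl > out.txt" keeps them out of the file.
print STDERR format_warning("no system found for The Boss");
```

With the two streams separated, the results can be redirected or piped into another command while the diagnostics still show up on the terminal.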
