What native Perl code replaces `cut`?


Question

I'm learning Perl as I edit a Perl script to replace POSIX OS calls with native Perl functions, so that it can run cross-platform on Windows. This code has me stumped:

if (defined($OPTIONS)) {
    my ($method,$file) = ($1,$2);
    my $count = `cut -d\\  -f 2 $file | sort | uniq | wc -l`;
}

1) Where do $1 and $2 come from? This code is inside a function, but the function doesn't take any arguments. Also, the script itself parses over 70 named arguments, so they're not from the command line.

2) Since I don't know what $2 is, I'm not sure what $file contains.

3) Whatever the content of $file, the cut command looks at the second field of each line, as delimited by a backslash.

4) It looks like the ultimate result, $count, is the number of unique values that cut found.

Considering that $file could be quite large (a million lines, hundreds of megabytes), what is the most efficient native Perl code to replace this external call and get the same $count value? Also, "efficient" is relative: this code is in a tool chain where other stages can run for 2 or 3 days, so it's not a problem if it takes 5 or 10 minutes on a large file.

Recommended answer

$1, $2, etc. are built-in Perl variables that hold the contents of the first, second, etc. capture groups from the most recent successful regex pattern match.
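
As a minimal illustration of where such values come from, the sketch below runs a pattern match and then copies $1 and $2 into named variables. The input string and the pattern are hypothetical; the real script's match is not shown in the question, so the actual source of $method and $file is presumably an earlier match in the same function.

```perl
use strict;
use warnings;

# Hypothetical input; the real script's data and pattern are not shown.
my $line = 'copy:/tmp/data.txt';

my ($method, $file);
if ( $line =~ /^(\w+):(.+)$/ ) {
    # $1 and $2 were set by the successful match just above.
    ($method, $file) = ($1, $2);
}

print "$method $file\n";
```

Copying the captures into named variables right after the match, as here, avoids surprises: a later successful match would overwrite $1 and $2.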

This should do what you want. It uses a hash to keep track of all the unique values in the second column, and sets $count to the number of distinct keys once the whole file has been read. It's likely to be slightly faster than the equivalent tool chain. Note that it's untested, as I'm not near a system with Perl at present.

I hope there's something more in the real version of this code, as the only effect of this block is to set a couple of local variables that are discarded at the end of the block.

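if ( defined $OPTIONS ) {
    # $1 and $2 were set by an earlier successful pattern match
    my ($method, $file) = ($1, $2);
    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
    my %count;
    # Tally the second backslash-delimited field of every line
    ++$count{ (split /\\/, $_, 3)[1] } while <$fh>;
    # keys in scalar context returns the number of distinct values seen
    my $count = keys %count;
}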
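
One caveat worth checking before relying on the backslash split: Perl interpolates a backtick string much like a double-quoted string, so `\\` followed by a space most likely reaches the shell as an escaped space, which would make the `cut` delimiter a space rather than a backslash. The sketch below therefore takes the delimiter as a parameter. The helper name `count_unique_field` and its argument order are assumptions made for illustration, not part of the original script. It also tries to mirror two `cut` behaviours that the one-liner above does not: lines without the delimiter are counted whole, and trailing empty fields are kept.

```perl
use strict;
use warnings;

# Hypothetical helper: count the distinct values of one field, roughly
# what `cut -d<delim> -f<n> FILE | sort | uniq | wc -l` reports.
sub count_unique_field {
    my ($file, $delim, $field) = @_;    # $field is 1-based, as in cut -f

    open my $fh, '<', $file
        or die qq{Unable to open "$file" for input: $!};

    my %seen;
    while ( my $line = <$fh> ) {
        chomp $line;

        # A limit of -1 keeps trailing empty fields, as cut does.
        my @fields = split /\Q$delim\E/, $line, -1;

        # cut (without -s) passes through lines that lack the delimiter;
        # a line with too few fields contributes an empty string.
        my $value = @fields <= 1      ? $line
                  : $field <= @fields ? $fields[ $field - 1 ]
                  :                     '';
        $seen{$value} = 1;
    }
    close $fh;

    # Number of distinct values seen.
    return scalar keys %seen;
}
```

Inside the original block this could be called as `my $count = count_unique_field($file, ' ', 2);`, or with `'\\'` as the delimiter if the data really is backslash-separated. Memory use grows with the number of distinct values rather than the number of lines, which should be acceptable for the file sizes described in the question.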
