How can I read multiple lines of a file into blocks in Perl?

Views: 59 | Published: 2021/6/15 20:30:47 | Tag: perl

Problem description

I have a file that contains the text below.

#L_ENTRY    <s_slash_1>
#LEX        </>
#ROOT       </>
#POS        <sp>
#SUBCAT     <slash>
#S_LINK           <>
#BITS    <>
#WEIGHT      <0.1>
#SYNONYM     <0>

#L_ENTRY    <s_comma_1>
#LEX        <,>
#ROOT       <,>
#POS        <sp>
#SUBCAT     <comma>
#S_LINK           <>
#BITS    <>
#WEIGHT      <0.1>
#SYNONYM     <0>

#L_ENTRY    <s_tilde_1>
#LEX        <~>
#ROOT       <~>
#POS        <sp>
#SUBCAT     <tilde>
#S_LINK           <>
#BITS    <>
#WEIGHT      <0.1>
#SYNONYM     <0>

#L_ENTRY    <s_at_1>
#LEX        <@>
#ROOT       <@>
#POS        <sp>
#SUBCAT     <at>
#S_LINK           <>
#BITS    <>
#WEIGHT      <0.1>
#SYNONYM     <0>

I know how to read the lines into an array using Perl, but in this case I want an array whose elements are whole blocks, each beginning with #L_ENTRY and ending with #SYNONYM <0>.

Can anyone help?

Recommended answer

There are two ways to do it. First, you can set the "input record separator" special variable, $/ (documented in perlvar). In short, you are telling Perl that a "line" is not terminated by a newline character but by a string of your choosing. In your case, you could set it to the '#SYNONYM <0>' line, written with exactly the same spacing as in the file. Then, each time you read one "line", you get everything up to and including the next occurrence of that tag; if the tag does not occur again, you get whatever is left in the file. So, for input data that looks like this:

#L_ENTRY        <s_slash_1>
#LEX         </>
#ROOT        </>
#POS         <sp>
#SUBCAT      <slash>
#S_LINK            <>
#BITS     <>
#WEIGHT      <0.1>
#SYNONYM     <0>

#L_ENTRY        <s_comma_1>
#LEX         <,>
#ROOT        <,>
#POS         <sp>
#SUBCAT      <comma>
#S_LINK            <>
#BITS     <>
#WEIGHT      <0.1>
#SYNONYM     <0>

if you run this:

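use v5.14;
use warnings;

my $filename = "data.txt" ;
open(my $fh, '<', $filename) or die "$filename: $!" ;

# Treat the "#SYNONYM     <0>" line, not a bare newline, as the end of a record.
# The spacing inside the string must match the file exactly.
local $/ = "#SYNONYM     <0>\n" ;
my @chunks = <$fh> ;    # list context: one element per record
close $fh ;

say $chunks[0] ;
say '---' ;
say $chunks[1] ;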

you get:

#L_ENTRY        <s_slash_1>
#LEX         </>
#ROOT        </>
#POS         <sp>
#SUBCAT      <slash>
#S_LINK            <>
#BITS     <>
#WEIGHT      <0.1>
#SYNONYM     <0>

---

#L_ENTRY        <s_comma_1>
#LEX         <,>
#ROOT        <,>
#POS         <sp>
#SUBCAT      <comma>
#S_LINK            <>
#BITS     <>
#WEIGHT      <0.1>
#SYNONYM     <0>

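A couple of notes about this:

Any extra data between records gets "caught in the net" and ends up at the start of the record that follows it.
The record separator itself remains part of the data, at the end of each record.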
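
Both notes can be checked with a small self-contained sketch. It is only an illustration: the sample data and the stray line are made up, the string is opened as an in-memory file so that no data.txt is needed, and the final clean-up substitution is one possible way to drop leading text, not part of the original answer.

```perl
use v5.14;
use warnings;

# Hypothetical sample data: two short records with a stray line between them.
my $data = <<'END';
#L_ENTRY    <s_slash_1>
#SYNONYM     <0>

stray line between records
#L_ENTRY    <s_comma_1>
#SYNONYM     <0>
END

# Open the string as an in-memory file.
open(my $fh, '<', \$data) or die "in-memory open failed: $!";

my @chunks;
{
    local $/ = "#SYNONYM     <0>\n";   # the change is confined to this block
    @chunks = <$fh>;
}
close $fh;

say scalar(@chunks), " chunks read";

# Note 2: the separator is still at the end of each chunk.
say "chunk 1 ends with the separator"
    if $chunks[0] =~ /#SYNONYM     <0>\n\z/;

# Note 1: text between records leads the following chunk.
say "chunk 2 starts with the stray text"
    if $chunks[1] =~ /\A\s*stray/;

# One way to clean up: delete everything before the first #L_ENTRY line.
s/\A.*?(?=^[ \t]*#L_ENTRY)//ms for @chunks;

print $chunks[1];
```

Wrapping the assignment to $/ in a bare block keeps the changed separator from affecting any later reads in the same program.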

To get more control, it is better to process the data line by line and use regexes to switch between "capture" mode and "don't capture" mode:

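use v5.14;
use warnings;

my $filename = "data.txt" ;
open(my $fh, '<', $filename) or die "$filename: $!" ;

# /x allows whitespace inside the patterns; the literal '#' must then be escaped.
# Both patterns are anchored so that they match whole lines only.
my $found_start_token = qr/ ^ \s* \#L_ENTRY \b /x;
my $found_stop_token  = qr/ ^ \s* \#SYNONYM \s+ <0> \s* $ /x;

my @chunks ;
my $chunk = '' ;
my $capture_mode = 0 ;

while ( <$fh> )  {
    $capture_mode = 1 if /$found_start_token/ ;    # a record starts here
    $chunk .= $_ if $capture_mode ;                # collect lines inside a record
    if ($capture_mode && /$found_stop_token/) {    # a record ends here
        push @chunks, $chunk ;
        $chunk = '' ;
        $capture_mode = 0 ;
    }
}
close $fh ;

say $chunks[0] ;
say '---' ;
say $chunks[1] ;
exit 0 ;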

Some notes:

The program works by concatenating the current line, $_, onto $chunk whenever we are in capture mode.
Capture mode is switched on and off by regexes written in "extended mode", /x, which allows whitespace inside the regex for easier reading.
Extra data between records does not appear in the chunks.
It produces the same records as before, except that the second chunk no longer starts with the blank line that separated the records.
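
To see that text outside a record really is dropped, the same capture-mode idea can be wrapped in a function and run against in-memory data. This is a sketch under assumptions: read_chunks is a hypothetical helper name, the sample data (including the header and stray lines) is invented, and unlike the program above it starts a fresh chunk whenever a new #L_ENTRY line appears.

```perl
use v5.14;
use warnings;

# Hypothetical helper: returns one string per #L_ENTRY ... #SYNONYM <0> block.
sub read_chunks {
    my ($fh) = @_;
    my (@chunks, $chunk);
    my $capturing = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^\s*#L_ENTRY\b/) {             # start token: begin a fresh chunk
            $capturing = 1;
            $chunk     = '';
        }
        next unless $capturing;                      # ignore lines outside a record
        $chunk .= $line;
        if ($line =~ /^\s*#SYNONYM\s+<0>\s*$/) {     # stop token: store the chunk
            push @chunks, $chunk;
            $capturing = 0;
        }
    }
    return @chunks;
}

# Made-up sample data with text before and between the records.
my $data = <<'END';
header line that belongs to no record
#L_ENTRY    <s_slash_1>
#LEX        </>
#SYNONYM     <0>

stray line between records
#L_ENTRY    <s_comma_1>
#LEX        <,>
#SYNONYM     <0>
END

open(my $fh, '<', \$data) or die "in-memory open failed: $!";
my @chunks = read_chunks($fh);
close $fh;

say scalar(@chunks), " chunks read";
print join("---\n", @chunks);
```

Neither the header line nor the stray line should appear in the printed chunks, because both fall outside capture mode.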
