How can I read multiple lines of a file into blocks in Perl?


Question

I have a file which contains the text below.

#L_ENTRY    <s_slash_1>
#LEX        </>
#ROOT       </>
#POS        <sp>
#SUBCAT     <slash>
#S_LINK           <>
#BITS    <>
#WEIGHT      <0.1>
#SYNONYM     <0>

#L_ENTRY    <s_comma_1>
#LEX        <,>
#ROOT       <,>
#POS        <sp>
#SUBCAT     <comma>
#S_LINK           <>
#BITS    <>
#WEIGHT      <0.1>
#SYNONYM     <0>

#L_ENTRY    <s_tilde_1>
#LEX        <~>
#ROOT       <~>
#POS        <sp>
#SUBCAT     <tilde>
#S_LINK           <>
#BITS    <>
#WEIGHT      <0.1>
#SYNONYM     <0>

#L_ENTRY    <s_at_1>
#LEX        <@>
#ROOT       <@>
#POS        <sp>
#SUBCAT     <at>
#S_LINK           <>
#BITS    <>
#WEIGHT      <0.1>
#SYNONYM     <0>

I know how to read the lines into an array using Perl, but in this case I want to make an array with two elements, each of which begins with #L_ENTRY and ends with #SYNONYM <0>.

Can anyone help?

Recommended answer

There are two ways to do it. First, you can set the "input record separator" special variable, $/ (documented in perlvar). In short, you are telling perl that a "line" is no longer terminated by a newline character. In your case, you could set it to '#SYNONYM <0>'. Each read then returns everything from the current position up to and including the next occurrence of that tag; if the tag does not occur again, you get whatever is left in the file. So, for input data that looks like this:

#L_ENTRY        <s_slash_1>
#LEX         </>
#ROOT        </>
#POS         <sp>
#SUBCAT      <slash>
#S_LINK            <>
#BITS     <>
#WEIGHT      <0.1>
#SYNONYM     <0>

#L_ENTRY        <s_comma_1>
#LEX         <,>
#ROOT        <,>
#POS         <sp>
#SUBCAT      <comma>
#S_LINK            <>
#BITS     <>
#WEIGHT      <0.1>
#SYNONYM     <0>

If you run this:

use v5.14;
use warnings;

my $filename = "data.txt" ;
open(my $fh, '<', $filename) or die "$filename: $!" ;
# $/ is a literal string, not a regex: the spacing must match the file exactly
local $/ = "#SYNONYM     <0>\n" ;
my @chunks = <$fh> ;
say $chunks[0] ;
say '---' ;
say $chunks[1] ;

You get:

#L_ENTRY        <s_slash_1>
#LEX         </>
#ROOT        </>
#POS         <sp>
#SUBCAT      <slash>
#S_LINK            <>
#BITS     <>
#WEIGHT      <0.1>
#SYNONYM     <0>

---

#L_ENTRY        <s_comma_1>
#LEX         <,>
#ROOT        <,>
#POS         <sp>
#SUBCAT      <comma>
#S_LINK            <>
#BITS     <>
#WEIGHT      <0.1>
#SYNONYM     <0>

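A couple of notes about this:

  1. Any extra data between records gets "caught in the net" and ends up at the start of the following record.
  2. The record separator itself is still part of the data, at the end of each record.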
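
Both side effects noted above can be cleaned up after the read. The following is a minimal sketch, not part of the original answer: it assumes a shortened, hypothetical copy of the data held in a string (so it runs without data.txt) and relies on chomp removing the current value of $/. Because the separator here is the whole "#SYNONYM <0>" line, that line disappears from each chunk; append it back if you need it.

```perl
use v5.14;
use warnings;

# Hypothetical, shortened sample data kept in a string so the sketch is self-contained
my $data = <<'END';
#L_ENTRY    <s_slash_1>
#WEIGHT      <0.1>
#SYNONYM     <0>

#L_ENTRY    <s_comma_1>
#WEIGHT      <0.1>
#SYNONYM     <0>
END

# Open the string as an in-memory file
open(my $fh, '<', \$data) or die "in-memory file: $!";

my @chunks;
{
    # Literal separator string; its spacing must match the data exactly
    local $/ = "#SYNONYM     <0>\n";
    while (my $chunk = <$fh>) {
        chomp $chunk;            # strips the trailing copy of $/ from the record
        $chunk =~ s/\A\s+//;     # drops the blank line left over between records
        push @chunks, $chunk if length $chunk;
    }
}
close $fh;

say scalar(@chunks), " records";
say "[$_]" for @chunks;
```

Limiting `local $/` to a bare block keeps the changed separator from leaking into any later reads in the same program.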

To get more control, it's better to process the data line by line and use regexes to switch between "capture" mode and "don't capture" mode:

use v5.14;
use warnings;

my $filename = "data.txt" ;
open(my $fh, '<', $filename) or die "$filename: $!" ;

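# A record starts at an #L_ENTRY line and stops at the "#SYNONYM <0>" line
my $found_start_token = qr/ \s* \#L_ENTRY \s* /x;
my $found_stop_token  = qr/ \s* \#SYNONYM \s+ \<0\> \s* \n /x;

my @chunks ;
my $chunk = '' ;
my $capture_mode = 0 ;

while ( <$fh> )  {
    # Start capturing on the start token; the line itself is kept
    $capture_mode = 1 if /$found_start_token/ ;
    $chunk .= $_ if $capture_mode ;
    # On the stop token, store the finished chunk and stop capturing
    if ($capture_mode && /$found_stop_token/) {
        push @chunks, $chunk ;
        $chunk = '' ;
        $capture_mode = 0 ;
    }
}
close $fh ;
say $chunks[0] ;
say '---' ;
say $chunks[1] ;
exit 0 ;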

Some notes:

  1. The program works by concatenating the current line, $_, onto $chunk whenever we're in capture mode.
  2. Capture mode is turned on and off using regexes in "extended mode", /x. This allows whitespace to be added to the regex for easier reading.
  3. Extra data between records will not appear in the chunks.
  4. It produces the same output as before, except that the blank line between the records is no longer captured at the start of the second chunk.
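
The capture-mode toggle can also be written with Perl's range ("flip-flop") operator, which keeps the on/off state itself. This is an alternative sketch rather than the answer's own method. The sample data, including the "stray" lines, is hypothetical and held in a string so the example runs on its own.

```perl
use v5.14;
use warnings;

# Hypothetical sample data with extra lines outside the records
my $data = <<'END';
stray line before the first record
#L_ENTRY    <s_slash_1>
#LEX        </>
#SYNONYM     <0>

stray line between records
#L_ENTRY    <s_comma_1>
#LEX        <,>
#SYNONYM     <0>
END

open(my $fh, '<', \$data) or die "in-memory file: $!";

my @chunks;
my $chunk = '';

while (my $line = <$fh>) {
    # In scalar context, ".." is false until the left pattern matches, then
    # stays true up to and including the line where the right pattern matches.
    my $state = ($line =~ /^\s*#L_ENTRY\b/ .. $line =~ /^\s*#SYNONYM\s+<0>\s*$/);
    next unless $state;

    $chunk .= $line;
    if ($state =~ /E0$/) {       # the last line of a range has "E0" appended
        push @chunks, $chunk;
        $chunk = '';
    }
}
close $fh;

say scalar(@chunks), " records";
print "---\n$_" for @chunks;
```

Lines outside a start/stop pair never reach `$chunk`, so this behaves like the explicit `$capture_mode` flag with one less variable to maintain.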
