Find regex, move the next line at the end of this line and copy the first 5 columns to the next lines that start with a letter

Published: 2020/9/15 8:03:20, tags: perl awk sed text-processing

Problem description

I have text like this:

37    7    --------------  No  aaa
40    0    --------------  No  bbb
xxx   zzy
aa    bb   cc
42    2    --------------  No  ccc
xxx   zyz
a     b    c               d
43    3    --------------  No  ddd
xy    zz
a     a
a     a
c
52    5    --------------  No  eee
yyyx  zzz

When I process it with awk I get:

awk '{if($1+0==$1) p=$1 FS $2 FS $3 FS $4 FS $5; else $0=p FS $0}1' /tmp/test3 | column -t
37  7  --------------  No  aaa
37  7  --------------  No  aaa  xxx   zzz
40  0  --------------  No  bbb
40  0  --------------  No  bbb  xxx   zzy
40  0  --------------  No  bbb  aa    bb   cc
42  2  --------------  No  ccc
42  2  --------------  No  ccc  xxx   zyz
42  2  --------------  No  ccc  a     b    c   d
43  3  --------------  No  ddd
43  3  --------------  No  ddd  xy    zz
43  3  --------------  No  ddd  a     a
43  3  --------------  No  ddd  a     a
43  3  --------------  No  ddd  c
52  5  --------------  No  eee
52  5  --------------  No  eee  yyyx  zzz

and I need to get the following output:

37    7    --------------  No  aaa
40    0    --------------  No  bbb xxx   zzy
40    0    --------------  No  bbb aa    bb   cc
42    2    --------------  No  ccc xxx   zyz
42    2    --------------  No  ccc a     b    c  d
43    3    --------------  No  ddd xy    zz
43    3    --------------  No  ddd a     a
43    3    --------------  No  ddd a     a
43    3    --------------  No  ddd c
52    5    --------------  No  eee yyyx  zzz

Thanks in advance for your help! I've also tried awk '/-/{base=$0; next} {print base, $0}' /tmp/test4 | column -t as suggested, but it drops the first of two consecutive lines that start with a number.

UPDATE

This sed spell solved my problem: sed -r ':a;N;/^[0-9].*\n[0-9]/{P;D};:b;s/^(.*)\n(.*)/\1 \2\n\1/;P;s/.*\n//;$d;N;/\n[0-9]/D;bb' /tmp/test2
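For comparison, here is a minimal Perl sketch of the streaming idea behind this sed script: remember the most recent number-line, print it on its own only when another number-line follows it directly, and otherwise prepend it to each text-line that follows. The sub name merge_stream and the sample lines are illustrative, and the sketch is not claimed to match the sed script on every edge case. Like the sed version, it prepends the whole number-line, which is the same as its first five columns for the data shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the input lines (already chomped) and
# returns the merged output lines.
sub merge_stream {
    my @out;
    my $base;        # most recent number-line
    my $used = 0;    # has $base been prepended to a text-line yet?
    for my $line (@_) {
        if ($line =~ /^[0-9]/) {
            # The previous number-line had no text after it: print it alone
            push @out, $base if defined $base && !$used;
            ($base, $used) = ($line, 0);
        }
        else {
            # Text-line: prepend the current number-line (if there is one)
            push @out, defined $base ? "$base $line" : $line;
            $used = 1;
        }
    }
    # A trailing number-line with no text after it is printed alone
    push @out, $base if defined $base && !$used;
    return @out;
}

my @input = (
    '37    7    --------------  No  aaa',
    '40    0    --------------  No  bbb',
    'xxx   zzy',
    'aa    bb   cc',
    '42    2    --------------  No  ccc',
    'xxx   zyz',
);
print "$_\n" for merge_stream(@input);
```

Unlike the buffer-based answer below, this keeps only one number-line in memory at a time.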

One more question: if an output line has more than 8 columns, is there a way to modify the sed command so that it moves the 9th, 10th and 11th columns to a new line and copies the first 5 columns in front of them?

Let's say I have these 3 lines:

42    2    --------------  No  ccc xxx   zyz
42    2    --------------  No  ccc a     b    c    d    e    f
43    3    --------------  No  ddd xy    zz

and I'd like to get:

42    2    --------------  No  ccc xxx   zyz
42    2    --------------  No  ccc a     b    c
42    2    --------------  No  ccc d     e    f
43    3    --------------  No  ddd xy    zz

Solution

The Perl script below assumes the following requirements.

The input contains alternating blocks of lines that start with either a number or a non-number, where each block of number-lines is followed by a block of text-lines. Updated: in the output, the first five columns of the last number-line in each block need to be prepended to every text-line of the text block that immediately follows. The other number-lines are printed on their own, with runs of whitespace turned into tabs.

The code collects number-lines and text-lines in two buffers. The buffers are processed and emptied when we reach the first line of the next block of number-lines, which is the point at which both buffers are non-empty.

use warnings;
use strict;
use feature 'say';

# Input file name from the command line; without it, print usage and exit
my $file = shift @ARGV or die "Usage: $0 file\n";

open my $fh, '<', $file or die "Can't open $file: $!";

my (@text, @nums);

while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^[^0-9]/) { 
        push @text, $line;
        if (eof) {
            process_buffers(\@nums, \@text);
            last
        }
        next;
    }
    elsif (@nums and @text) {
        process_buffers(\@nums, \@text);
    }

    push @nums, $line;
}

sub process_buffers {
    my ($rnums, $rtext) = @_;

    # Remove last number line from array and take its first five columns
    my @last_num_line_cols = (split ' ', pop @$rnums)[0..4];
    # Print other number lines; all consecutive spaces replaced by tabs
    say for map { s/\s+/\t/gr } @$rnums;

    # Print text lines prepended by five columns of last number line
    foreach my $text_line (@$rtext) {
        say join "\t", @last_num_line_cols, $text_line;
    }   

    @$rtext = ();
    @$rnums = ();
}
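A side note on the column extraction in process_buffers: split ' ' is a special case in Perl that, like awk, splits on runs of whitespace and ignores leading whitespace, whereas split /\s+/ returns an empty first field when a line starts with blanks. Also, if a number-line had fewer than five columns, the slice [0..4] would pad the result with undefined values and join would warn about them under use warnings. The hypothetical helper first_cols below is one way to avoid that; it is a sketch, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: the first $n whitespace-separated columns of $line,
# returning fewer than $n (never undef) when the line is shorter.
sub first_cols {
    my ($line, $n) = @_;
    my @cols = split ' ', $line;    # awk-style: leading blanks are ignored
    return @cols > $n ? @cols[0 .. $n - 1] : @cols;
}

my $line = '  42    2    --------------  No  ccc  extra';
my @awk_style = split ' ',   $line;    # ('42', '2', ..., 'extra')
my @by_regex  = split /\s+/, $line;    # ('', '42', '2', ..., 'extra')
print scalar(@awk_style), ' vs ', scalar(@by_regex), " fields\n";    # prints: 6 vs 7 fields

print join("\t", first_cols($line, 5)), "\n";
print join("\t", first_cols('43 3', 5)), "\n";    # only two columns, no warnings
```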

The condition involving eof above is needed to process the last pair of number and text blocks, since no other test fires after the last line has been read. Its placement assumes that the last line is a text-line, which follows from my assumptions about the requirements.

This prints

37      7       --------------  No      aaa
40      0       --------------  No      bbb     xxx   zzy
40      0       --------------  No      bbb     aa    bb   cc
42      2       --------------  No      ccc     xxx   zyz
42      2       --------------  No      ccc     a     b    c               d
43      3       --------------  No      ddd     xy    zz
43      3       --------------  No      ddd     a     a
43      3       --------------  No      ddd     a     a
43      3       --------------  No      ddd     c
52      5       --------------  No      eee     yyyx  zzz

(columns are aligned on tabs, as expected in the input and wanted in the output)
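The eof test in the main loop only flushes the buffers when the last line is a text-line. If the input instead ends with number-lines that have no text after them, those lines are never printed. As a hedged sketch of one way to drop that assumption, the loop can be rewritten as a function over a list of lines (merge_blocks is a hypothetical name) that flushes whatever is left after the input ends and passes unmatched lines through:

```perl
use strict;
use warnings;

# Hypothetical rewrite of the main loop as a function: it returns the output
# lines instead of printing them, and needs no eof test because it flushes
# the buffers once more after the last line.
sub merge_blocks {
    my (@out, @nums, @text);

    my $flush = sub {
        if (@nums and @text) {
            # Same rule as process_buffers: the first five columns of the
            # last number-line go in front of every text-line
            my @cols = (split ' ', pop @nums)[0 .. 4];
            push @out, map { s/\s+/\t/gr } @nums;
            push @out, map { join "\t", @cols, $_ } @text;
        }
        else {
            # Nothing to merge (number-lines with no text after them, or
            # text before any number-line): pass the lines through
            push @out, (map { s/\s+/\t/gr } @nums), @text;
        }
        @nums = ();
        @text = ();
    };

    for my $line (@_) {
        if ($line =~ /^[0-9]/) {
            $flush->() if @text;    # first number-line of a new block
            push @nums, $line;
        }
        else {
            push @text, $line;
        }
    }
    $flush->();                     # last block, whether or not it has text
    return @out;
}

my @input = (
    '40    0    --------------  No  bbb',
    'xxx   zzy',
    '52    5    --------------  No  eee',
    '53    6    --------------  No  fff',
);
print "$_\n" for merge_blocks(@input);
```

To run it on a file, read the lines with chomp(my @lines = <$fh>) and print each element returned by merge_blocks(@lines).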

Update: limit the output width to 8 columns, as described in the question update.

Use this modified version of the processing function, calling it in place of process_buffers in the main loop:

sub process_buffers_fmt {
    my ($rnums, $rtext) = @_;

    my @last_num_line_cols = (split ' ', pop @$rnums)[0..4];
    say for map { s/\s+/\t/gr } @$rnums;

    # Format output lines to 8 columns at most
    foreach my $text_line (@$rtext) {
        my @text_cols = split ' ', $text_line;
        while (my @prn_text_cols = splice @text_cols, 0, 3) {
            say join "\t", @last_num_line_cols, @prn_text_cols;
        }    
    }
    @$rtext = ();
    @$rnums = ();
}

This uses splice to remove the text columns three at a time and print each group after the (five) columns of the last number-line. This is done in a while loop, which stops once all of @text_cols has been processed (printed).
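The splice loop is a general way to cut a list into groups of fixed size. A minimal sketch that factors it into two helpers, assuming the same tab-separated output, makes the width a parameter, so the 8-column limit becomes the five prefix columns plus groups of 8 - 5 = 3 text columns. The names chunks and format_line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into array refs of at most $size items.
sub chunks {
    my ($size, @items) = @_;
    my @groups;
    # splice takes up to $size items off the front each time; the loop
    # ends when it returns an empty list
    while (my @group = splice @items, 0, $size) {
        push @groups, [@group];
    }
    return @groups;
}

# Hypothetical helper: format one text-line so that no output line has more
# than $width columns. $prefix is an array ref of leading columns.
sub format_line {
    my ($width, $prefix, $text) = @_;
    my $room = $width - @$prefix;    # text columns that fit after the prefix
    die "width must exceed the prefix length\n" if $room < 1;
    return map { join "\t", @$prefix, @$_ } chunks($room, split ' ', $text);
}

my @prefix = ('43', '3', '--------------', 'No', 'ddd');
print "$_\n" for format_line(8, \@prefix, 'a b c d e f g');
```

Passing a different width, for example 7, would give groups of two text columns instead.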

To test this, I added the following line to the text block that follows the 43 3 ... number-line in the input file:

a b c d e f g h i j k

and the output of the main program gains these extra lines:

43      3       --------------  No      ddd     a       b       c
43      3       --------------  No      ddd     d       e       f
43      3       --------------  No      ddd     g       h       i
43      3       --------------  No      ddd     j       k

The input file that I use to test all requirements and updates is:

37    7    --------------  No  aaa MORE COLUMNS
40    0    --------------  No  bbb
xxx   zzy
aa    bb   cc
42    2    --------------  No  ccc 
xxx   zyz
a     b    c               d
43    3    --------------  No  ddd  AND YET MORE
xy    zz
a     a 
a     a 
c
a b c d e f g h i j k
52    5    --------------  No  eee
yyyx  zzz

and the output of the program (with the process_buffers_fmt function) is:

37      7       --------------  No      aaa     MORE    COLUMNS
40      0       --------------  No      bbb     xxx     zzy
40      0       --------------  No      bbb     aa      bb      cc
42      2       --------------  No      ccc     xxx     zyz
42      2       --------------  No      ccc     a       b       c
42      2       --------------  No      ccc     d
43      3       --------------  No      ddd     xy      zz
43      3       --------------  No      ddd     a       a
43      3       --------------  No      ddd     a       a
43      3       --------------  No      ddd     c
43      3       --------------  No      ddd     a       b       c
43      3       --------------  No      ddd     d       e       f
43      3       --------------  No      ddd     g       h       i
43      3       --------------  No      ddd     j       k
52      5       --------------  No      eee     yyyx    zzz
