Find regex, move the next line to the end of this line and copy the first 5 columns to the next lines that start with a letter
Problem description
I have text like this:
37 7 -------------- No aaa
40 0 -------------- No bbb
xxx zzy
aa bb cc
42 2 -------------- No ccc
xxx zyz
a b c d
43 3 -------------- No ddd
xy zz
a a
a a
c
52 5 -------------- No eee
yyyx zzz
When I process it with awk I get:
awk '{if($1+0==$1) p=$1 FS $2 FS $3 FS $4 FS $5; else $0=p FS $0}1' /tmp/test3 | column -t
37 7 -------------- No aaa
37 7 -------------- No aaa xxx zzz
40 0 -------------- No bbb
40 0 -------------- No bbb xxx zzy
40 0 -------------- No bbb aa bb cc
42 2 -------------- No ccc
42 2 -------------- No ccc xxx zyz
42 2 -------------- No ccc a b c d
43 3 -------------- No ddd
43 3 -------------- No ddd xy zz
43 3 -------------- No ddd a a
43 3 -------------- No ddd a a
43 3 -------------- No ddd c
52 5 -------------- No eee
52 5 -------------- No eee yyyx zzz
and I need to get the following output:
37 7 -------------- No aaa
40 0 -------------- No bbb xxx zzy
40 0 -------------- No bbb aa bb cc
42 2 -------------- No ccc xxx zyz
42 2 -------------- No ccc a b c d
43 3 -------------- No ddd xy zz
43 3 -------------- No ddd a a
43 3 -------------- No ddd a a
43 3 -------------- No ddd c
52 5 -------------- No eee yyyx zzz
Thanks in advance for your help! I've also tried
awk '/-/{base=$0; next} {print base, $0}' /tmp/test4 | column -t
as suggested, but it deletes the first line starting with a number when several consecutive lines start with a number.
UPDATE
This sed spell solved my problem:
sed -r ':a;N;/^[0-9].\n[0-9]/{P;D};:b;s/^(.)\n(.)/\1 \2\n\1/;P;s/.\n//;$d;N;/\n[0-9]/D;bb' /tmp/test2
One more question: if an output line has more than 8 columns, is there a way to modify the sed command so that it moves the 9th, 10th and 11th columns to a new line and copies the first 5 columns in front of them?
Let's say I have these 3 lines:
42 2 -------------- No ccc xxx zyz
42 2 -------------- No ccc a b c d e f
43 3 -------------- No ddd xy zz
and I'd like to get:
42 2 -------------- No ccc xxx zyz
42 2 -------------- No ccc a b c
42 2 -------------- No ccc d e f
43 3 -------------- No ddd xy zz
The Perl script below assumes the following requirements.
The input contains alternating blocks of lines that start with either a number or a non-number, where each block of number-lines is followed by a block of text-lines. Updated: in the output, the first five columns of the last number-line in a block must be prepended to each text-line of the immediately following text-block. Other number-lines are printed as they are.
The code collects number-lines and text-lines in separate buffers. Both buffers are processed and emptied once we reach the first line of the next number-lines block, which is when both are non-empty.
use warnings;
use strict;
use feature 'say';

my $file = shift @ARGV || 'default_filename.txt';
die "Usage: $0 file\n" if not $file;
open my $fh, '<', $file or die "Can't open $file: $!";

my (@text, @nums);

while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^[^0-9]/) {
        push @text, $line;
        if (eof) {
            process_buffers(\@nums, \@text);
            last;
        }
        next;
    }
    elsif (@nums and @text) {
        process_buffers(\@nums, \@text);
    }
    push @nums, $line;
}

sub process_buffers {
    my ($rnums, $rtext) = @_;
    # Remove last number line from array and take its first five columns
    my @last_num_line_cols = (split ' ', pop @$rnums)[0..4];
    # Print other number lines; all consecutive spaces replaced by tabs
    say for map { s/\s+/\t/gr } @$rnums;
    # Print text lines prepended by five columns of last number line
    foreach my $text_line (@$rtext) {
        say join "\t", @last_num_line_cols, $text_line;
    }
    @$rtext = ();
    @$rnums = ();
}
The condition involving eof above is needed to process the last batch of number and text blocks, since no other test can trigger on the last line. Its placement assumes that the last line must be a text-line, which follows from my assumption about the requirements.
This prints
37 7 -------------- No aaa
40 0 -------------- No bbb xxx zzy
40 0 -------------- No bbb aa bb cc
42 2 -------------- No ccc xxx zyz
42 2 -------------- No ccc a b c d
43 3 -------------- No ddd xy zz
43 3 -------------- No ddd a a
43 3 -------------- No ddd a a
43 3 -------------- No ddd c
52 5 -------------- No eee yyyx zzz
(aligned on tabs, as expected in input and wanted in output)
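For comparison, here is a minimal awk sketch of the same requirement, since the question started from awk. It is not the buffering approach of the Perl script: it only remembers the most recent number-line and prints it by itself when no text-line followed it. The function name prefix_blocks is hypothetical, and the sketch assumes that every line starting with a digit is a number-line and that columns are separated by whitespace.

```shell
# prefix_blocks FILE (hypothetical helper name)
# Prefix each text line with the first five columns of the number line
# that precedes its block; number lines without a text block are printed as-is.
prefix_blocks() {
    awk '
        NF == 0 { next }                  # skip blank lines
        /^[0-9]/ {                        # number line
            if (pending != "")            # previous number line had no text block
                print pending
            pending = $0
            prefix = $1 OFS $2 OFS $3 OFS $4 OFS $5
            next
        }
        {                                 # text line
            pending = ""                  # the held number line is now "used"
            print prefix, $0
        }
        END {                             # file ended on a number line
            if (pending != "")
                print pending
        }
    ' "$1"
}
```

It could be called as prefix_blocks /tmp/test3 | column -t to align the columns as in the question. Unlike the Perl script, the END rule also covers a file whose last line is a number-line, and the output is separated by single spaces rather than tabs.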
Update: Limit output width to 8 columns, as described in the question update
Use this modified version of the processing function:
sub process_buffers_fmt {
    my ($rnums, $rtext) = @_;
    my @last_num_line_cols = (split ' ', pop @$rnums)[0..4];
    say for map { s/\s+/\t/gr } @$rnums;
    # Format output lines to 8 columns at most
    foreach my $text_line (@$rtext) {
        my @text_cols = split ' ', $text_line;
        while (my @prn_text_cols = splice @text_cols, 0, 3) {
            say join "\t", @last_num_line_cols, @prn_text_cols;
        }
    }
    @$rtext = ();
    @$rnums = ();
}
This uses splice to remove the first three columns of the text line at a time and print them after the (five) columns of the last number line. This is done in a while loop, so it stops once @text_cols has been fully processed (printed).
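The same chunking step can be sketched in awk by walking over the fields in strides of three. This is only an illustration of the idea, not part of the Perl answer; wrap_text_cols is a hypothetical helper that takes the five-column prefix as its first argument and one text line as its second.

```shell
# wrap_text_cols PREFIX TEXT (hypothetical helper name)
# Print TEXT in chunks of at most three columns, each chunk preceded by PREFIX.
wrap_text_cols() {
    printf '%s\n' "$2" | awk -v prefix="$1" -v width=3 '
        {
            for (i = 1; i <= NF; i += width) {      # start of each chunk
                line = prefix
                for (j = i; j < i + width && j <= NF; j++)
                    line = line OFS $j              # append up to width fields
                print line
            }
        }
    '
}
```

For example, wrap_text_cols '42 2 -------------- No ccc' 'a b c d e f' would print two lines, one ending in a b c and one ending in d e f. A final chunk with fewer than three columns is printed as it is, which matches how splice returns the remaining elements in the Perl loop.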
To test, I add the following to the text block after the 43 3 ... number line in the input file
a b c d e f g h i j k
and the output of the main program acquires these extra lines
43 3 -------------- No ddd a b c
43 3 -------------- No ddd d e f
43 3 -------------- No ddd g h i
43 3 -------------- No ddd j k
The input file that I use to test all requirements and updates is
37 7 -------------- No aaa MORE COLUMNS
40 0 -------------- No bbb
xxx zzy
aa bb cc
42 2 -------------- No ccc
xxx zyz
a b c d
43 3 -------------- No ddd AND YET MORE
xy zz
a a
a a
c
a b c d e f g h i j k
52 5 -------------- No eee
yyyx zzz
and the output of the program (with the process_buffers_fmt function) is
37 7 -------------- No aaa MORE COLUMNS
40 0 -------------- No bbb xxx zzy
40 0 -------------- No bbb aa bb cc
42 2 -------------- No ccc xxx zyz
42 2 -------------- No ccc a b c
42 2 -------------- No ccc d
43 3 -------------- No ddd xy zz
43 3 -------------- No ddd a a
43 3 -------------- No ddd a a
43 3 -------------- No ddd c
43 3 -------------- No ddd a b c
43 3 -------------- No ddd d e f
43 3 -------------- No ddd g h i
43 3 -------------- No ddd j k
52 5 -------------- No eee yyyx zzz