awk multiple lines stored into variable


Question

Good evening,

I have a file in the following format:

XXXXXXXXXXXYYYYYYYYAAAAAAAA
XXXXXXXXXXXIIIIIIII22222222
XXXXXXXXXXXOOOOOOOOPPPPPPPP
XXXXXXXXXXXAAAAAAAAKKKKKKKK
YYYYYYYYYYY22222222AAAAAAAA
YYYYYYYYYYY55555555BBBBBBBB
YYYYYYYYYYYGGGGGGGGKKKKKKKK
YYYYYYYYYYYQQQQQQQQ88888888

... and so on. Within each group of 4 lines the first part (X, Y, ...) stays the same and the rest of the line changes. There is no separator between the groups, and the file is quite big.

I would like to find a way to use awk to read 4 lines at a time and store them in 4 variables, and/or set RS to \n and FS to something suitable, because I want to do comparisons within specific 4-line blocks and be able to output all 4 lines on a match.

i.e., if substr(17,3) == X, output all 4 records that were read.

My apologies for not supplying code, but I really have no idea how to do this with awk.

Given a specific number, e.g. Y=17, the script would compare it against a given substring of each record. For example:

if (substr($0,11,2) == "17") then    # this can match on any line of a 4-line group (e.g. X...)
    print all 4 lines (all X...), or print a given substring of those lines

Actual example with the sample provided:

if (substr($0,21,2) == "PP") { print all 4 lines in memory }

...and it would print:

XXXXXXXXXXXYYYYYYYYAAAAAAAA
XXXXXXXXXXXIIIIIIII22222222
XXXXXXXXXXXOOOOOOOOPPPPPPPP
XXXXXXXXXXXAAAAAAAAKKKKKKKK

Recommended answer

The following simple script should hopefully be useful, at least as a start.

awk 'substr($0,21,2) == "PP" { p=1 } # remember match
    NR % 4 { a[NR%4] = $0; next }  # collect lines a[1] through a[3]
    # We have read four lines, and are ready to print if there was a match
    p { for (i=1; i<4; ++i) print a[i]; print $0;
        # reset for next iteration
        p=0 }' filename

The first condition is tested on every input line. If any line matches, we remember this by setting the flag variable p to 1 (any non-zero value will do). The condition could just as well be a regex; /^.{20}PP/ looks for "PP" starting at the 21st position. (Interval expressions such as {20} are part of POSIX awk, but some older awk implementations do not support them.)
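As a quick check of the two forms of the condition, the sketch below runs both against one sample line from the question. The variable names (line, substr_hit, regex_hit) are only for illustration. It assumes the awk in use supports interval expressions; on one that does not, the regex form would simply print nothing.

```shell
# One sample line from the question; the P's occupy columns 20-27.
line='XXXXXXXXXXXOOOOOOOOPPPPPPPP'

# substr() form: two characters starting at column 21.
substr_hit=$(printf '%s\n' "$line" | awk 'substr($0,21,2) == "PP" { print "match" }')

# Regex form: exactly 20 characters from the start of the line, then "PP".
# Assumes interval-expression support in the awk being used.
regex_hit=$(printf '%s\n' "$line" | awk '/^.{20}PP/ { print "match" }')

echo "substr: $substr_hit"
echo "regex:  $regex_hit"
```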

The second condition fires on lines whose line number is not a multiple of 4. We simply collect these lines and (by way of the next statement) skip the remainder of the script. (As you probably know, the % modulo operator calculates the remainder of a division, so NR%4 takes the values 1, 2, 3, 0, 1, 2, 3, 0, ... as the lines are read.)
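To see the cycle that NR%4 goes through, a small sketch like the following can be run on eight dummy input lines (the letters a through h are arbitrary placeholders):

```shell
# Print NR % 4 for each of eight input lines, separated by spaces.
cycle=$(printf '%s\n' a b c d e f g h |
    awk '{ printf "%s%d", (NR > 1 ? " " : ""), NR % 4 } END { print "" }')

echo "$cycle"   # prints: 1 2 3 0 1 2 3 0
```

Lines where the remainder is non-zero are the ones stored in a[1] through a[3]; the lines where it is 0 are the fourth line of each group.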

Thus, if we fall through to the third condition, we are on a line whose line number is divisible by 4. The condition now examines the value of p, and if it is non-zero, the action is taken: the three stored lines and the current line are printed, and p is reset.

(If it's zero, we fall through without printing anything, and the cycle starts over with NR%4 equal to 1.)
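Putting the pieces together, here is a minimal sketch that runs the same logic on the sample data from the question. It is a variation, not the answer's exact script: the position, length and search value are passed in with -v, and the names pos, len and want are hypothetical, chosen only for this example. It also assumes the input always comes in complete groups of 4 lines.

```shell
# Sample data from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
XXXXXXXXXXXYYYYYYYYAAAAAAAA
XXXXXXXXXXXIIIIIIII22222222
XXXXXXXXXXXOOOOOOOOPPPPPPPP
XXXXXXXXXXXAAAAAAAAKKKKKKKK
YYYYYYYYYYY22222222AAAAAAAA
YYYYYYYYYYY55555555BBBBBBBB
YYYYYYYYYYYGGGGGGGGKKKKKKKK
YYYYYYYYYYYQQQQQQQQ88888888
EOF

# pos, len and want are illustrative names, not part of the original answer.
result=$(awk -v pos=21 -v len=2 -v want=PP '
    substr($0, pos, len) == want { p = 1 }   # remember a match anywhere in the group
    NR % 4 { a[NR % 4] = $0; next }          # buffer lines 1-3 of the group
    p { for (i = 1; i < 4; ++i) print a[i]   # on the 4th line: print the buffered lines,
        print $0                             # then the current line,
        p = 0 }                              # and reset the flag for the next group
' "$sample")

rm -f "$sample"
printf '%s\n' "$result"
```

With want=PP only the X group is printed, as in the question. Only four lines are held in memory at any time, which matters for a large file.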
