How to use if/else awk to evaluate a file and extract this information?

Question

I have a file like this:

419 I     0.3529
420 S     0.3182
421 T     0.3740
422 Y     0.3872
423 I     0.3460
424 E     0.4409
425 S     0.3182
426 T     0.3740
427 Y     0.4141
428 I     0.3460
429 S     0.3131
430 Y     0.3838
431 T     0.3939
432 S     0.3101

and I am trying to write an Awk program that checks the third column for numbers greater than or equal to 0.4. For each match, I want the letter (second column) from that line together with the letters from the 4 lines above and the 4 lines below. If there are multiple matches, I want one fixed-length string for each:

STYIESTYI
IESTYISYT

The first one comes because there is a match on the line numbered 424; the second is a (partially overlapping) match for the line numbered 427. How would I approach this?

Recommended answer

$ cat tst.awk
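BEGIN {
    # Defaults, each overridable with -v on the command line.
    tgt = (tgt=="" ? 0.4 : tgt)    # threshold for the 3rd field
    cxt = (cxt=="" ?  4  : cxt)    # context lines on each side
    bef = (bef=="" ? cxt : bef)    # lines before the match
    aft = (aft=="" ? cxt : aft)    # lines after the match
}
$3 >= tgt { hits[++numHits] = NR }    # record the line number of each match
{ chars[NR] = $2 }                    # store every letter by line number
END {
    for (hitNr=1; hitNr<=numHits; hitNr++) {
        for (lineNr=(hits[hitNr]-bef); lineNr<=(hits[hitNr]+aft); lineNr++) {
            # Line numbers outside the file contribute nothing.
            printf "%s", (lineNr in chars ? chars[lineNr] : "")
        }
        print ""
    }
}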

$ awk -f tst.awk file
STYIESTYI
IESTYISYT
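
The script above uses awk's pattern-action form rather than an explicit if/else. For readers who prefer the form named in the question title, here is a minimal sketch of the same logic with the threshold test written as an `if` inside one action block. The wrapper name `windows_if`, the hard-coded 0.4 and 4, and the inline sample data are assumptions made for illustration, not part of the original answer.

```shell
#!/bin/sh
# Sketch: same logic as tst.awk, with the threshold test written as an explicit if.
# windows_if is a hypothetical helper name used only for this example.
windows_if() {
    awk '
        {
            chars[NR] = $2                 # keep every letter, indexed by line number
            if ($3 >= 0.4) {               # explicit if instead of a pattern
                hits[++numHits] = NR
            }
        }
        END {
            for (h = 1; h <= numHits; h++) {
                out = ""
                for (n = hits[h] - 4; n <= hits[h] + 4; n++) {
                    if (n in chars) {      # skip line numbers outside the file
                        out = out chars[n]
                    }
                }
                print out
            }
        }' "$@"
}

# The sample input from the question, fed on stdin.
sample_output=$(printf '%s\n' \
    '419 I 0.3529' '420 S 0.3182' '421 T 0.3740' '422 Y 0.3872' \
    '423 I 0.3460' '424 E 0.4409' '425 S 0.3182' '426 T 0.3740' \
    '427 Y 0.4141' '428 I 0.3460' '429 S 0.3131' '430 Y 0.3838' \
    '431 T 0.3939' '432 S 0.3101' | windows_if)
printf '%s\n' "$sample_output"
```

Building the string in `out` and printing it once is a design choice for this sketch; it behaves the same as the `printf` loop in tst.awk for this input.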

Note that this will behave sensibly if the line with the 3rd field >= 0.4 is closer than 4 lines to the start and/or end of the file - make sure to test those conditions with any potential answer as they are common rainy day cases for this type of problem that people providing potential solutions often forget to cover.

For example, try all potential solutions with this input file and see if you get the output you expect:

$ cat file1
421 T     0.3740
422 Y     0.3872
423 I     0.3460
424 E     0.4409
425 S     0.3182
426 T     0.3740
427 Y     0.4141
428 I     0.3460
429 S     0.3131
430 Y     0.3838

$ awk -f tst.awk file1
TYIESTYI
IESTYISY

or if you get missing output lines or lines with leading/trailing blanks or other undesirable chars or something else.
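
One way to check those edge cases automatically is sketched below, under stated assumptions: the wrapper name `extract_windows` and the five-line input are made up for this example, and the awk body is a condensed copy of tst.awk with a single `cxt` value for both sides. The input has a match on its first line and another on its last line, so each window is cut off on one side.

```shell
#!/bin/sh
# Sketch: exercise the window logic with hits on the very first and very last line.
# extract_windows is a hypothetical wrapper; args are threshold and context size.
extract_windows() {
    awk -v tgt="${1:-0.4}" -v cxt="${2:-4}" '
        $3 >= tgt { hits[++numHits] = NR }   # remember line numbers of matches
        { chars[NR] = $2 }                   # remember every letter by line number
        END {
            for (h = 1; h <= numHits; h++) {
                for (n = hits[h] - cxt; n <= hits[h] + cxt; n++)
                    printf "%s", (n in chars ? chars[n] : "")
                print ""
            }
        }'
}

# Made-up input: matches on line 1 and on line 5 (the last line), context of 2.
edge_output=$(printf '%s\n' '1 A 0.5' '2 B 0.1' '3 C 0.1' '4 D 0.1' '5 E 0.9' |
    extract_windows 0.4 2)
printf '%s\n' "$edge_output"
```

With this input the two output lines should be `ABC` and `CDE`: shorter than the full window of 5 and free of blanks or placeholder characters.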

Note also that you can change the target value from 0.4 to something else, and you can change the number of context lines to print before and/or after the matched line just by setting variables on the command line, e.g.

To print 5 lines of context before and after each line whose 3rd field is >= 0.37:

$ awk -v tgt=0.37 -v cxt=5 -f tst.awk file
ISTYIEST
ISTYIESTY
ISTYIESTYIS
TYIESTYISYT
YIESTYISYTS
STYISYTS
TYISYTS

To print 1 line before and 2 lines after each line whose 3rd field is >= 0.34:

$ awk -v tgt=0.34 -v bef=1 -v aft=2 -f tst.awk file
IST
STYI
TYIE
YIES
IEST
STYI
TYIS
YISY
SYTS
YTS
