Extracting lines using two criteria

Question

Hoping somebody can teach me how to do this task.

I am thinking awk might be a good fit for this, but I am really a beginner.

I have a file like the one below (tab-separated; the actual file is much bigger). The important columns are the second and the ninth (235 and 15 in the first line of the file).

S   235 1365    *   0   *   *   *   15  1   c81 592
H   235 296 99.7    +   0   0   3I296M1066I 14  1   s15018  1
H   235 719 95.4    +   0   0   174D545M820I    15  1   c2664   10
H   235 764 99.1    +   0   0   55I764M546I 15  1   c6519   4
H   235 792 100 +   0   0   180I792M393I    14  1   c407    107
S   236 1365    *   0   *   *   *   15  1   c474    152
H   236 279 95  +   0   0   765I279M321I    10-1    1   s7689   1
H   236 301 99.7    -   0   0   908I301M156I    15  1   s8443   1
H   236 563 95.2    -   0   0   728I563M74I 17  1   c1725   12
H   236 97  97.9    -   0   0   732I97M536I 17  1   s11472  1

I would like to extract lines by specifying values for the ninth column. The second column acts as a pivot column: lines that share the same value in the second column are treated as a single set of data. Within a set, every line must have one of the specified values in the ninth column.

So, for example, if I specify "14" and "15" for the ninth column, the output will be:

S   235 1365    *   0   *   *   *   15  1   c81 592
H   235 296 99.7    +   0   0   3I296M1066I 14  1   s15018  1
H   235 719 95.4    +   0   0   174D545M820I    15  1   c2664   10
H   235 764 99.1    +   0   0   55I764M546I 15  1   c6519   4
H   235 792 100 +   0   0   180I792M393I    14  1   c407    107

The 6th and 8th lines have "15" in their ninth column, but other lines in the same set (second column: 236) have values other than "14" or "15", so I do not want to extract any lines from that set.

Recommended answer

$ cat tst.awk
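# A new value in column 2 starts a new set, so emit the previous one first.
$2 != prevPivot { prtCurrSet() }
# Any line whose column 9 is not exactly 14 or 15 disqualifies the whole set.
$9 !~ /^1[45]$/ { isBadSet=1 }
# Buffer every line of the current set until we know whether the set is good.
{ currSet = currSet $0 ORS; prevPivot = $2 }
# Emit the final set at end of input.
END { prtCurrSet() }
# Print the buffered set if no line disqualified it, then reset the state.
function prtCurrSet() {
    if ( !isBadSet ) {
        printf "%s", currSet
    }
    currSet = ""
    isBadSet = 0
}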

$ awk -f tst.awk file
S   235 1365    *   0   *   *   *   15  1   c81 592
H   235 296 99.7    +   0   0   3I296M1066I 14  1   s15018  1
H   235 719 95.4    +   0   0   174D545M820I    15  1   c2664   10
H   235 764 99.1    +   0   0   55I764M546I 15  1   c6519   4
H   235 792 100 +   0   0   180I792M393I    14  1   c407    107
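
The script above hard-codes the accepted values in the regular expression /^1[45]$/. If the values need to change between runs, one possible variation is to pass them in as a comma-separated list. The sketch below is not part of the original answer: the wrapper name extract_sets, the variable allowed, and the function emit_set are made up for illustration. Like the original, it assumes that lines sharing a second-column value are adjacent in the file and it relies on awk's default whitespace field splitting.

```shell
# Hypothetical helper: print only the sets (keyed by column 2) in which
# every line has an allowed value in column 9.
# Usage: extract_sets "14,15" file
extract_sets() {
    awk -v allowed="$1" '
        BEGIN {
            # Turn the comma-separated list into a lookup table.
            n = split(allowed, vals, ",")
            for (i = 1; i <= n; i++) ok[vals[i]] = 1
        }
        # A change in column 2 closes the previous set.
        NR > 1 && $2 != prev { emit_set() }
        # One line with a non-allowed column 9 disqualifies the whole set.
        !($9 in ok) { bad = 1 }
        # Buffer the current line and remember its pivot value.
        { buf = buf $0 ORS; prev = $2 }
        END { emit_set() }
        # Print the buffered set if it is still good, then reset the state.
        function emit_set() {
            if (!bad) printf "%s", buf
            buf = ""
            bad = 0
        }
    ' "$2"
}
```

With the sample file from the question, extract_sets "14,15" file should print the same five lines for set 235 as shown above. If fields can be empty, adding -F'\t' to the awk call would probably be needed so that columns are not shifted.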
