Parsing using awk

Problem description

How can I parse a file based on data from another file using awk?

I wrote this script:

BEGIN{ FS="\t" ; OFS="\t"

while((getline<"headfpkm")>0) {
        ++a
        id[a]=$1
        fpkm[a]=$2
        print id[a],fpkm[a]
        }
lastid=id[a]
print lastid
close("headfpkm")
}

/$lastid/{
        print $2,$3,$5,$7,$8,$14,fpkm[a]
        a--
        lastid=id[a]
}
END{ print "total lines=",FNR,"\n\nfile 1 index: ",a}

When I run it:

/$ awk -f testawk.awk file2

It runs the BEGIN section properly but doesn't print any matching lines:

NM_000014       5.04503
NM_000015       0.586677
NM_000016       1.138332278
NM_000017       0.64386
NM_000018       3.61746
NM_000019       2.8793
NM_000020       10.846
NM_000021       0.685098
NM_000022       46388.6
NM_000026       0.257471
NM_000026
total lines=    10

file 1 index:   10

Is anything wrong with the searching section?

File 2 looks like this:

34      ACADM   NM_000016       9606    hsa-miR-3148    3       80      87      0.003   -0.016  -0.094  0.082   0.112   -0.160  97
34      ACADM   NM_000016       9606    hsa-miR-3163    1       623     629     0.001   -0.022  -0.020  0.065   0.125   -0.01   57
35      ACADS   NM_000017       9606    hsa-miR-3921    3       68      75      0.013   0.192   -0.097  0.031   -0.039  -0.147  82
35      ACADS   NM_000017       9606    hsa-miR-4303    2       67      73      0.012   0.150   -0.052  0.013   -0.039  -0.036  31
35      ACADS   NM_000017       9606    hsa-miR-4653-5p 3       68      75      0.003   0.192   -0.097  0.031   -0.039  -0.157  84
37      ACADVL  NM_000018       9606    hsa-miR-124     2       31      37      0.003   0.023   -0.057  0.012   -0.032  -0.171  76
37      ACADVL  NM_000018       9606    hsa-miR-1827    2       135     141     -0.007  -0.043  -0.058  0.039   -0.069  -0.258  91
37      ACADVL  NM_000018       9606    hsa-miR-2682    2       134     140     0.003   -0.014  -0.058  0.004   -0.047  -0.232  87
37      ACADVL  NM_000018       9606    hsa-miR-449c    2       134     140     -0.035  -0.014  -0.058  0.004   -0.047  -0.270  92
37      ACADVL  NM_000018       9606    hsa-miR-506     2       31      37      -0.016  0.023   -0.057  0.012   -0.032  -0.190  80

Recommended answer

This is a bit of a guess, because I'm not 100% sure what you're trying to accomplish. A better way to solve your problem would be something like this:

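BEGIN {
    FS=OFS="\t"
}

# First file (headfpkm): map each ID in column 1 to the value in column 2
FNR==NR {
    c++

    a[$1]=$2
    next
}

# Second file (file2): print rows whose ID (field 3) was seen in the first file
$3 in a {
    print $2,$3,$5,$7,$8,$14,a[$3]
}

END {
    printf "total lines=%s\n\nfile 1 index: %s\n", FNR, c
}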
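
For readers unfamiliar with the idiom: `FNR==NR` is true only while awk is reading the first file named on the command line, so that block fills the lookup array and `next` skips the remaining rules. For the second file, `$3 in a` tests whether the third field is a key in that array. A small sketch of the same technique on made-up two-line inputs (the temporary file names, the array name `fpkm`, and the sample values are placeholders, not the real data):

```shell
# Hypothetical miniature inputs; the real data would be headfpkm and file2
tmp=$(mktemp -d)
printf 'NM_000016\t1.5\nNM_000099\t2.0\n' > "$tmp/lookup.tsv"
printf '34\tACADM\tNM_000016\n35\tACADS\tNM_000017\n' > "$tmp/data.tsv"

joined=$(awk 'BEGIN { FS = OFS = "\t" }
FNR == NR { fpkm[$1] = $2; next }     # first file: build id -> value map
$3 in fpkm { print $2, $3, fpkm[$3] } # second file: keep rows whose id is known
' "$tmp/lookup.tsv" "$tmp/data.tsv")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

Only the `NM_000016` row is printed, because `NM_000017` has no entry in the lookup file.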

Run it like this:

awk -f script.awk headfpkm file2

Results:

ACADM   NM_000016  hsa-miR-3148     80   87   -0.160  1.138332278
ACADM   NM_000016  hsa-miR-3163     623  629  -0.01   1.138332278
ACADS   NM_000017  hsa-miR-3921     68   75   -0.147  0.64386
ACADS   NM_000017  hsa-miR-4303     67   73   -0.036  0.64386
ACADS   NM_000017  hsa-miR-4653-5p  68   75   -0.157  0.64386
ACADVL  NM_000018  hsa-miR-124      31   37   -0.171  3.61746
ACADVL  NM_000018  hsa-miR-1827     135  141  -0.258  3.61746
ACADVL  NM_000018  hsa-miR-2682     134  140  -0.232  3.61746
ACADVL  NM_000018  hsa-miR-449c     134  140  -0.270  3.61746
ACADVL  NM_000018  hsa-miR-506      31   37   -0.190  3.61746
total lines=10

file 1 index: 10
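
As for why the original search rule printed nothing: inside a `/.../` regex literal awk does not expand variables, and `$` is the end-of-line anchor, so `/$lastid/` looks for the text `lastid` after the end of the line and can never match. A minimal sketch of the difference, using a made-up two-line sample and passing the ID in with `-v` (both are assumptions for illustration):

```shell
# Hypothetical sample, tab-separated, modeled on the first three columns of file2
sample=$(printf '34\tACADM\tNM_000016\n35\tACADS\tNM_000017')

# Regex literal: "$" anchors the end of the line, so nothing is ever printed
literal=$(printf '%s\n' "$sample" | awk -F'\t' -v lastid="NM_000016" '/$lastid/ { print $2 }')

# Comparing the field with the variable does what was intended
dynamic=$(printf '%s\n' "$sample" | awk -F'\t' -v lastid="NM_000016" '$3 == lastid { print $2 }')

echo "literal: [$literal]"
echo "dynamic: [$dynamic]"
```

To treat the variable as a regular expression rather than an exact string, the dynamic form `$0 ~ lastid` can be used instead of `$3 == lastid`.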
