Passing a bash variable for awk column specifier

Views: 89 Published: 2020/9/15 6:31:27 Tags: bash shell variables awk


Question

There are loads of threads about passing a shell variable to awk, and I've figured that out easily enough, but the variable I want to pass is the column specifier itself ($1, $2, etc.).

Given that the shell also uses these names for its positional command-line arguments, this is getting confusing.

In this script I'm just sorting and joining 2 files together, but in order to begin generalising the script a little, I want to be able to specify on the command line which field in the key file awk should take as its sort specifier.

What am I doing wrong here? (I'm only just getting to grips with awk, and the one-liner was adapted slightly from here.)

keyfile="$1"
filetosort="$2"
field="$3"

awk -v a="$field"
paste "$keyfile" <(awk 'NR==FNR{o[FNR]=a; next} {t[$1]=$0} END{for(x=1; x<=FNR; x++){y=o[x]; print t[y]}}' $keyfile $filetosort)

EDIT: Added example input/output.

Key file: (10 random lines from the file)

PVClumt18   PAK_2199    PAK_01997
PVClopt2    PAK_2091    PAK_01895
PVCcif7     PAK_1975    PAK_01793
PVClopT12   PAU_02101   PAU_02063
PVCpnf20    PAK_3524    PAK_03184
PVClopt3    PAK_2090    PAK_01894
PVClopT11   PAU_02102   PAU_02064
PVCunit2_11 plu1698     PLT_01726
PVClumT9    afp10       PAU_02198
PVCunit2_17 plu1692     PLT_01720

File to sort:

PAU_02064   1pqx    1pqx_A  37.4    13  0.00035 31.4    >1pqx_A Conserved hypothetical protein; ZR18,structure, autostructure,spins,autoassign, northeast structural genomics consortium; NMR {Staphylococcus aureus subsp} SCOP: d.267.1.1 PDB: 2ffm_A 2m6q_A 2m8w_A
PAK_01997   5ftj    5ftj_A  99.9    1.6e-26 4.2e-31 229.2   >5ftj_A Transitional endoplasmic reticulum ATPase; hydrolase, single-particle, AAA ATPase; HET: ADP OJA; 2.30A {Homo sapiens} PDB: 3cf1_A* 3cf3_A* 3cf2_A* 5ftk_A* 5ftl_A* 5ftm_A* 5ftn_A* 1r7r_A* 5c19_A 5c1b_A* 5c18_A* 3cf0_A*
PAK_01894   3j9q    3j9q_A  99.9    1.8e-29 4.6e-34 215.9   >3j9q_A Sheath; pyocin, bacteriocin, sheath, structural protein; 3.50A {Pseudomonas aeruginosa}
PAK_03184   1xju    1xju_A  99.4    4.1e-17 1.1e-21 98.8    >1xju_A Lysozyme; secreted inactive conformation, hydrolase; 1.07A {Enterobacteria phage P1} SCOP: d.2.1.3
PAK_01793   5a3a    5a3a_A  50.8    6   0.00016 31.4    >5a3a_A SIR2 family protein; transferase, P-ribosyltransferase, metalloprotein, NAD-depen lipoylation, regulatory enzyme, rossmann fold; 1.54A {Streptococcus pyogenes} PDB: 5a3b_A* 5a3c_A*
PLT_01720   3ggm    3ggm_A  54.2    4.9 0.00013 26.2    >3ggm_A Uncharacterized protein BT9727_2919; bacillus cereus group., structural genomics, PSI-2, protein structure initiative; 2.00A {Bacillus thuringiensis serovarkonkukian}
PLT_01726   3h2t    3h2t_A  96.8    8e-06   2.1e-10 82.6    >3h2t_A Baseplate structural protein GP6; viral protein, virion; 3.20A {Enterobacteria phage T4} PDB: 3h3w_A 3h3y_A
PAK_01895   3j9q    3j9q_A  100.0   2.5e-35 6.4e-40 248.6   >3j9q_A Sheath; pyocin, bacteriocin, sheath, structural protein; 3.50A {Pseudomonas aeruginosa}
PAU_02198   4jiv    4jiv_D  69.6    1.6 4.2e-05 27.5    >4jiv_D VCA0105, putative uncharacterized protein; PAAR-repeat motif, membrane piercing, type VI secretion SYST vibrio cholerae VGRG2; HET: PLM STE ELA; 1.90A {Vibrio cholerae o1 biovar eltor}
PAU_02063   4yap    4yap_A  31.1    20  0.00052 29.1    >4yap_A Glutathione S-transferase homolog; GSH-lyase GSH-dependent; 1.11A {Sphingobium SP} PDB: 4g10_A 4yav_A*

Thus I need to sort and match the rows based on column 3 in the key file and column 1 in the file to sort.

And the resulting file: (The duplication of columns 3 and 4 is something I was planning to sort out afterwards.)

PVClumt18   PAK_2199    PAK_01997   PAK_01997   5ftj    5ftj_A  99.9    1.6e-26 4.2e-31 229.2   >5ftj_A Transitional endoplasmic reticulum ATPase; hydrolase, single-particle, AAA ATPase; HET: ADP OJA; 2.30A {Homo sapiens} PDB: 3cf1_A* 3cf3_A* 3cf2_A* 5ftk_A* 5ftl_A* 5ftm_A* 5ftn_A* 1r7r_A* 5c19_A 5c1b_A* 5c18_A* 3cf0_A*
PVClopt2    PAK_2091    PAK_01895   PAK_01895   3j9q    3j9q_A  100.0   2.5e-35 6.4e-40 248.6   >3j9q_A Sheath; pyocin, bacteriocin, sheath, structural protein; 3.50A {Pseudomonas aeruginosa}
PVCcif7 PAK_1975    PAK_01793   PAK_01793   5a3a    5a3a_A  50.8    6   0.00016 31.4    >5a3a_A SIR2 family protein; transferase, P-ribosyltransferase, metalloprotein, NAD-depen lipoylation, regulatory enzyme, rossmann fold; 1.54A {Streptococcus pyogenes} PDB: 5a3b_A* 5a3c_A*
PVClopT12   PAU_02101   PAU_02063   PAU_02063   4yap    4yap_A  31.1    20  0.00052 29.1    >4yap_A Glutathione S-transferase homolog; GSH-lyase GSH-dependent; 1.11A {Sphingobium SP} PDB: 4g10_A 4yav_A*
PVCpnf20    PAK_3524    PAK_03184   PAK_03184   1xju    1xju_A  99.4    4.1e-17 1.1e-21 98.8    >1xju_A Lysozyme; secreted inactive conformation, hydrolase; 1.07A {Enterobacteria phage P1} SCOP: d.2.1.3
PVClopt3    PAK_2090    PAK_01894   PAK_01894   3j9q    3j9q_A  99.9    1.8e-29 4.6e-34 215.9   >3j9q_A Sheath; pyocin, bacteriocin, sheath, structural protein; 3.50A {Pseudomonas aeruginosa}
PVClopT11   PAU_02102   PAU_02064   PAU_02064   1pqx    1pqx_A  37.4    13  0.00035 31.4    >1pqx_A Conserved hypothetical protein; ZR18,structure, autostructure,spins,autoassign, northeast structural genomics consortium; NMR {Staphylococcus aureus subsp} SCOP: d.267.1.1 PDB: 2ffm_A 2m6q_A 2m8w_A
PVCunit2_11 plu1698 PLT_01726   PLT_01726   3h2t    3h2t_A  96.8    8e-06   2.1e-10 82.6    >3h2t_A Baseplate structural protein GP6; viral protein, virion; 3.20A {Enterobacteria phage T4} PDB: 3h3w_A 3h3y_A
PVClumT9    afp10   PAU_02198   PAU_02198   4jiv    4jiv_D  69.6    1.6 4.2e-05 27.5    >4jiv_D VCA0105, putative uncharacterized protein; PAAR-repeat motif, membrane piercing, type VI secretion SYST vibrio cholerae VGRG2; HET: PLM STE ELA; 1.90A {Vibrio cholerae o1 biovar eltor}
PVCunit2_17 plu1692 PLT_01720   PLT_01720   3ggm    3ggm_A  54.2    4.9 0.00013 26.2    >3ggm_A Uncharacterized protein BT9727_2919; bacillus cereus group., structural genomics, PSI-2, protein structure initiative; 2.00A {Bacillus thuringiensis serovarkonkukian}

Recommended answer

When you pass awk -v a="$field", the awk variable a is set only for that single awk command. You can't expect a to be available in a completely different invocation of awk.
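To make the scope concrete, here is a minimal sketch. The shell variable names first and second are illustrative only and do not come from the question. The first awk process receives a through -v; the second is a separate process started without -v, so a is uninitialized there and prints as an empty string.

```shell
field=2

# First invocation: a is set via -v and exists only inside this awk process.
first=$(awk -v a="$field" 'BEGIN { print "a=[" a "]" }')

# Second, separate invocation: no -v was given, so a is uninitialized (empty).
second=$(awk 'BEGIN { print "a=[" a "]" }')

printf '%s\n%s\n' "$first" "$second"
# prints:
# a=[2]
# a=[]
```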

Thus, you need to pass it directly to the invocation that uses it:

$ bashvar="2"
$ echo 'foo bar baz' | awk -v awkvar="$bashvar" '{print $awkvar}'
bar

Or, in your case:

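field=1
awk -v a="$field" '
# First file (keyfile): remember the chosen column of each line, in order.
NR==FNR {
  o[FNR]=$a
  next
}

# Second file (filetosort): index each line by its first column.
{ t[$1] = $0 }

# Emit the lines in keyfile order. FNR here is the line count of the last
# file read, so this assumes both files have the same number of lines.
END {
  for(x=1; x<=FNR; x++) {
    y=o[x]
    printf("%s\t%s\n", y, t[y])
  }
}' "$keyfile" "$filetosort"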
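As a rough end-to-end check, the same approach can be run on two tiny made-up files. Everything below is an assumption for illustration: the temporary directory, the file names keys.txt and data.txt, and their contents are not from the question. One deliberate variation from the command above: the number of key lines is stored in n, so the END loop does not depend on both files having the same length.

```shell
# Build two small sample files in a throwaway directory (hypothetical data).
tmpdir=$(mktemp -d)
keyfile="$tmpdir/keys.txt"
filetosort="$tmpdir/data.txt"
printf 'A\tk1\tid2\nB\tk2\tid1\n' > "$keyfile"
printf 'id1\tfirst\nid2\tsecond\n' > "$filetosort"

field=3   # column of the key file that holds the sort key

result=$(awk -v a="$field" '
# Key file: store column number a of each line, and count the lines in n.
NR==FNR { o[FNR]=$a; n=FNR; next }
# File to sort: index every line by its first column.
{ t[$1]=$0 }
# Print the key followed by its matching line, in key-file order.
END { for (x=1; x<=n; x++) { y=o[x]; printf("%s\t%s\n", y, t[y]) } }
' "$keyfile" "$filetosort")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this sample data the lines come out in key-file order: the id2 line first, then the id1 line.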

Points of note:

Our printf here emits both the key and the value, so there's no need to use paste to put the keyfile values back in.

$a applies the field operator $ to the awk variable a (assigned from the shell variable field), performing an indirect reference: it looks up the column whose number a holds.

Always, always quote your shell variables on expansion. Otherwise, you have no way of knowing how many arguments to awk will be generated by the expansion of $keyfile. It could be 0 (if the string contains no characters other than those in IFS); it could be 1; but it could also be a completely unbounded number (input file.txt would become two arguments, input and file.txt; * input * .txt would have each * replaced with a list of files).
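The effect of quoting on argument count can be seen with a small helper. This is only a sketch: count_args and the variable names are hypothetical, and the results assume the default IFS (space, tab, newline).

```shell
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

name='input file.txt'
quoted=$(count_args "$name")      # quoted: stays one argument
unquoted=$(count_args $name)      # unquoted: split on the space into two

empty=''
quoted_empty=$(count_args "$empty")    # quoted: one (empty) argument
unquoted_empty=$(count_args $empty)    # unquoted: disappears entirely

echo "$quoted $unquoted $quoted_empty $unquoted_empty"
# prints: 1 2 1 0
```

The glob case is left out because its result depends on which files happen to be in the current directory.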
