awk find missing number in sequence from file1 and append to column in file2
Problem description
Hi, as suggested in my previous question, I will try to clarify what I want to achieve. In file1, column $4 holds numbers that are not continuously sequenced like 1, 2, 3, 4, 5, ..., so I need to print the missing ones: e.g. after number 3 I should get number 4, and so on.
cat file1
A R5 A48 1
B R5 A48 2
C R4 A48 3
D R8 A48 15
E R9 A48 22
F R20 B55 21
G R55 B22 19
R B1 I77 14
AA B8 PP 18
BX A255 PA 7
CA A77 PB 10
WW W7 PX 11
I found a partial solution in this awk pipeline, which returns:
awk '{ print $4 }' file1 |
awk -v first=1 -v last=23 '
    BEGIN { for (i = first; i <= last; i++) array[i] = 0 }     # every expected number starts unseen
    { for (i = 1; i <= NF; i++) array[$i] += 1 }               # count each number that occurs
    END { for (num = first; num <= last; num++) if (array[num] == 0) print num }'   # print unseen ones in order
4
5
6
8
9
12
13
16
17
20
23
This is what I want, BUT I am still missing the remaining numbers after 23, up to number 31. I want them pasted as column $3 (the third column) of file2, one number per row, so how many numbers are needed depends on the number of rows/lines in file2.
cat file2
md5sum 25d422cc23b44c3bbd7a66c76d52af46
md5sum 25d422cc23b44c3bbd7a66c76d52af47
md5sum 25d422cc23b44c3bbd7a66c76d52af48
md5sum 25d422cc23b44c3bbd7a66c76d52af41
md5sum 25d422cc23b44c3bbd7a66c76d52af22
md5sum 25d422cc23b44c3bbd7a66c76d52af33
md5sum 25d422cc23b44c3bbd7a66c76d52af12
md5sum 25d422cc23b44c3bbd7a66c76d52af01
md5sum 25d422cc23b44c3bbd7a66c76d52af55
md5sum 25d422cc23b44c3bbd7a66c76d52af14
md5sum 25d422cc23b44c3bbd7a66c76d52af18
md5sum 25d422cc23b44c3bbd7a66c76d52af17
md5sum 25d422cc23b44c3bbd7a66c76d52af77
md5sum 25d422cc23b44c3bbd7a66c76d52af06
md5sum 25d422cc23b44c3bbd7a66c76d52af05
md5sum 25d422cc23b44c3bbd7a66c76d52af72
md5sum 25d422cc23b44c3bbd7a66c76d52af73
md5sum 25d422cc23b44c3bbd7a66c76d52af74
md5sum 25d422cc23b44c3bbd7a66c76d52af75
md5sum 25d422cc23b44c3bbd7a66c76d52af76
resulting in:
md5sum 25d422cc23b44c3bbd7a66c76d52af46 4
md5sum 25d422cc23b44c3bbd7a66c76d52af47 5
md5sum 25d422cc23b44c3bbd7a66c76d52af48 6
md5sum 25d422cc23b44c3bbd7a66c76d52af41 8
md5sum 25d422cc23b44c3bbd7a66c76d52af22 9
md5sum 25d422cc23b44c3bbd7a66c76d52af33 12
md5sum 25d422cc23b44c3bbd7a66c76d52af12 13
md5sum 25d422cc23b44c3bbd7a66c76d52af01 16
md5sum 25d422cc23b44c3bbd7a66c76d52af55 17
md5sum 25d422cc23b44c3bbd7a66c76d52af14 19
md5sum 25d422cc23b44c3bbd7a66c76d52af18 20
md5sum 25d422cc23b44c3bbd7a66c76d52af17 23
md5sum 25d422cc23b44c3bbd7a66c76d52af77 24
md5sum 25d422cc23b44c3bbd7a66c76d52af06 25
md5sum 25d422cc23b44c3bbd7a66c76d52af05 26
md5sum 25d422cc23b44c3bbd7a66c76d52af72 27
md5sum 25d422cc23b44c3bbd7a66c76d52af73 28
md5sum 25d422cc23b44c3bbd7a66c76d52af74 29
md5sum 25d422cc23b44c3bbd7a66c76d52af75 30
md5sum 25d422cc23b44c3bbd7a66c76d52af76 31
For example, if the next file2 has 22 rows/lines, it should add numbers up to 32.
I believe there is a better way to do this, e.g. by putting the numbers from file1 column $4 into an array as well, together with the remaining logic.
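The array-based idea from the question can be sketched in plain awk plus the standard paste utility. This is a minimal sketch, not the answer's method: it assumes the number is always in column 4 of file1 and that file2's lines are numbered in order, and the helper name append_missing is hypothetical. It counts file2's lines first, then walks 1, 2, 3, ... collecting exactly that many numbers absent from file1, so no fixed upper bound such as 23 or 31 is needed.

```shell
#!/bin/sh
# append_missing FILE1 FILE2  (hypothetical helper name)
# Appends to each line of FILE2 the next number, counting up from 1,
# that does not occur in column 4 of FILE1.
append_missing() {
    # How many numbers are needed: one per line of FILE2.
    n=$(awk 'END { print NR }' "$2")

    awk -v n="$n" '
        { seen[$4] }                  # referencing the element records the number as a key
        END {
            # Walk 1, 2, 3, ... until n unused numbers have been printed.
            for (c = 1; found < n; c++)
                if (!(c in seen)) { print c; found++ }
        }' "$1" |
    paste -d ' ' "$2" -               # join line k of FILE2 with missing number k
}
```

Usage: append_missing file1 file2. Because n is taken from file2, a longer file2 simply yields more numbers; and since 19 occurs in file1, it is skipped just as in the answer below.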
awk to the rescue! There is no need to mix bash into the script; awk is a fully fledged programming language, designed especially for text processing.
$ awk 'NR==FNR{a[$NF]; next} {while(++c in a); print $0, c}' file1 file2
md5sum 25d422cc23b44c3bbd7a66c76d52af46 4
md5sum 25d422cc23b44c3bbd7a66c76d52af47 5
md5sum 25d422cc23b44c3bbd7a66c76d52af48 6
md5sum 25d422cc23b44c3bbd7a66c76d52af41 8
md5sum 25d422cc23b44c3bbd7a66c76d52af22 9
md5sum 25d422cc23b44c3bbd7a66c76d52af33 12
md5sum 25d422cc23b44c3bbd7a66c76d52af12 13
md5sum 25d422cc23b44c3bbd7a66c76d52af01 16
md5sum 25d422cc23b44c3bbd7a66c76d52af55 17
md5sum 25d422cc23b44c3bbd7a66c76d52af14 20
md5sum 25d422cc23b44c3bbd7a66c76d52af18 23
md5sum 25d422cc23b44c3bbd7a66c76d52af17 24
md5sum 25d422cc23b44c3bbd7a66c76d52af77 25
md5sum 25d422cc23b44c3bbd7a66c76d52af06 26
md5sum 25d422cc23b44c3bbd7a66c76d52af05 27
md5sum 25d422cc23b44c3bbd7a66c76d52af72 28
md5sum 25d422cc23b44c3bbd7a66c76d52af73 29
md5sum 25d422cc23b44c3bbd7a66c76d52af74 30
md5sum 25d422cc23b44c3bbd7a66c76d52af75 31
md5sum 25d422cc23b44c3bbd7a66c76d52af76 32
Note that 19 is in your first file, so it is skipped in the output. Your expected output is not consistent with your spec for the given input.
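For readers new to the idiom, here is the same awk program as the one-liner above, spread over several lines with comments and wrapped in a shell function whose name, number_lines, is hypothetical. The logic is unchanged: NR == FNR holds only while awk reads the first file, and (++c) in a skips every counter value that file1 already uses. One caveat: if file1 is empty, NR == FNR is also true for file2's lines, so the trick assumes a non-empty first file.

```shell
#!/bin/sh
# number_lines FILE1 FILE2  (hypothetical name; same program as the one-liner)
number_lines() {
    awk '
        NR == FNR {                  # true only while reading the first file (FILE1)
            a[$NF]                   # store the last field (the number) as an array key
            next                     # skip the block below for FILE1 lines
        }
        {                            # every line of FILE2 ends up here
            while ((++c) in a) ;     # advance c past numbers that FILE1 already uses
            print $0, c              # append the first unused number to the line
        }' "$1" "$2"
}
```

Usage: number_lines file1 file2. For the files in the question this prints the same 20 lines as the one-liner, ending with 32.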