awk: find missing numbers in a sequence from file1 and append them as a column in file2

Problem description

Hi, as suggested in a previous question, I will try to clarify what I want to achieve. In file1, column $4 holds numbers that are not continuously sequenced like 1,2,3,4,5..., so I need to print the missing ones; e.g. after number 3 I should get number 4, and so on.

cat file1

A R5 A48 1
B R5 A48 2
C R4 A48 3
D R8 A48 15
E R9 A48 22
F R20 B55 21
G R55 B22 19
R B1 I77 14
AA B8 PP 18
BX A255 PA 7
CA A77 PB 10
WW W7 PX 11

I found a partial solution in this awk one-liner, which returns:

awk '{ print $4 }' file1 | \
awk -v first=1 -v last=23 'BEGIN { for (i = first; i <= last; i++) array[i] = 0 }  # 0 = not seen yet
{ for (i = 1; i <= NF; i++) array[$i] += 1 }                                      # count every number read
END { for (i = first; i <= last; i++) if (array[i] == 0) print i }'               # print the gaps in order
4
5
6
8
9
12
13
16
17
20
23

This is what I want, BUT the remaining numbers after 23, up to 31, are still not printed, and I need the result pasted as column $3 of file2, based on the number of lines in file2.

cat file2

md5sum 25d422cc23b44c3bbd7a66c76d52af46 
md5sum 25d422cc23b44c3bbd7a66c76d52af47 
md5sum 25d422cc23b44c3bbd7a66c76d52af48 
md5sum 25d422cc23b44c3bbd7a66c76d52af41 
md5sum 25d422cc23b44c3bbd7a66c76d52af22 
md5sum 25d422cc23b44c3bbd7a66c76d52af33 
md5sum 25d422cc23b44c3bbd7a66c76d52af12 
md5sum 25d422cc23b44c3bbd7a66c76d52af01 
md5sum 25d422cc23b44c3bbd7a66c76d52af55 
md5sum 25d422cc23b44c3bbd7a66c76d52af14 
md5sum 25d422cc23b44c3bbd7a66c76d52af18 
md5sum 25d422cc23b44c3bbd7a66c76d52af17 
md5sum 25d422cc23b44c3bbd7a66c76d52af77 
md5sum 25d422cc23b44c3bbd7a66c76d52af06 
md5sum 25d422cc23b44c3bbd7a66c76d52af05 
md5sum 25d422cc23b44c3bbd7a66c76d52af72 
md5sum 25d422cc23b44c3bbd7a66c76d52af73 
md5sum 25d422cc23b44c3bbd7a66c76d52af74 
md5sum 25d422cc23b44c3bbd7a66c76d52af75 
md5sum 25d422cc23b44c3bbd7a66c76d52af76 

which should result in:

md5sum 25d422cc23b44c3bbd7a66c76d52af46 4
md5sum 25d422cc23b44c3bbd7a66c76d52af47 5
md5sum 25d422cc23b44c3bbd7a66c76d52af48 6
md5sum 25d422cc23b44c3bbd7a66c76d52af41 8
md5sum 25d422cc23b44c3bbd7a66c76d52af22 9
md5sum 25d422cc23b44c3bbd7a66c76d52af33 12
md5sum 25d422cc23b44c3bbd7a66c76d52af12 13
md5sum 25d422cc23b44c3bbd7a66c76d52af01 16
md5sum 25d422cc23b44c3bbd7a66c76d52af55 17
md5sum 25d422cc23b44c3bbd7a66c76d52af14 19
md5sum 25d422cc23b44c3bbd7a66c76d52af18 20
md5sum 25d422cc23b44c3bbd7a66c76d52af17 23
md5sum 25d422cc23b44c3bbd7a66c76d52af77 24
md5sum 25d422cc23b44c3bbd7a66c76d52af06 25
md5sum 25d422cc23b44c3bbd7a66c76d52af05 26
md5sum 25d422cc23b44c3bbd7a66c76d52af72 27
md5sum 25d422cc23b44c3bbd7a66c76d52af73 28
md5sum 25d422cc23b44c3bbd7a66c76d52af74 29
md5sum 25d422cc23b44c3bbd7a66c76d52af75 30
md5sum 25d422cc23b44c3bbd7a66c76d52af76 31

For example, if the next file2 has 22 lines, numbers will be added up to 32.

I believe there is a better way to do this, one that also puts the numbers from file1 column $4 into an array and handles the remaining logic there.

Solution

awk to the rescue! No need to insert bash into the script. awk is a fully fledged programming language especially for text processing.

$ awk 'NR==FNR{a[$NF]; next} {while(++c in a); print $0, c}' file1 file2

md5sum 25d422cc23b44c3bbd7a66c76d52af46  4
md5sum 25d422cc23b44c3bbd7a66c76d52af47  5
md5sum 25d422cc23b44c3bbd7a66c76d52af48  6
md5sum 25d422cc23b44c3bbd7a66c76d52af41  8
md5sum 25d422cc23b44c3bbd7a66c76d52af22  9
md5sum 25d422cc23b44c3bbd7a66c76d52af33  12
md5sum 25d422cc23b44c3bbd7a66c76d52af12  13
md5sum 25d422cc23b44c3bbd7a66c76d52af01  16
md5sum 25d422cc23b44c3bbd7a66c76d52af55  17
md5sum 25d422cc23b44c3bbd7a66c76d52af14  20
md5sum 25d422cc23b44c3bbd7a66c76d52af18  23
md5sum 25d422cc23b44c3bbd7a66c76d52af17  24
md5sum 25d422cc23b44c3bbd7a66c76d52af77  25
md5sum 25d422cc23b44c3bbd7a66c76d52af06  26
md5sum 25d422cc23b44c3bbd7a66c76d52af05  27
md5sum 25d422cc23b44c3bbd7a66c76d52af72  28
md5sum 25d422cc23b44c3bbd7a66c76d52af73  29
md5sum 25d422cc23b44c3bbd7a66c76d52af74  30
md5sum 25d422cc23b44c3bbd7a66c76d52af75  31
md5sum 25d422cc23b44c3bbd7a66c76d52af76  32

Note that 19 is in your first file so it's skipped in the output. Your output is not consistent with your spec for the given input.
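
For readers new to the idiom, the one-liner can be unrolled into a commented script. This is only a sketch and not part of the original answer: the names `seen` and `next_free` and the shortened sample files are assumptions made for illustration, and it reads column $4 explicitly instead of `$NF`.

```shell
#!/bin/sh
# Sketch: label each line of file2 with the next integer missing from column 4 of file1.
# The data below is a shortened, hypothetical stand-in for the real file1/file2.
tmpdir=$(mktemp -d)

cat > "$tmpdir/file1" <<'EOF'
A R5 A48 1
B R5 A48 2
C R4 A48 3
G R55 B22 5
EOF

cat > "$tmpdir/file2" <<'EOF'
md5sum aaa
md5sum bbb
md5sum ccc
EOF

result=$(awk '
    # First file only: NR (global line count) equals FNR (per-file line count).
    NR == FNR { seen[$4]; next }     # merely referencing seen[$4] creates the key

    # Second file: advance to the next integer that is not a key of seen.
    {
        while (++next_free in seen)
            ;                        # empty body, the loop test does the work
        print $0, next_free
    }
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this sample data the script prints 4, 6 and 7 as the third column, because 1, 2, 3 and 5 are already taken in file1. The counter keeps running past the largest number in file1, so no upper bound such as `last=23` is needed. One caveat of the `NR == FNR` test: if file1 is empty, the condition also holds for file2 and nothing is printed.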
