Match closest value from two different files and print specific columns


Question

Hi guys, I have two files, each with N columns and M rows.

File 1

1 2 4 6 8
20 4 8 10 12
15 5 7 9 11

File 2

1 a1 b1 c5 d1
2 a1 b2 c4 d2
3 a2 b3 c3 d3
19 a3 b4 c2 d4
14 a4 b5 c1 d5

For each line of file 1, I need to find the line of file 2 whose column 1 value is closest, and print specific columns in the output. For example, the output should be:

File 3

1 2 4 6 8
1 a1 b1 c5 d1
20 4 8 10 12
19 a3 b4 c2 d4
15 5 7 9 11
14 a4 b5 c1 d5

Since 1 = 1, 19 is the closest to 20, and 14 is the closest to 15, those are the lines that get printed. How can I do this in awk or any other tool?

Help!

This is what I have so far:

echo "ARGIND == 1 {
s1[\$1]=\$1;
s2[\$1]=\$2;
s3[\$1]=\$3;
s4[\$1]=\$4;
s5[\$1]=\$5;
}
ARGIND == 2 {
bestdiff=-1;
for (v in s1)
if (bestdiff < 0 || (v-\$1)**2 <= bestdiff) 
{
s11=s1[v];
s12=s2[v];
s13=s3[v];
s14=s4[v];
s15=s5[v];
bestdiff=(v-\$1)**2;
if (bestdiff < 2){
print \$0
print s11,s12,s13,s14,s15}}">diff.awk
awk -f diff.awk file2 file1

Output:

1 2 4 6 8
1 a1 b1 c5 d1
20 4 8 10 12
19 a3 b4 c2 d4
15 5 7 9 1
14 a4 b5 c1 d5
1 2
1 1
14 15

I have no idea where the last three lines come from.
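One thing worth checking in the attempt above: both print statements sit inside the for loop, so a line can be printed every time a candidate improves on the previous best, and how often that happens depends on the order in which awk walks the array. This may or may not be what produced the three extra lines. A minimal sketch of the usual pattern (track the best candidate inside the loop, print once after it) follows. The array name row, the variables best and bestdiff, and the temporary file paths are illustrative choices, and NR == FNR is used in place of the gawk-only ARGIND.

```shell
#!/bin/sh
# Sketch: choose the closest key inside the loop, print once after the loop.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Sample data copied from the question.
printf '%s\n' '1 2 4 6 8' '20 4 8 10 12' '15 5 7 9 11' > "$tmp/file1"
printf '%s\n' '1 a1 b1 c5 d1' '2 a1 b2 c4 d2' '3 a2 b3 c3 d3' \
    '19 a3 b4 c2 d4' '14 a4 b5 c1 d5' > "$tmp/file2"

result=$(awk '
NR == FNR { row[$1] = $0; next }   # first argument (file2): store each line under its column-1 key
{
    bestdiff = -1
    for (v in row) {               # only track the best candidate here, no printing
        d = (v - $1) * (v - $1)
        if (bestdiff < 0 || d < bestdiff) { bestdiff = d; best = v }
    }
    print                          # the file1 line, exactly once
    if (bestdiff >= 0) print row[best]   # its closest file2 line, exactly once
}' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
```

If two keys are equally close, which one wins depends on the array traversal order, which awk does not specify.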

Recommended answer

Here is what I ended up with, commented to explain the approach:

function closest(b, i,    x, tmp, distance, found) { # the extra parameters act as local variables
  distance = -1 # -1 means no candidate has been seen yet, so no magic upper bound is needed
  for (x in b) { # loop over the array to get its keys
    tmp = (x+0 > i+0) ? x - i : i - x # +0 forces a numeric comparison; tmp is the absolute difference between the key and the target
    if (distance < 0 || tmp < distance) { # first candidate, or closer than the previous best
      distance = tmp
      found = x # save the closest key found so far
    }
  }
  return found # return the closest key (empty if b has no keys)
}

{ # runs for every line of both files (no condition)
   if (NR > FNR) { # NR keeps counting across files while FNR restarts, so NR > FNR means we are past the first file
     b[$1] = $0 # second file: store each line with $1 as key
   } else {
     akeys[max++] = $1 # first file: remember the keys in input order, since for (x in array) does not guarantee order
     a[$1] = $0 # first file: store each line with $1 as key
   }
}

END { # both files have been parsed, print the result
  for (i = 0; i < max; i++) { # walk the first file's keys by index to keep input order
    print a[akeys[i]] # print the line from the first file
    if (akeys[i] in b) { # if the same key exists in the second file
      print b[akeys[i]] # then print it
    } else {
      bindex = closest(b, akeys[i]) # otherwise find the closest key in the second file
      print b[bindex] # print what we found
    }
  }
}

I hope the comments make this clear; feel free to ask in the comments if anything needs explaining.

Warning: this may become really slow if the second file has a large number of lines, because the whole second array is scanned for every key of the first file that is not present in the second file.
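If that cost becomes a problem, one possible alternative (not part of the answer above) is to sort the second file numerically on column 1 and binary-search it for each line of the first file, so each lookup takes a logarithmic number of steps instead of a full scan. The sketch below assumes numeric keys in column 1 and a non-empty second file; the array names key and line, the variables lo, hi, mid and best, and the temporary file paths are illustrative.

```shell
#!/bin/sh
# Sketch: nearest-key lookup via binary search over file2 sorted on column 1.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Sample data copied from the question.
printf '%s\n' '1 2 4 6 8' '20 4 8 10 12' '15 5 7 9 11' > "$tmp/file1"
printf '%s\n' '1 a1 b1 c5 d1' '2 a1 b2 c4 d2' '3 a2 b3 c3 d3' \
    '19 a3 b4 c2 d4' '14 a4 b5 c1 d5' > "$tmp/file2"

# Sort the lookup file numerically on its first column.
sort -n -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

result=$(awk '
# First argument: the sorted lookup file, loaded into index-ordered arrays.
NR == FNR { key[FNR] = $1 + 0; line[FNR] = $0; n = FNR; next }
# Second argument: binary-search for the first sorted key >= the target.
{
    t = $1 + 0; lo = 1; hi = n + 1
    while (lo < hi) {
        mid = int((lo + hi) / 2)
        if (key[mid] < t) lo = mid + 1; else hi = mid
    }
    # Candidates are lo (first key >= t) and lo-1 (last key < t).
    if (lo > n)       best = n
    else if (lo == 1) best = 1
    else              best = (key[lo] - t < t - key[lo-1]) ? lo : lo - 1
    print
    print line[best]
}' "$tmp/file2.sorted" "$tmp/file1")

printf '%s\n' "$result"
```

On a tie this sketch picks the smaller key; swapping `<` for `<=` in the last comparison would prefer the larger one.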

Given your sample inputs saved as a1 and a2:

$ mawk -f closest.awk a1 a2
1 2 4 6 8
1 a1 b1 c5 d1
20 4 8 10 12
19 a3 b4 c2 d4
15 5 7 9 11
14 a4 b5 c1 d5
