Awk: Match column values from 2 files if their numerical values are close


Question

Following up on my first question here (Awk: Length of column number)

My data:

File 1

8.193506084253E+06 1.900521460E+01
8.193538509494E+06 1.899919490E+01
8.193540934736E+06 1.899317535E+01
8.193543359977E+06 1.898720476E+01
8.193546406105E+06 1.897934066E+01

File 2

8.193505938557E+06 1.572155163E+01
8.193509618041E+06 1.573016361E+01 
8.193513297526E+06 1.573874442E+01 
8.193516977010E+06 1.574725969E+01

I want to take $1 from File 2 and search File 1 for the closest* value in $1, in order to get output like this example:

 8.193505938557E+06 1.572155163E+01 1.900521460E+01

In this case only the first value of column $1 in File 2 has a match. The other values of $1 in File 2 are not close enough (under some condition still to be defined) to any value of $1 in File 1.

Note that the two files have different numbers of rows.
*closest = the difference between the two numbers is smaller than some threshold

Recommended answer

To my understanding, according to your description the result should be:

1235.34 d a
3457.23 e b
7589.34 f b

i.e. including a line for "f", whose closest match is "b".

This can be done using the following script:

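# First file (File 1): store column 2 under its column 1 value.
ARGIND == 1 {
    haystack[$1] = $2;
}
# Second file (File 2): scan every File 1 key for the one nearest to $1.
ARGIND == 2 {
    bestdiff=-1;    # -1 means "no candidate seen yet"
    for (v in haystack)
        if (bestdiff < 0 || (v-$1)**2 < bestdiff) {
            bestkey=haystack[v];    # column 2 of the nearest File 1 row so far
            bestdiff=(v-$1)**2;     # squared difference to that row
        }
    print $1, $2, bestkey;
}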

(I'm using squaring via **2 as a substitute for taking the absolute value.)

If you want to suppress a result when the difference is greater than, for example, 10, which gives the result you quoted, replace the print statement with something like this:

if (bestdiff < 10**2)
    print $1, $2, bestkey;

Edit: The OP changed the example input and output in the question. Here are the original example files for reference. File 1:

1234.34  a 
3456.23  b 
2325.89  c 
2326.20  c2

File 2:

1235.34 d
3457.23 e
7589.34 f

Output:

1235.34 d a
3457.23 e b

Note: ARGIND and ** are GNU extensions. See the comment from mklement0 below for details.
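Since ARGIND and ** are GNU extensions, the following is a minimal sketch of how the same idea might be written for other awk implementations. It is not the answer's script: it swaps ARGIND for the common `NR == FNR` idiom and the squared difference for an explicit absolute difference. The file names `file1` and `file2` and the variable name `max` are assumptions chosen for illustration; the file contents are the original example files quoted above.

```sh
#!/bin/sh
# Hypothetical file names; contents are the original example files quoted above.
dir=$(mktemp -d)
printf '%s\n' '1234.34  a' '3456.23  b' '2325.89  c' '2326.20  c2' > "$dir/file1"
printf '%s\n' '1235.34 d' '3457.23 e' '7589.34 f' > "$dir/file2"

# "max" is an assumed name for the threshold on the absolute difference.
result=$(awk -v max=10 '
    # First file: NR == FNR holds only while reading the first operand
    # (this idiom assumes the first file is not empty).
    NR == FNR { haystack[$1] = $2; next }

    # Second file: linear scan for the stored key nearest to $1.
    {
        found = 0
        for (v in haystack) {
            d = v - $1
            if (d < 0) d = -d          # absolute difference, no ** needed
            if (!found || d < bestdiff) {
                bestdiff = d
                best = haystack[v]
                found = 1
            }
        }
        # Print only when the nearest value is within the threshold.
        if (found && bestdiff < max) print $1, $2, best
    }
' "$dir/file1" "$dir/file2")

printf '%s\n' "$result"
rm -rf "$dir"
```

With the original example files this prints the two lines shown under Output above. Run against the updated data in the question with `-v max=1`, the same sketch would print only the first File 2 row, because every other File 2 value is more than 3 away from the nearest File 1 value. The scan is linear in the size of File 1 for every File 2 row, which is fine for small inputs.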
