Awk: Match column values from 2 files if their numerical values are close
Question
Following my first question here (Awk: Length of column number)
My data:
File 1
8.193506084253E+06 1.900521460E+01
8.193538509494E+06 1.899919490E+01
8.193540934736E+06 1.899317535E+01
8.193543359977E+06 1.898720476E+01
8.193546406105E+06 1.897934066E+01
File 2
8.193505938557E+06 1.572155163E+01
8.193509618041E+06 1.573016361E+01
8.193513297526E+06 1.573874442E+01
8.193516977010E+06 1.574725969E+01
I want to take $1 from File 2 and search File 1 for the closest* value in $1, in order to get output like this example:
8.193505938557E+06 1.572155163E+01 1.900521460E+01
In this case only the first value of column $1 in File 2 has a match. The other values of $1 from File 2 produce nothing, because they are not close enough (by some condition still to be defined) to any value of $1 from File 1.
Note that the two files have different numbers of rows.
*closest = the difference between the two numbers is smaller than some threshold
Recommended answer
To my understanding, according to your description the result should be:
1235.34 d a
3457.23 e b
7589.34 f b
i.e. including a line for "f", which is closest to "b".
This can be done using the following script:
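    # First file: remember column 2 under its column-1 value.
    ARGIND == 1 {
        haystack[$1] = $2;
    }
    # Second file: find the File 1 entry whose column 1 is nearest to $1.
    ARGIND == 2 {
        bestdiff = -1;
        for (v in haystack)
            if (bestdiff < 0 || (v-$1)**2 < bestdiff) {
                bestkey = haystack[v];
                bestdiff = (v-$1)**2;
            }
        print $1, $2, bestkey;
    }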
(I'm using squaring via **2 as a substitute for taking the absolute value.)
If you want to suppress a result when the difference is greater than, for example, 10 (which gives the output you quoted), replace the print statement with something like this. bestdiff holds the squared difference, hence the comparison against 10**2:
    if (bestdiff < 10**2)
        print $1, $2, bestkey;
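For awk implementations other than gawk, the same idea can be sketched without ARGIND or **. This is only an illustration under a few assumptions: the file names file1.txt and file2.txt, the variable name tol, and the threshold value 1 are made up for the example, and the sample rows are a subset of the data in the question. NR == FNR is the usual portable way to detect the first file, and an explicit sign flip replaces the squaring.

```shell
# Work in a scratch directory; the file names are hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/file1.txt" <<'EOF'
8.193506084253E+06 1.900521460E+01
8.193538509494E+06 1.899919490E+01
8.193540934736E+06 1.899317535E+01
EOF
cat > "$workdir/file2.txt" <<'EOF'
8.193505938557E+06 1.572155163E+01
8.193509618041E+06 1.573016361E+01
8.193513297526E+06 1.573874442E+01
EOF

# tol is an assumed threshold: print a row only if the nearest
# File 1 value lies within tol of the File 2 value.
result=$(awk -v tol=1 '
    # First file: store column 2 keyed by column 1.
    NR == FNR { val[$1] = $2; next }
    # Second file: linear scan for the nearest stored key.
    {
        best = -1
        for (key in val) {
            d = key - $1
            if (d < 0) d = -d          # absolute difference
            if (best < 0 || d < best) { best = d; hit = val[key] }
        }
        if (best >= 0 && best <= tol) print $1, $2, hit
    }
' "$workdir/file1.txt" "$workdir/file2.txt")

echo "$result"
rm -r "$workdir"
```

With these sample rows and tol=1, only the first row of File 2 is printed, with the second column of its nearest File 1 row appended. The scan visits every File 1 entry for each File 2 row, which is fine for small files. For large inputs, sorting File 1 and searching it would be a reasonable alternative.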
Edit: The OP changed the example input and output in the question. Here are the original example files for reference. File 1:
1234.34 a
3456.23 b
2325.89 c
2326.20 c2
File 2:
1235.34 d
3457.23 e
7589.34 f
Output:
1235.34 d a
3457.23 e b
Note: ARGIND and ** are GNU extensions. See the comment from mklement0 below for details.