In a column of numbers, find the closest value to some target value

Problem description

Let's say I have some numerical data in columns, something like

11.100000 36.829657 6.101642
11.400000 36.402069 5.731998
11.700000 35.953025 5.372652
12.000000 35.482082 5.023737
12.300000 34.988528 4.685519
12.600000 34.471490 4.358360
12.900000 33.930061 4.042693
13.200000 33.363428 3.738985
13.500000 32.770990 3.447709
13.800000 32.152473 3.169312

I also have a single target value and a column index. Given this set of data, I want to find the closest value to the target value in the column with the specified index.

For example, if my target value is 11.6 in column 1, then the script should output 11.7. If two numbers are equidistant from the target value, then the higher value should be output.

I have a feeling that awk has the necessary functionality to do this, but any solution that works in a bash script is welcome.

Recommended answer

Try this:

awk -v c=2 -v t=35 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$c}END{print v}' file

The -v c=2 and -v t=35 values can be dynamic. They are the column index (c) and your target value (t). In the line above, the column is 2 and the target is 35. They can be shell variables.

The output of the line above, based on the given input data, is:

kent$  awk -v c=2 -v t=35 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$c}END{print v}' f
34.988528

kent$  awk -v c=1 -v t=11.6 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$c}END{print v}' f
11.700000
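
Since the column index and the target can come from shell variables, the one-liner can be wrapped in a small shell function. This is only a sketch: the function name closest and its argument order (column, target, file) are hypothetical and not part of the original answer; the awk program itself is the one shown above, spread over several lines.

```shell
# closest COLUMN TARGET FILE
# Hypothetical helper: passes the column index and target value to awk
# through -v, so both can come from shell variables or arguments.
closest() {
    awk -v c="$1" -v t="$2" '
        NR==1 { d=$c-t; d=d<0?-d:d; v=$c; next }   # first row seeds the best value
              { m=$c-t; m=m<0?-m:m }               # absolute distance for this row
        m<d   { d=m; v=$c }                        # strictly closer: keep it
        END   { print v }
    ' "$3"
}
```

With the sample data saved in a file named f, calling closest 1 11.6 f corresponds to the second command above. Like the one-liner, this version does not yet handle the tie rule.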

Edit

If there are two numbers equidistant from the target value, then the higher value should be output

The code above did not check this requirement. The version below should work (note that asort is a gawk extension, so it needs gawk):

awk -v c=1 -v t=11.6 '{a[NR]=$c}END{
        asort(a);d=a[NR]-t;d=d<0?-d:d;v = a[NR]
        for(i=NR-1;i>=1;i--){
                m=a[i]-t;m=m<0?-m:m
                if(m<d){
                    d=m;v=a[i]
                }
        }
        print v
}' file

Test:

kent$  awk -v c=1 -v t=11.6 '{a[NR]=$c}END{
        asort(a);d=a[NR]-t;d=d<0?-d:d;v = a[NR]
        for(i=NR-1;i>=1;i--){
                m=a[i]-t;m=m<0?-m:m
                if(m<d){
                    d=m;v=a[i]
                }
        }
        print v
}' f
11.700000

Brief explanation

I won't explain what each line of code does; I'll just describe the idea behind it.

  • First, read all the elements in the given column and save them in an array.
  • Sort the array.
  • Take the last element of the array (the greatest number). Assign it to the variable v, compute the difference between it and the given target, and save its absolute value in d.
  • Loop from the second-to-last element of the array down to the first. If the absolute difference between the element and the target is less than d, overwrite d with that difference and save the current element in v. Because the comparison is strict and the loop runs from the largest value downward, a tie keeps the higher value.
  • Print v. After the loop, v is the answer.

Some notes:

  • There is room to optimize the logic. For example, we don't have to loop through the whole array: because it is sorted, we can stop the loop as soon as a new difference is greater than d.
  • Because of the sort, this algorithm is O(n log n). In fact, this problem can be solved in O(n). If your input data were huge, and in a worst case (e.g. your column has values in the range 500-99999999999 but your target is 1), you may want to avoid the sort. But I assume performance is not an issue for you.
