用该列的最大值标准化列数据 [英] normalize column data with maximum value of that column

查看:82
本文介绍了用该列的最大值标准化列数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含两列的数据文件.我想从第二列中找出最大数据值,并将第二列中的每个条目除以最大值.(因此,我将在第二列< = 1.00中获得所有条目).

I have a data file with two columns. I want to find out the maximum data value from the second column and divide each entries of second column witht he maximum value. (So I will get all the entries in second column <= 1.00).

我在下面尝试了此命令:

I tried with this command below:

awk 'BEGIN {max = 0} {if ($2>max) max=$2} {print  ($2/max)}' angleOut.dat

但是我收到如下错误消息.

but I get error message as below.

awk: (FILENAME=angleOut.dat FNR=1) fatal: division by zero attempted

注意:第二列中有一些数据为零值.但是,当零值除以最大值时,我应该得到零,但如上所述却会出错.

note: There are some data in the second column which is zero value. But when the zero value divide with max value, I should get zero, but I get error as above.

我可以为此提供任何帮助吗?

Could I get any help for this?

非常感谢.

推荐答案

让我们将其作为示例输入文件:

Let's take this as the sample input file:

$ cat >file
1 5
2 2
3 7
4 6

此awk脚本将规范第二列:

This awk script will normalize the second column:

$ awk 'FNR==NR{max=($2+0>max)?$2:max;next} {print $1,$2/max}' file file
1 0.714286
2 0.285714
3 1
4 0.857143

此脚本两次读取输入的 file .第一次,它找到最大值.第二次是打印第二列已标准化的行.

This script reads through the input file twice. The first time, it finds the maximum. The second time is prints the lines with the second column normalized.

考虑:

max=($2+0>max)?$2:max

这是if-then-else语句的紧凑形式."if"部分为 $ 2 + 0> max .如果计算结果为true,则将?之后的值分配给 max .如果为假,则将:之后的值分配给 max .

This is a compact form of an if-then-else statement. The "if" part is $2+0>max. If this evaluates to true, the value following the ? is assigned to max. If it is false, then the value following the : is assigned to max.

if 语句的更明确的形式也很好用.

The more explicit form of an if statement works well too.

此外,请注意咒语 $ 2 + 0 .在 awk 中,变量可以根据上下文为字符串或数字.在字符串上下文中,> 比较字典顺序.我们想要一个数值比较.通过在 $ 2 上加零,我们消除了所有疑问,并强制 awk $ 2 视为数字.

Also, note that incantation $2+0. In awk variables can be strings or numbers according to context. In string context, > compares lexicographic ordering. We want a numeric comparison. By adding zero to $2, we are removing all doubt and forcing awk to treat $2 as a number.

这篇关于用该列的最大值标准化列数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆