Why does AWK not treat this array index as a number unless I use int()?

Question

I have genomics files of the following type:

$ cat test-file_long.txt 
2 41647 A G
2 45895 A G
2 45953 T C
2 224919 A G
2 230055 C G
2 233239 A G
2 234130 T G
2 23454 T C

When I use the following short AWK script, it does not return all of the elements that are greater than the value used in the if statement:

{
    a[$2]
}
END{
    for (i in a) {
        if (i > 45895)
            print i
    }
}

The script returns this:

$ awk -f practice.awk test-file_long.txt 
45953

However, when I change the if statement to use int(), it returns the values that are in fact greater, as I want:

{
    a[$2]
}
END{
    for (i in a) {
        if (int(i) > 45895)
            print i
    }
}

Result:

$ awk -f practice.awk test-file_long.txt 
233239
230055
234130
224919
45953

It appears to compare only the first digit and, if those are the same, move on to the next digit, rather than comparing the whole number. Can someone explain what it is about the internal mechanism of the associative array that prevents a numeric >/< comparison unless I take the int() of the array element? What if my array elements were floats and int() was not an option?

Recommended answer

Array keys in awk are strings, so the loop variable i holds a string and i > 45895 is an alphabetical (character-by-character) comparison. In your first example, 45953 is the only key that sorts after 45895 alphabetically, because 459 is greater than 458, so it alone passes the test.
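To see the two kinds of comparison side by side, here is a minimal sketch. The function name compare_key is made up for illustration and is not part of the question or of awk; it stores one value as an array key, then compares that key to 45895 once as a string and once as a number.

```shell
# compare_key is a hypothetical helper name used only for this illustration.
# It prints the key, the result of the string comparison, and the result of
# the numeric comparison (1 = true, 0 = false).
compare_key() {
    printf '2 %s A G\n' "$1" | awk '
    { a[$2] }
    END {
        for (i in a) {
            as_string = (i > 45895)       # i is a string, so 45895 is compared as "45895"
            as_number = (i + 0 > 45895)   # i + 0 is a number, so this is a numeric comparison
            print i, as_string, as_number
        }
    }'
}

compare_key 224919
# prints: 224919 0 1
```

The string comparison fails because "2" sorts before "4", even though 224919 is numerically larger.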

If your only goal is to print the lines whose 2nd column is > 45895 numerically, this would do:

awk '$2 > 45895' test-file_long.txt

Variables change type depending on the context in which they are evaluated. So by putting a variable in an explicitly numeric context, it will be treated as such. @glenn's suggestion of i+0 demonstrates this perfectly.

Alternatively, the unary plus operator +i can be used to convert an expression to a number. So your longer example could be changed to:

awk '{a[$2]} END { for (i in a) { if (+i > 45895) print i } }' test-file_long.txt
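As for the floats part of the question, adding 0 converts the key to a number without truncating it the way int() would. The following is only a sketch under assumptions: the helper name keys_above, the variable names, and the sample values are invented for illustration, and the output is piped through sort -n because for (key in seen) visits keys in an unspecified order.

```shell
# keys_above is a hypothetical helper: it prints every distinct value in
# column 2 of its input that is numerically greater than the given threshold.
keys_above() {
    awk -v limit="$1" '
    { seen[$2] }
    END {
        for (key in seen)
            if (key + 0 > limit + 0)   # + 0 on both sides forces a numeric comparison
                print key
    }' | sort -n
}

# Made-up sample data with fractional values in column 2.
printf '2 9.5 A G\n2 10.25 T C\n2 100.75 A G\n' | keys_above 10
# prints:
# 10.25
# 100.75
```

A plain string comparison would have let 9.5 through here, since "9" sorts after "1".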
