Why does AWK not treat this array index as a number unless I use int()?
Question
I have genomics files of the following type:
$ cat test-file_long.txt
2 41647 A G
2 45895 A G
2 45953 T C
2 224919 A G
2 230055 C G
2 233239 A G
2 234130 T G
2 23454 T C
When I use the following short AWK script, it does not return all of the elements that are greater than the value used in the if statement:
{
    a[$2]
}
END {
    for (i in a) {
        if (i > 45895)
            print i
    }
}
The script returns this:
$ awk -f practice.awk test-file_long.txt
45953
However, when I change the if statement to use int(), it returns all of the values that are in fact greater, as I want:
{
    a[$2]
}
END {
    for (i in a) {
        if (int(i) > 45895)
            print i
    }
}
Result:
$ awk -f practice.awk test-file_long.txt
233239
230055
234130
224919
45953
It appears that it only compares the first digit, and if those are the same it looks at the next digit, but it does not process the whole number. Can someone explain what it is about the internal mechanism of the associative array that prevents a numeric >/< comparison unless I specify that I want the int() of the array element? What if my array elements were floats and int() was not an option?
Recommended answer
Array keys in awk are strings, so the test i > 45895 is an alphabetical (character-by-character) comparison, not a numeric one. In your first example, the key 45953 passes the test because 459 is greater than 458 alphabetically, while a key such as 224919 fails because 2 sorts before 4.
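The difference between the two kinds of comparison can be seen side by side. This is only a minimal sketch: the function name compare_keys is made up for illustration, the three keys are taken from the question's file, and the output is piped through sort -n only because the order of a for (i in a) loop is unspecified.

```shell
# Sketch: keys returned by "for (i in a)" are strings, so i > 45895 is a
# string comparison, while i+0 > 45895 is a numeric one.
# compare_keys is a hypothetical helper name used only for this example.
compare_keys() {
    printf '%s\n' 45953 224919 23454 |
    awk '{ a[$1] }
         END {
             for (i in a) {
                 s = (i > 45895)   ? "yes" : "no"   # string comparison
                 n = (i+0 > 45895) ? "yes" : "no"   # numeric comparison
                 print i, "string:" s, "numeric:" n
             }
         }' | sort -n
}
compare_keys
```

Only 224919 is classified differently by the two tests, which matches the missing lines in the question's first output.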
If your only goal is to print the lines whose second column is numerically greater than 45895, this would do, because a field that looks like a number is compared numerically:
awk '$2 > 45895' test-file_long.txt
Variables change type depending on the context in which they are evaluated, so putting a variable in an explicitly numeric context makes awk treat it as a number. @glenn's suggestion of i+0 demonstrates this perfectly.
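Regarding the question about floats: unlike int(), adding zero does not truncate, so the same idiom should work for fractional keys. The following is a sketch under assumptions: the function name filter_floats and the four sample values are invented for illustration and do not come from the question's data.

```shell
# Sketch: i+0 forces a numeric comparison without dropping the fraction.
# filter_floats and its sample values are hypothetical, for illustration only.
filter_floats() {
    printf '%s\n' 9.5 10.25 100.75 2.125 |
    awk '{ a[$1] }
         END {
             for (i in a)
                 if (i+0 > 10)     # numeric test; the key itself stays a string
                     print i
         }' | sort -n
}
filter_floats
```

With a plain i > 10 the keys 9.5 and 2.125 would also pass, since they sort after "10" as strings; with i+0 only 10.25 and 100.75 are printed.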
Alternatively, the unary plus operator +i can be used to convert an expression to a number. So your longer example could be changed to:
awk '{a[$2]} END { for (i in a) { if (+i > 45895) print i } }' test-file_long.txt