Printing thousand-separated floats with GAWK

Question

I have to process some huge files with gawk. My main problem is that I have to print some floats with thousands separators. For example, 10000 should appear as 10.000 and 10000,01 as 10.000,01 in the output.

I (and Google) came up with this function, but it fails for floats:

function commas(n) {
  gsub(/,/,"",n)
  point = index(n,".") - 1
  if (point < 0) point = length(n)
  while (point > 3) {
    point -= 3
    n = substr(n,1,point)"."substr(n,point + 1)
  }
  sub(/-\./,"-",n)
  return n
}

But it fails with floats.

Now I'm thinking of splitting the input into an integer part and a fractional (< 1) part, formatting the integer part, and then gluing the two back together. Isn't there a better way to do it?
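
For comparison, the split-and-glue idea could look roughly like this. It is a hypothetical sketch, not code from the question: euro_format, group_int and euro are illustrative names, and it assumes European input ("," as the decimal mark, "." as the thousands separator) with the number in the first field. It only uses POSIX awk features, so it should also run under gawk.

```shell
# Hypothetical helper: reads lines on stdin and formats the first field,
# assuming European input ("," = decimal mark, "." = thousands separator).
euro_format() {
    awk '
    # Group an unsigned string of digits in threes, working from the right.
    function group_int(s,    out) {
        out = ""
        while (length(s) > 3) {
            out = "." substr(s, length(s) - 2) out
            s = substr(s, 1, length(s) - 3)
        }
        return s out
    }
    # Split into integer and fractional parts, format the integer part, glue.
    function euro(n,    parts, k, sign) {
        sign = ""
        if (substr(n, 1, 1) == "-") { sign = "-"; n = substr(n, 2) }
        gsub(/\./, "", n)                 # drop separators already present
        k = split(n, parts, ",")          # parts[1] integer, parts[2] fraction
        if (k > 1) return sign group_int(parts[1]) "," parts[2]
        return sign group_int(parts[1])
    }
    { print euro($1) }
    '
}

printf '10000\n10000,01\n' | euro_format   # prints 10.000 then 10.000,01
```

Handling the sign separately keeps the grouping loop working on digits only, so no clean-up step is needed for negative numbers.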

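Disclaimer:

  • I'm not a programmer.
  • I know the thousands separator can be set through some shell environment variables, but this has to work in different environments with different languages and/or locale settings.
  • English is my second language, so sorry if I'm using it incorrectly.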

Answer

It fails with floats because you're passing in European-style numbers (1.000.000,25 for a million and a quarter), so the first gsub(/,/,"",n) throws away your decimal comma together with the separators. The function you've given should work if you just swap the roles of commas and periods. I'd first test the current version with 1000000.25 to see whether it works with non-European numbers.
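
To see the failure concretely, here is a hypothetical shell wrapper (commas_question is an illustrative name) around the function from the question, with the stray d dropped and point made local. Because the first gsub deletes every comma, the decimal comma in 10000,01 disappears and the remaining seven digits are grouped as if they were an integer.

```shell
# Hypothetical wrapper around the function from the question
# (stray "d" dropped, "point" made local; behaviour is otherwise unchanged).
commas_question() {
    awk '
    function commas(n,    point) {
        gsub(/,/, "", n)                  # also deletes a decimal comma
        point = index(n, ".") - 1
        if (point < 0) point = length(n)
        while (point > 3) {
            point -= 3
            n = substr(n, 1, point) "." substr(n, point + 1)
        }
        sub(/-\./, "-", n)
        return n
    }
    { print commas($1) }
    '
}

printf '10000\n10000,01\n' | commas_question   # prints 10.000 then 1.000.001
```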

The following awk script can be called with "echo 1 | awk -f xx.gawk" and will show you both the "normal" and the European version in action. It outputs:

123,456,789.1234
123.456.789,1234

Obviously, you're only interested in the functions; real-world code would pass values from the input stream to the functions rather than fixed strings.

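function commas(n,    point) {      # extra parameter makes "point" local
    gsub(/,/, "", n)                 # remove existing separators
    point = index(n, ".") - 1        # digits before the decimal point
    if (point < 0) point = length(n) # no decimal point: start at the end
    while (point > 3) {
        point -= 3
        n = substr(n, 1, point) "," substr(n, point + 1)
    }
    sub(/-,/, "-", n)                # drop a separator right after a minus sign
    return n
}
function commaseuro(n,    point) {
    gsub(/\./, "", n)
    point = index(n, ",") - 1
    if (point < 0) point = length(n)
    while (point > 3) {
        point -= 3
        n = substr(n, 1, point) "." substr(n, point + 1)
    }
    sub(/-\./, "-", n)
    return n
}
{ print commas("1234,56789.1234") "\n" commaseuro("12.3456789,1234") }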
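
As a sketch of the input-stream usage mentioned above, the European function can be applied to a field of each input line instead of a fixed string. This assumes whitespace-separated input with the number in the first field; euro_first_field is a hypothetical name, and the sub() on a leading minus sign is borrowed from the question's version.

```shell
# Hypothetical stream version: formats the first field of every input line
# with the commaseuro() logic; other fields pass through unchanged.
euro_first_field() {
    awk '
    function commaseuro(n,    point) {
        gsub(/\./, "", n)                 # remove existing separators
        point = index(n, ",") - 1         # digits before the decimal comma
        if (point < 0) point = length(n)  # no decimal comma: start at the end
        while (point > 3) {
            point -= 3
            n = substr(n, 1, point) "." substr(n, point + 1)
        }
        sub(/-\./, "-", n)                # guard negatives, as in the question
        return n
    }
    { $1 = commaseuro($1); print }
    '
}

printf '10000,01 a\n1234567 b\n' | euro_first_field
# prints:
# 10.000,01 a
# 1.234.567 b
```

Assigning to $1 makes awk rebuild the line with the default output separator (a single space), so lines separated by single spaces come out with the same layout.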

The functions are identical except in their handling of commas and periods. We'll call them separators and decimals in the following description:

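  • The gsub removes all existing separators, since we'll be putting them back.
  • point finds where the decimal is, since that's our starting point.
  • If there's no decimal, the if statement makes us start from the end.
  • We loop while there are more than three characters left before the current insertion position.
  • Inside the loop, we move the insertion position three characters to the left and insert a separator there.
  • Once the loop is done, we return the adjusted value.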
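
Since the two functions differ only in which characters they use, they could be folded into one function that takes the separator and the decimal mark as parameters. This is a hypothetical generalisation (group_digits and group are illustrative names), not part of the original answer. It follows the steps above, removes old separators with a character loop so that "." needs no regex escaping, and also undoes a separator placed right after a leading minus sign.

```shell
# Hypothetical generalisation: $1 = thousands separator, $2 = decimal mark.
group_digits() {
    awk -v sep="$1" -v dec="$2" '
    function group(n, ts, dm,    point, i, c, out) {
        # Remove existing separators (literal comparison, no regex).
        out = ""
        for (i = 1; i <= length(n); i++) {
            c = substr(n, i, 1)
            if (c != ts) out = out c
        }
        n = out
        # Find the decimal mark; with none, start from the end.
        point = index(n, dm) - 1
        if (point < 0) point = length(n)
        # Walk left three characters at a time, inserting a separator.
        while (point > 3) {
            point -= 3
            n = substr(n, 1, point) ts substr(n, point + 1)
        }
        # Undo a separator placed right after a leading minus sign.
        if (substr(n, 1, 2) == ("-" ts)) n = "-" substr(n, 3)
        return n
    }
    { print group($1, sep, dec) }
    '
}

printf '123456789.1234\n' | group_digits , .   # prints 123,456,789.1234
printf '123456789,1234\n' | group_digits . ,   # prints 123.456.789,1234
```

Called with ", ." it reproduces the first line of the answer's output, and with ". ," the second.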
