Efficiently computing floating-point arithmetic hundreds of thousands of times in Bash

Problem description

I work for a research institute that studies storm surges computationally, and I am attempting to automate some of the HPC commands using Bash. Currently, we download the data from NOAA and create the command file manually, line by line, entering the location of each file along with the time at which the program should read the data from that file and a wind magnification factor. Each NOAA download contains hundreds of these data files, and a new download comes out every 6 hours or so when a storm is in progress. This means that much of our time during a storm is spent making these command files.

I am limited in the tools I can use to automate this process because I simply have a user account and a monthly allotment of time on the supercomputers; I do not have the privilege to install new software on them. Plus, some of them are Crays, some are IBMs, some are HPs, and so forth. There isn't a consistent operating system between them; the only similarity is they are all Unix-based. So I have at my disposal tools like Bash, Perl, awk, and Python, but not necessarily tools like csh, ksh, zsh, bc, et cetera:

$ bc
-bash: bc: command not found

Further, my lead scientist has requested that all of the code I write for him be in Bash because he understands it, with minimal calls to external programs for things Bash cannot do. For example, it cannot do floating point arithmetic, and I need to be able to add floats. I can call Perl from within Bash, but that's slow:

$ time perl -E 'printf("%.2f", 360.00 + 0.25)'
360.25
real    0m0.052s
user    0m0.015s
sys     0m0.015s

1/20th of a second doesn't seem like a long time, but when I have to make this call 100 times in a single file, that equates to about 5 seconds to process one file. That isn't so bad when we are only making one of these every 6 hours. However, if this work is abstracted to a larger assignment, one where we point 1,000 synthetic storms at the Atlantic basin at one time in order to study what could have happened had the storm been stronger or taken a different path, 5 seconds quickly grows to more than an hour just to process text files. When you are billed by the hour, this poses a problem.

What is a good way to speed this up? I currently have this for loop in the script (the one that takes 5 seconds to run):

for FORECAST in $DIRNAME; do
    echo $HOURCOUNT"  "$WINDMAG"  "${FORECAST##*/} >> $FILENAME;
    HOURCOUNT=$(echo "$HOURCOUNT $INCREMENT" | awk '{printf "%.2f", $1 + $2}');
done

I know a single call to awk or Perl to loop through the data files would be a hundred times faster than calling either once for each file in the directory, and that these languages can easily open a file and write to it, but the problem I am having is getting data back and forth. I have found a lot of resources on these three languages alone (awk, Perl, Python), but haven't been able to find as much on embedding them in a Bash script. The closest I have been able to come is to make this shell of an awk command:

awk -v HOURCOUNT="$HOURCOUNT" -v INCREMENT="$INCREMENT" -v WINDMAG="$WINDMAG" -v DIRNAME="$DIRNAME" -v FILENAME="$FILENAME" 'BEGIN{ for (FORECAST in DIRNAME) do
    ...
}'

But I am not certain that this is correct syntax, and if it is, if it's the best way to go about this, or if it will even work at all. I have been hitting my head against the wall for a few days now and decided to ask the internet before I plug on.

Recommended answer

Bash is very capable as long as you have the ability you need. For floating point, you basically have two options, either bc (which at least on the box you show isn't installed [which is kind of hard to believe]) or calc (calc-2.12.4.13.tar.bz2).

Both packages are flexible, very capable floating-point programs that integrate well with Bash. Since the powers that be have a preference for Bash, I would investigate installing either bc or calc. (Job security is a good thing.)
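If neither bc nor calc can be installed, one workaround not mentioned above is to stay in Bash's integer arithmetic and count in hundredths. The following is only a minimal sketch: the function name `add_hundredths` is hypothetical, and it assumes every value is non-negative and written with exactly two decimal places (as in the `360.00 + 0.25` example from the question).

```bash
#!/usr/bin/env bash

# Hypothetical helper (not from the original post): add two decimals in pure Bash.
# Assumption: both arguments are non-negative and have exactly two fractional digits.
add_hundredths() {
    local a=${1/./} b=${2/./}          # drop the dot: "360.00" -> "36000", "0.25" -> "025"
    local sum=$(( 10#$a + 10#$b ))     # 10# forces base 10, so "025" is not parsed as octal
    printf '%d.%02d\n' $(( sum / 100 )) $(( sum % 100 ))
}

add_hundredths 360.00 0.25   # prints 360.25
```

Because no external process is started, each addition costs only a shell function call, but the two-decimal assumption has to hold for every input.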

If your superiors can be convinced to allow either perl or python, then either will do. If you have never programmed in either, both will have a learning curve, python slightly more so than perl. If your superiors can read Bash, then perl would be much easier for them to digest than python.

This is a fair outline of the options you have given your situation as you've explained it. Regardless of your choice, the task for you should not be that daunting in any of the languages. Just drop a line back when you get stuck.
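Since awk is already available on the machines described in the question, another route is to keep the script in Bash but start awk once per command file rather than once per line. This is a rough sketch, not a tested drop-in replacement: the function name `write_command_file` and the example paths are hypothetical, and it assumes file names contain no newlines.

```bash
#!/usr/bin/env bash

# Hypothetical helper: write_command_file START INCREMENT WINDMAG FILE...
# Prints one "hour  windmag  basename" line per file using a single awk process.
write_command_file() {
    local start=$1 increment=$2 windmag=$3
    shift 3
    # Feed the file names to awk one per line (assumes no newlines in names).
    printf '%s\n' "$@" | awk -v hour="$start" -v inc="$increment" -v mag="$windmag" '
        {
            n = split($0, parts, "/")    # keep only the last path component, like ${FORECAST##*/}
            printf "%.2f  %s  %s\n", hour, mag, parts[n]
            hour += inc                  # floating-point addition happens inside awk
        }'
}

# Example call with made-up paths; redirect to the command file as needed.
write_command_file 360.00 0.25 1.5 /data/storm/a.grb /data/storm/b.grb
```

In the original loop this would be called roughly as `write_command_file "$HOURCOUNT" "$INCREMENT" "$WINDMAG" $DIRNAME >> "$FILENAME"`, which pays the process start-up cost once per file list instead of once per line.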
