Time array out of bounds in modelling?

Question

I'm trying to run my climate model with new meteorology data, which is supplied in netCDF format instead of the old Cray format. The model compiles without problems and the simulation runs fine for the first day, but it stops on the second day of simulation, always at the same time step, no matter what start date I use. The error is:

forrtl: severe (408): fort: (2): Subscript #1 of the array TIMEVALS has value 141 which is greater than the upper bound of 140.

So I did some research and went through the source code to see what the array timevals refers to. It holds the time dimension of the new meteorology data, and it has 140 elements. Each element is a specific date and time of the meteo data that the model is supposed to use for the simulation. So I started to believe that the problem is in my code, but my colleague has been able to run the model with no issues, which was strange to me. He compiled the model with some different settings in the Makefile; I don't know if this matters, as I'm still not very familiar with Fortran. Here is the part of the code that uses the TIMEVALS array:

           CASE(2)        ! nudging data is in netcdf-format

           cfile = str_filter(ndg_file_nc,yr,mo,dy,hr,mi,se,ndgblock)
           CALL message('   Adjust date using file: ',TRIM(cfile))

           IF (p_parallel_io) THEN
             INQUIRE(file=cfile,exist=found)
             IF (.NOT.found) &
                CALL finish('NudgingInit','Nudging data file not found.')
             ndgfile%format = NETCDF
             CALL IO_open (cfile, ndgfile, IO_READ)
             CALL IO_INQ_DIMID(ndgfile%file_id, 'time', ndimid)
             CALL IO_INQ_DIMLEN(ndgfile%file_id, ndimid, nts)
             CALL IO_INQ_VARID(ndgfile%file_id, 'time', nvarid)
             ALLOCATE (timevals(nts))
             CALL IO_GET_VAR_DOUBLE (ndgfile%file_id, nvarid, timevals)
             ihead_nc(1) = FLOOR(timevals(1))                          ! ihead_nc(1) is YYYYMMDD
             ihead_nc(2) = INT((timevals(1)-ihead_nc(1))*24._dp)       ! ihead_nc(2) is HH
             DEALLOCATE (timevals)
           ENDIF

           IF (p_parallel) CALL p_bcast(ihead_nc, p_io)
           CALL inp_convert_date(ihead_nc(1),ihead_nc(2)*10000, ndg_date0)

           IF (p_parallel_io) THEN
             ! skip first record and read second header
             ALLOCATE (timevals(nts))
             CALL IO_GET_VAR_DOUBLE (ndgfile%file_id, nvarid, timevals)
             ihead_nc(1) = FLOOR(timevals(2))                          ! ihead_nc(1) is YYYYMMDD
             ihead_nc(2) = INT((timevals(2)-ihead_nc(1))*24._dp)       ! ihead_nc(2) is HH
             DEALLOCATE (timevals)
             CALL IO_close(ndgfile)
           ENDIF

           IF (p_parallel) CALL p_bcast(ihead_nc, p_io)
           CALL inp_convert_date(ihead_nc(1),ihead_nc(2)*10000, ndg_date1)

ndg_file and ndg_date refer to the nudging (meteo) data. Do you guys have any idea what might cause this error?

Recommended answer

I've got some time now to elaborate on my earlier comment. (Note that I use italics to denote terms you might care to read about.)

The error you report is a run-time error, not one that the compiler is able to see at compile-time. If you don't understand the difference between run-time (i.e. when the code executes) and compile-time (i.e. when the compiler turns your sources into executable code), do some research. Furthermore, it's evident that you (or someone) have instructed the compiler to create a version of the code which checks that array element accesses are within array bounds. This is a very important safety feature when testing new software, but it imposes a performance penalty when the code executes, so many codes are, once they've passed their tests, compiled without this checking.

I don't know what compiler you're using but look at its documentation to find an option that turns on array bounds checking at run-time.

The error message is quite explicit -- at some point in your code it has tried to access element 141 of an array with only 140 elements. We can't tell you how this has happened, probably not even if we saw your entire code. This kind of thing often happens when data is loaded that doesn't conform to the programmer's expectations. It also often happens when programmers make off-by-one errors in writing loops. We might spot that from looking at your whole code, but you're in a much better position to do that than we are.
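As an illustration of the off-by-one case, here is a minimal sketch. It is not taken from the model in the question: the program name, the loop, and the "look ahead to the next record" pattern are all hypothetical. It only shows how a loop that touches element irec+1 runs one past the end on its last pass, and how bounding the loop by the array's actual size avoids that.

```fortran
program off_by_one_sketch
  implicit none
  integer, parameter :: nts = 140          ! number of time records, as in the question
  double precision, allocatable :: timevals(:)
  integer :: irec, nchecked

  allocate (timevals(nts))
  timevals = 0.0d0                         ! placeholder data, not real time stamps

  ! Buggy pattern (hypothetical): looking ahead to the next record
  ! touches element 141 when irec reaches 140.
  !   do irec = 1, nts
  !     ... timevals(irec+1) ...
  !   end do

  ! Safer pattern: stop one short, using the array's actual size.
  nchecked = 0
  do irec = 1, size(timevals) - 1
    if (timevals(irec+1) >= timevals(irec)) nchecked = nchecked + 1
  end do

  print '(a,i0)', 'intervals checked: ', nchecked
  deallocate (timevals)
end program off_by_one_sketch
```

With the placeholder data above, every comparison succeeds, so the program reports 139 intervals for 140 records.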

You write:

"but my colleague has been able to run the model with no issues, which was strange to me. He compiled the model with some different settings in the Makefile, I don't know if this matters"

Well, yes, this matters, it matters a lot. If you write code that accesses element 141 of an array with 140 elements, Fortran, like many other compiled languages, will happily access the next location in memory after element 140. In general you haven't a clue what data the program is interfering with. If you are lucky, the next location in memory is outside the address space the operating system has allocated to the program, and the operating system stops the program immediately and reports a segmentation fault.

If you're unlucky the program carries on blithely reading from, and writing to, element 141, whatever the heck it is.
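The difference between a checked and an unchecked build can be seen with a small stand-alone sketch. Nothing here comes from the model itself; the program name and values are made up for illustration. The forrtl prefix in the error message is printed by the Intel Fortran run-time library, where bounds checking is enabled with -check bounds; the gfortran equivalent is -fcheck=bounds. Which flags the two Makefiles in the question actually use is not known.

```fortran
program bounds_demo
  implicit none
  double precision, allocatable :: timevals(:)
  integer :: idx

  allocate (timevals(140))
  timevals = 1.0d0

  idx = 141                  ! one past the upper bound

  ! Built with run-time bounds checking (for example
  ! "gfortran -fcheck=bounds" or "ifort -check bounds"), the next
  ! line aborts with a subscript-out-of-range error.
  ! Built without it, the access is not checked: the program may
  ! print whatever happens to sit in memory after element 140,
  ! or it may crash.
  print *, timevals(idx)
end program bounds_demo
```

Building this twice, once with and once without the checking flag, is a quick way to see why two people can get different behaviour from the same source.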

I speculate that your colleague has not implemented array-bounds checking for his version of the code. It's up to you whether or not you tell him his code's (very probably) broken.

So what do you do about it? You debug the program. You can do this in a variety of ways, the easiest of which is (in my opinion) to insert some write statements to print out variable values at critical points in the code, to test your assumptions about what values they might, can, or actually do take. More difficult, but worth the initial effort in terms of future problem-solving, would be to run the code under the control of a debugger. There are several good debuggers available for Fortran programs on all major platforms.
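A minimal sketch of the write-statement approach follows. The index variable irec and its value are hypothetical, since the part of the model that actually indexes timevals with 141 is not shown in the question. The idea is to print the index next to the array size just before the access, and to stop with a clear message if the index is out of range.

```fortran
program debug_writes
  implicit none
  double precision, allocatable :: timevals(:)
  integer :: nts, irec

  nts = 140                          ! length of the time dimension, as in the question
  allocate (timevals(nts))
  timevals = 0.0d0                   ! placeholder data

  irec = 141                         ! hypothetical record index computed elsewhere

  ! Print the values the code is about to rely on.
  write (*, '(a,i0,a,i0)') 'DEBUG: irec = ', irec, &
                           ', size(timevals) = ', size(timevals)

  ! Guard the access so the failure is reported where it originates.
  if (irec < lbound(timevals, 1) .or. irec > ubound(timevals, 1)) then
    write (*, '(a)') 'DEBUG: irec is outside timevals, stopping before the access'
    stop 1
  end if

  write (*, *) timevals(irec)
end program debug_writes
```

In the real model the same two write statements would go just before the line the run-time error points to, so you can see which time step produces the bad index.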
