Fortran: How do I allocate arrays when reading a file of unknown size?

Question

My typical use of Fortran begins with reading in a file of unknown size (usually 5-100MB). My current approach to array allocation involves reading the file twice: the first pass determines the size of the problem (so the arrays can be allocated), and the second reads the data into those arrays.

Are there better approaches to size determination/array allocation? I just read about automatic array allocation (example below) in another post, and it seemed much easier.

array = [array,new_data]

What are all the options and their pros and cons?

Recommended answer

I'll bite, though the question is teetering close to off-topicality. Your options are:

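  1. Read the file once to get the array sizes, allocate, then read it again.
  2. Read piece by piece, (re-)allocating as you go. Choose the size of the pieces to read as you wish (or as you think is likely to work best for your case).
  3. Always, always work with files that contain metadata telling an interested program how much data there is; for example, a block header line that tells you how many data elements are in the next block.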

Option 3 is the best by far. A little extra thought and about one whole line of code at the beginning of a project save so much wasted time and effort down the line. You don't have to jump on HDF5 or a similar heavyweight file design method; just adopt enough discipline to last the useful life of the contents of the file. For iteration-by-iteration dumps from your simulation of the universe, a home-brewed approach will do (be honest, you're the only person who's ever going to look at them). For data gathered at an approximate cost of $1M per TB (satellite observations, offshore seismic traces, etc.), use HDF5 or something similar.
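As a rough illustration of option 3, here is a minimal sketch of a home-brewed format in which the first record holds the number of values that follow. The file name `data_with_header.txt` and the one-count-then-values layout are assumptions made up for this example, not something the question specifies.

```fortran
program read_with_header
  implicit none
  integer :: u, n, ios
  real, allocatable :: values(:)

  ! Write a small sample file so the sketch is self-contained.
  ! The file name and layout are hypothetical.
  open(newunit=u, file='data_with_header.txt', status='replace', action='write')
  write(u, *) 4                       ! header: how many values follow
  write(u, *) 1.0, 2.0, 3.0, 4.0
  close(u)

  ! Read the header first, allocate once, then read the data in one go.
  open(newunit=u, file='data_with_header.txt', status='old', action='read')
  read(u, *, iostat=ios) n
  if (ios /= 0) error stop 'could not read the header line'
  allocate(values(n))
  read(u, *, iostat=ios) values
  if (ios /= 0) error stop 'fewer values than the header promised'
  close(u)

  print *, 'read', n, 'values, sum =', sum(values)
end program read_with_header
```

The writer pays for this with a single extra `write` statement, and the reader never has to guess or count.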

Option 1 is fine too. It's not like you have to wait for the tapes to rewind between reads any more. (Well, some do, but they're in a niche these days, and a de-archiving system will often move files from tape to disk if they're to be used.)
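For comparison, option 1 might look something like the sketch below. It assumes one real value per record and a hypothetical file name `two_pass.txt`; a real file will probably need a different `read` statement inside both loops.

```fortran
program two_pass_read
  implicit none
  integer :: u, n, i, ios
  real :: dummy
  real, allocatable :: values(:)

  ! Write a small sample file so the sketch is self-contained.
  open(newunit=u, file='two_pass.txt', status='replace', action='write')
  do i = 1, 5
    write(u, *) real(i)
  end do
  close(u)

  open(newunit=u, file='two_pass.txt', status='old', action='read')

  ! First pass: count records until end of file (or the first unreadable record).
  n = 0
  do
    read(u, *, iostat=ios) dummy
    if (ios /= 0) exit
    n = n + 1
  end do

  ! Second pass: allocate exactly once, rewind, and read for real.
  allocate(values(n))
  rewind(u)
  do i = 1, n
    read(u, *) values(i)
  end do
  close(u)

  print *, 'read', n, 'values, sum =', sum(values)
end program two_pass_read
```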

Option 2 is a faff. It may also be the worst performing, but on all but the largest files the worst performance may be within a nano-century (about three seconds) of the best. If that's important to you, then check it out.
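If you do want to try option 2, one common way to keep the reallocation cost down is to double the capacity whenever the array fills up and hand the new storage over with `move_alloc`. The sketch below assumes one real value per record, a hypothetical file name `grow.txt`, and a deliberately tiny starting capacity of 4 so that the growth path is exercised. The `array = [array, new_data]` idiom from the question gets to the same place with less code, but it copies the whole array on every append.

```fortran
program grow_as_you_read
  implicit none
  integer :: u, n, i, ios
  real :: x
  real, allocatable :: values(:), tmp(:)

  ! Write a small sample file so the sketch is self-contained.
  open(newunit=u, file='grow.txt', status='replace', action='write')
  do i = 1, 10
    write(u, *) real(i)
  end do
  close(u)

  open(newunit=u, file='grow.txt', status='old', action='read')
  allocate(values(4))                 ! small initial capacity, chosen arbitrarily
  n = 0
  do
    read(u, *, iostat=ios) x
    if (ios /= 0) exit                ! end of file or unreadable record
    if (n == size(values)) then
      ! Array is full: allocate double the capacity, copy, and swap.
      allocate(tmp(2 * size(values)))
      tmp(1:n) = values
      call move_alloc(tmp, values)    ! values takes over tmp's storage; tmp is deallocated
    end if
    n = n + 1
    values(n) = x
  end do
  close(u)

  ! Trim the unused capacity so size(values) equals the number of values read.
  allocate(tmp(n))
  tmp = values(1:n)
  call move_alloc(tmp, values)

  print *, 'read', size(values), 'values, sum =', sum(values)
end program grow_as_you_read
```

Doubling keeps the number of reallocations proportional to the logarithm of the final size, at the price of temporarily holding up to twice the memory that the data needs.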

If you want quantification of my opinions, run your own experiments on your files on your hardware.

PS: I haven't really got a clue how much it costs to get 1TB of satellite or seismic data; it's a factoid invented to support an argument.
