Best way to write a large array to file in Fortran? Text vs Other
I wanted to know the best way to write a large Fortran array (5000 x 5000 single-precision reals) to a file. I am trying to save the results of a numerical calculation for later use so that they do not need to be repeated. By my calculation, 5000 x 5000 x 4 bytes per number is 100 MB; is it possible to save this in a form that is only 100 MB? Is there a way to save Fortran arrays as a binary file and read them back in for later use?

I've noticed that saving numbers to a text file produces a file much larger than the size of the data type being saved. Is this because the numbers are being saved as characters?

The only way I am familiar with to write to a file is a nested loop with one list-directed write per element (the open(unit=41, ...) listing further down), although I'd imagine there is a better way to do it. If anyone could point me to some resources or examples to improve my ability to write and read larger files efficiently (in terms of memory), that would be great.
Thanks!

Write data files in binary, unless you're actually going to be reading the output yourself, and you're not going to be reading a 25-million-element array.

The reasons for using binary are threefold, in decreasing order of importance: accuracy, performance, and data size.

Accuracy concerns may be the most obvious. When you convert a (binary) floating-point number to a decimal string representation, you inevitably truncate at some point. That's OK if you are sure that reading the text value back into a floating-point value gives exactly the same value; but that is actually a subtle question, and it requires choosing your format carefully. With default formatting, various compilers perform this task with varying degrees of quality. This blog post, written from the point of view of a games programmer, does a good job of covering the issues.

Let's consider a little program which, for a variety of formats, writes a single-precision real number out to a string and then reads it back in again, keeping track of the maximum error it encounters. We'll just go from 0 to 1, in units of machine epsilon. The code is the testaccuracy listing further down, and the output of running it follows that listing.

Note that even with a format with 8 digits after the decimal point - which we might think would be plenty, given that single-precision reals are only accurate to 6-7 decimal places - we don't get exact copies back; they are off by approximately 1e-8. And this compiler's default format does not give us accurate round-trip floating-point values either; some error is introduced! If you're a video-game programmer, that level of accuracy may well be enough. If you're doing time-dependent simulations of turbulent fluids, however, that might absolutely not be OK, particularly if there's some bias in where the error is introduced, or if the error occurs in what is supposed to be a conserved quantity.

Note that if you try running this code, you'll notice that it takes a surprisingly long time to finish.
That's because, maybe surprisingly, performance is another real issue with text output of floating-point numbers. Consider the simple testarray program further down, which just writes out your example of a 5000 × 5000 real array as text and as unformatted binary; its timing outputs, for writing to disk and to ramdisk, follow that listing.

Note that when writing to disk, the binary output is 352 times as fast as ASCII, and to ramdisk it's closer to 700 times. There are two reasons for this: one is that you can write out the data all at once rather than having to loop; the other is that generating the decimal string representation of a floating-point number is a surprisingly subtle operation which requires a significant amount of computing for each value.

Finally, there is data size; the text file in the above example comes out (on my system) to about 4 times the size of the binary file.

Now, there are real problems with binary output. In particular, raw Fortran (or, for that matter, C) binary output is very brittle. If you change platforms, or your data size changes, your output may no longer be any good. Adding new variables to the output will break the file format unless you always add new data at the end of the file, and you have no way of knowing ahead of time what variables are in a binary blob you get from your collaborator (who might be you, three months ago).

Most of the downsides of binary output are avoided by using libraries like NetCDF, which write self-describing binary files that are much more "future proof" than raw binary. Better still, since it's a standard, many tools read NetCDF files. There are many NetCDF tutorials on the internet; ours is here.

A simple example using NetCDF gives similar times to raw binary (the ./array output further down), gives you a nice self-describing file (the ncdump -h output), and produces a file about the same size as raw binary (the du -sh output). The code is the final listing, the second testarray program.
open (unit=41, file='outfile.txt')
do i=1,len
  do j=1,len
    write(41,*) Array(i,j)
  end do
end do
program testaccuracy
  ! Round-trips single-precision values through text with several formats
  ! and records the largest error seen for each format.
  implicit none

  character(len=128) :: teststring
  integer, parameter :: nformats=4
  character(len=20), parameter :: formats(nformats) =   &
      [ '( E11.4)', '( E13.6)', '( E15.8)', '(E17.10)' ]
  real, dimension(nformats) :: errors

  real :: output, back
  real, parameter :: delta=epsilon(output)
  integer :: i

  ! Explicit formats: step from 0 to 1 in units of machine epsilon
  errors = 0
  output = 0
  do while (output < 1)
    do i=1,nformats
      write(teststring,FMT=formats(i)) output
      read(teststring,*) back
      if (abs(back-output) > errors(i)) errors(i) = abs(back-output)
    enddo
    output = output + delta
  end do

  print *, 'Maximum errors: '
  print *, formats
  print *, errors

  ! Same test with the compiler's default (list-directed) format
  print *, 'Trying with default format: '

  errors = 0
  output = 0
  do while (output < 1)
    write(teststring,*) output
    read(teststring,*) back
    if (abs(back-output) > errors(1)) errors(1) = abs(back-output)
    output = output + delta
  end do

  print *, 'Error = ', errors(1)

end program testaccuracy
$ ./accuracy
Maximum errors:
( E11.4) ( E13.6) ( E15.8) (E17.10)
5.00082970E-05 5.06639481E-07 7.45058060E-09 0.0000000
Trying with default format:
Error = 7.45058060E-09
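One way to read the E15.8 result above: the E edit descriptor normalizes to 0.dddddddd, so E15.8 carries only 8 significant digits, while 9 are needed in general to round-trip an IEEE single-precision value. The sketch below compares E15.8 with ES15.8, which prints d.dddddddd and therefore 9 significant digits. The program name roundtrip_digits, the sample size, and the use of random values in [0, 1) are assumptions made for illustration, and the result relies on the compiler's runtime rounding correctly in both directions.

```fortran
program roundtrip_digits
  implicit none
  character(len=32) :: buf
  real :: x, back, err_e, err_es
  integer :: i

  err_e  = 0.0
  err_es = 0.0
  do i = 1, 100000
    call random_number(x)            ! sample values in [0, 1)

    write(buf, '(E15.8)') x          ! 0.dddddddd -> 8 significant digits
    read(buf, *) back
    err_e = max(err_e, abs(back - x))

    write(buf, '(ES15.8)') x         ! d.dddddddd -> 9 significant digits
    read(buf, *) back
    err_es = max(err_es, abs(back - x))
  end do

  print *, 'max error, E15.8  : ', err_e
  print *, 'max error, ES15.8 : ', err_es
end program roundtrip_digits
```

On an IEEE-754 platform the ES15.8 error should come out as zero, while the E15.8 error should be of the same order as in the run above. Even so, text output still pays the performance and size costs discussed in the answer.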
program testarray
  ! Times writing a 5000 x 5000 real array as text and as unformatted binary.
  implicit none
  integer, parameter :: asize=5000
  real, dimension(asize,asize) :: array

  integer :: i, j
  integer :: time, u

  forall (i=1:asize, j=1:asize) array(i,j)=i*asize+j

  call tick(time)
  open(newunit=u,file='test.txt')
  do i=1,asize
    write(u,*) (array(i,j), j=1,asize)
  enddo
  close(u)
  print *, 'ASCII: time = ', tock(time)

  call tick(time)
  open(newunit=u,file='test.dat',form='unformatted')
  write(u) array
  close(u)
  print *, 'Binary: time = ', tock(time)

contains

  subroutine tick(t)
    integer, intent(OUT) :: t

    call system_clock(t)
  end subroutine tick

  ! returns the time in seconds elapsed since the time described by t
  real function tock(t)
    integer, intent(in) :: t
    integer :: now, clock_rate

    call system_clock(now,clock_rate)

    tock = real(now - t)/real(clock_rate)
  end function tock
end program testarray
Disk:
ASCII: time = 41.193001
Binary: time = 0.11700000
Ramdisk
ASCII: time = 40.789001
Binary: time = 5.70000000E-02
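The question also asks how to read the array back and whether the file can be exactly the size of the data. A minimal sketch, assuming a Fortran 2003 compiler: with access='stream' the unformatted file holds only the raw bytes, whereas the default sequential unformatted access used in testarray typically adds a small record marker before and after each record. The program name binary_roundtrip, the file name array_stream.bin, and the reduced size n = 1000 are assumptions chosen to keep the demo quick; at 5000 x 5000 the same code would produce 100,000,000 bytes, about 95.4 MiB.

```fortran
program binary_roundtrip
  implicit none
  integer, parameter :: n = 1000        ! smaller than 5000 to keep the demo quick
  real, allocatable :: a(:,:), b(:,:)
  integer :: u, i, j, nbytes, expected

  allocate(a(n,n), b(n,n))
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i)*n + real(j)
    end do
  end do

  ! Stream access: raw bytes only, no record markers.
  open(newunit=u, file='array_stream.bin', form='unformatted', &
       access='stream', status='replace', action='write')
  write(u) a                            ! whole array in one statement
  close(u)

  ! File size as reported by the runtime, against elements * bytes per element.
  inquire(file='array_stream.bin', size=nbytes)
  expected = n*n*(storage_size(a)/8)
  print *, 'file size in bytes: ', nbytes, ' expected: ', expected

  ! Read the data back; shape and type must match what was written.
  open(newunit=u, file='array_stream.bin', form='unformatted', &
       access='stream', status='old', action='read')
  read(u) b
  close(u)

  if (all(a == b)) then
    print *, 'round trip exact'
  else
    print *, 'round trip mismatch'
  end if
end program binary_roundtrip
```

The file stores no shape or type information, so the reader has to know both in advance. That is the brittleness the answer describes.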
$ ./array
ASCII: time = 40.676998
Binary: time = 4.30000015E-02
NetCDF: time = 0.16000000
$ ncdump -h test.nc
netcdf test {
dimensions:
X = 5000 ;
Y = 5000 ;
variables:
float Array(Y, X) ;
Array:units = "ergs" ;
}
$ du -sh test.*
96M test.dat
96M test.nc
382M test.txt
program testarray
  ! Same timing test as before, with a NetCDF write added.
  implicit none
  integer, parameter :: asize=5000
  real, dimension(asize,asize) :: array

  integer :: i, j
  integer :: time, u

  forall (i=1:asize, j=1:asize) array(i,j)=i*asize+j

  call tick(time)
  open(newunit=u,file='test.txt')
  do i=1,asize
    write(u,*) (array(i,j), j=1,asize)
  enddo
  close(u)
  print *, 'ASCII: time = ', tock(time)

  call tick(time)
  open(newunit=u,file='test.dat',form='unformatted')
  write(u) array
  close(u)
  print *, 'Binary: time = ', tock(time)

  call tick(time)
  call writenetcdffile(array)
  print *, 'NetCDF: time = ', tock(time)

contains

  subroutine tick(t)
    integer, intent(OUT) :: t

    call system_clock(t)
  end subroutine tick

  ! returns the time in seconds elapsed since the time described by t
  real function tock(t)
    integer, intent(in) :: t
    integer :: now, clock_rate

    call system_clock(now,clock_rate)

    tock = real(now - t)/real(clock_rate)
  end function tock

  subroutine writenetcdffile(array)
    use netcdf
    implicit none
    real, intent(IN), dimension(:,:) :: array

    integer :: file_id, xdim_id, ydim_id
    integer :: array_id
    integer, dimension(2) :: arrdims
    character(len=*), parameter :: arrunit = 'ergs'

    integer :: i, j
    integer :: ierr

    i = size(array,1)
    j = size(array,2)

    ! create the file
    ierr = nf90_create(path='test.nc', cmode=NF90_CLOBBER, ncid=file_id)

    ! define the dimensions
    ierr = nf90_def_dim(file_id, 'X', i, xdim_id)
    ierr = nf90_def_dim(file_id, 'Y', j, ydim_id)

    ! now that the dimensions are defined, we can define variables on them,...
    arrdims = (/ xdim_id, ydim_id /)
    ierr = nf90_def_var(file_id, 'Array', NF90_REAL, arrdims, array_id)

    ! ...and assign units to them as an attribute
    ierr = nf90_put_att(file_id, array_id, "units", arrunit)

    ! done defining
    ierr = nf90_enddef(file_id)

    ! Write out the values
    ierr = nf90_put_var(file_id, array_id, array)

    ! close; done
    ierr = nf90_close(file_id)
    return
  end subroutine writenetcdffile
end program testarray
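For completeness, here is a hedged sketch of reading the array back from the test.nc file written above. It assumes the netcdf-fortran library is installed and that the file and the variable name 'Array' are as in the listing; the program name readnetcdf and the check helper are hypothetical names introduced here. Unlike the listing above, which ignores ierr, this version tests every return status.

```fortran
program readnetcdf
  use netcdf
  implicit none
  integer, parameter :: asize = 5000    ! must match the file written above
  real, allocatable :: array(:,:)
  integer :: file_id, array_id

  allocate(array(asize, asize))

  ! open read-only, look the variable up by name, and read it in one call
  call check(nf90_open('test.nc', NF90_NOWRITE, file_id))
  call check(nf90_inq_varid(file_id, 'Array', array_id))
  call check(nf90_get_var(file_id, array_id, array))
  call check(nf90_close(file_id))

  print *, 'array(1,1) = ', array(1,1)

contains

  ! stop with the library's own message if a NetCDF call fails
  subroutine check(status)
    integer, intent(in) :: status

    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      error stop 1
    end if
  end subroutine check
end program readnetcdf
```

With the fill used in testarray, array(1,1) should come back as 1*5000 + 1. Because the file is self-describing, a more general reader could query the dimension lengths from the file instead of hard-coding asize.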