Python reading unformatted direct access Fortran 90 gives incorrect output


Problem description

Here is how the data is written (it's a 2-D matrix of floats; I am not sure of the size).

open(unit=51,file='rmsd/'//nn_output,form='unformatted',access='direct',status='replace',&
     recl=Npoints*sizeofreal)

!a bunch of code omitted 

    write(51,rec=idx-nstart+1) (real(dist(jdx)),jdx=1,Npoints)

Here is how I am trying to read the file, inspired by the answers to these similar questions: 1, 2.

import numpy as np

f = open(inputfilename, 'rb')
field = np.fromfile(f, dtype='float64')

But the result is not correct. The diagonal of the matrix should be 0 (or very close to it), as it is a similarity matrix. I tried different dtypes, but I still could not get a correct result.

This is the output I get:

array([  1.17610188e+01,   2.45736970e+02,   7.79741823e+02, ...,
         9.52930627e+05,   8.93743127e+05,   7.64186127e+05])

I haven't bothered with reshaping yet, because the values should range from 0 to ~20.

EDIT 2: Here is a link to the first 100 lines of the file: https://drive.google.com/file/d/0B2Mz7CoRS5g5SmRxTUg5X19saGs/view?usp=sharing

Declarations at the top of the file:

integer,parameter :: real_kind=8
!variables for MPI
integer :: comm, nproc, myid, ierr

integer,allocatable :: idneigh(:)
real :: tmp
real,allocatable :: traj(:,:)
real(real_kind),allocatable :: dist(:)
real(real_kind),allocatable :: xx(:,:),yy(:,:),rot(:),weight(:)
character(200) :: nn_traj, nn_output, nn_neigh
integer,parameter :: sizeofreal=4,sizeofinteger=4

Recommended answer

Unformatted Fortran binary files can encode the record length to delimit records. In general the record length is written both before and after each record, though, if memory serves, the details are processor dependent (see the second half of this answer if your records are not delimited this way). Looking at the file you posted, if you interpret the first 4 bytes as an integer and the remaining bytes as 32-bit floating point values, you get:

0000000        881505604    7.302916e+00    8.723415e+00    6.914254e+00
0000020     9.826199e+00    7.044637e+00    8.601265e+00    6.629045e+00
0000040     6.103047e+00    9.476192e+00    9.326468e+00    6.535160e+00
0000060     8.904651e+00    4.710213e+00    6.534080e+00    1.156603e+01
0000100     1.046533e+01    9.343380e+00    8.574672e+00    7.498291e+00
0000120     1.071538e+01    7.138038e+00    5.898036e+00    6.182026e+00
0000140     7.037515e+00    6.418780e+00    6.294755e+00    8.327971e+00
0000160     6.796582e+00    7.397069e+00    6.493272e+00    1.126087e+01
0000200     6.467663e+00    7.178994e+00    7.867798e+00    5.921878e+00

If you seek past this record length field, you can read the rest of the record into a Python variable. You will want to specify the correct number of values to read, because there will be another record length at the end of the record, and it would otherwise be read into your array as a bogus value. A record length of 881505604 bytes implies that your NPoints is 220376401, i.e. 881505604 / 4 (if that is not true, see the second half of this answer: the first 4 bytes may be data and not a record length).

You can read this data in Python with:

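import numpy as np

with open('fortran_output', 'rb') as f:
    # First 4 bytes: the record length marker
    recl = np.fromfile(f, dtype='int32', count=1)
    f.seek(4)  # position just past the marker
    # Everything else, including any trailing marker, as 32-bit floats
    field = np.fromfile(f, dtype='float32')

print('Record length=', recl)
print(field)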

This reads the record length, seeks to byte offset 4 (just past the marker), and reads the rest of the file into a float32 array. As mentioned before, you will want to pass a proper count= to the read so that the end-of-record field is not read in as data.

This program outputs the following for your input file:

Record length= [881505604]
[ 7.30291557  8.72341537  6.91425419 ...,  6.4588294   6.53710747
  6.01582813]

All of the 32-bit reals in the array are between 0 and 20, which you say is the proper range for your data, so this looks like what you want.
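The advice about count= and the trailing record length can be sketched as a small reader. This is only an illustration under assumptions: the helper name read_delimited_record is hypothetical, and it assumes 4-byte markers in native byte order holding the payload length in bytes, which is one common layout but is not guaranteed for your compiler.

```python
import numpy as np

def read_delimited_record(f, dtype="float32"):
    """Read one record framed by 4-byte length markers (assumed layout)."""
    # Leading marker: payload length in bytes (assumed int32, native byte order).
    head = np.fromfile(f, dtype="int32", count=1)
    if head.size == 0:
        return None  # nothing left to read
    nbytes = int(head[0])
    itemsize = np.dtype(dtype).itemsize
    # count= stops the read before the trailing marker.
    data = np.fromfile(f, dtype=dtype, count=nbytes // itemsize)
    # Trailing marker: should repeat the leading one.
    tail = np.fromfile(f, dtype="int32", count=1)
    if tail.size != 1 or int(tail[0]) != nbytes:
        raise ValueError("trailing marker does not match the leading marker; "
                         "the file is probably not delimited this way")
    return data
```

If the trailing marker does not match, that is a strong hint that the first 4 bytes are data rather than a record length, which is the case covered in the second half of this answer.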

If, however, your compiler does not write record lengths as delimiters for unformatted, direct access files, then the file interpreted entirely as 32-bit floats is:

0000000    2.583639e-07       7.3029156        8.723415        6.914254
0000020        9.826199        7.044637        8.601265        6.629045
0000040        6.103047       9.4761915        9.326468       6.5351596
0000060        8.904651        4.710213         6.53408       11.566033
0000100       10.465328         9.34338        8.574672        7.498291
0000120       10.715377        7.138038       5.8980355        6.182026
0000140       7.0375147         6.41878       6.2947555       8.3279705
0000160       6.7965817       7.3970685       6.4932723       11.260868
0000200        6.467663        7.178994        7.867798       5.9218783
0000220        6.710998         5.71757       6.1372333        5.809089

Here the first value is very close to zero, which may be appropriate given that you say the matrix diagonal should be 0.

To read this, you can do the read in Python just as you were attempting, but with a 32-bit float dtype. It would be prudent to use count= to read only one record's worth of values if there is more than one record in the file.

Using the Python code:

import numpy as np

with open('fortran_output', 'rb') as f:
    # No record markers assumed: the whole file is 32-bit floats
    field = np.fromfile(f, dtype='float32')

print(field)

produces the output

[  2.58363912e-07   7.30291557e+00   8.72341537e+00 ...,   6.45882940e+00
   6.53710747e+00   6.01582813e+00]

which matches the file contents interpreted as 32-bit floats.
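To illustrate why a float64 read gives values far outside the expected range, the sketch below reinterprets a few float32 values as float64. It assumes that real(dist(jdx)) in the write statement produces 4-byte reals, which is consistent with sizeofreal=4 in the declarations but depends on compiler flags. The four sample values are copied from the start of the float32 dump in the first half of this answer.

```python
import numpy as np

# Sample values copied from the float32 dump earlier in this answer.
written = np.array([7.302916, 8.723415, 6.914254, 9.826199], dtype=np.float32)
raw = written.tobytes()  # 16 bytes, as they would sit on disk

# Same bytes, two interpretations.
as_f32 = np.frombuffer(raw, dtype=np.float32)  # 4 values: the ones written
as_f64 = np.frombuffer(raw, dtype=np.float64)  # 2 values: each fuses two float32 bit patterns

print(as_f32)
print(as_f64)
```

Reading as float64 halves the number of values and produces numbers unrelated to the data, so no amount of reshaping afterwards can recover the matrix.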

When tested with various gfortran and ifort versions, this kind of output is generally written with no record delimiters, but other compilers may behave as in the first half of this answer, or differently again.
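For the delimiter-free case, each direct access record is one row of the matrix, so the flat array can be reshaped once Npoints is known. A minimal sketch, assuming float32 data, records packed back to back with no padding, and a hypothetical helper name read_direct_access:

```python
import numpy as np

def read_direct_access(path, npoints, dtype="float32"):
    """Read a delimiter-free direct access file as an (nrecords, npoints) array."""
    flat = np.fromfile(path, dtype=dtype)
    if flat.size % npoints != 0:
        raise ValueError("file does not hold a whole number of records; "
                         "check npoints, the dtype, or whether records are delimited")
    # Record k of the Fortran file becomes row k of the result.
    return flat.reshape(-1, npoints)
```

With the real Npoints from the Fortran run, something like np.diag(read_direct_access('fortran_output', npoints)) gives a quick check against the expectation that the diagonal is close to 0, provided the file holds all Npoints records.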

I also want to reiterate the caveat that this solution does not come with 100% confidence, since we do not have the data that you wrote into this file to verify against. You do have that information, though, and you should verify that the data read back is correct. It is also worth noting that, when compiling with default flags, ifort measures record lengths in 4-byte words, while gfortran measures them in bytes. This won't corrupt the output, but if you do not compensate for it with ifort, your files will be 4 times bigger than needed. You can get record lengths in byte units with ifort by using -assume byterecl.
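If the file was written by ifort with default flags, recl=Npoints*sizeofreal is counted in 4-byte words, so each record would occupy four times the space of its payload. The sketch below reads such a file by striding over the records. It rests on assumptions not verified here: that every record, including the last, takes its full length on disk, and that the data sits at the start of each record. The helper name read_padded_records is hypothetical.

```python
import numpy as np

def read_padded_records(path, npoints, recl_unit_bytes=4, sizeofreal=4):
    """Read float32 records whose on-disk length may exceed the payload."""
    # The OPEN statement passes recl = Npoints * sizeofreal units; each unit is
    # recl_unit_bytes bytes (4 for ifort defaults, 1 for gfortran).
    record_bytes = npoints * sizeofreal * recl_unit_bytes
    stride = record_bytes // np.dtype("float32").itemsize  # float32 slots per record
    flat = np.fromfile(path, dtype="float32")
    if flat.size % stride != 0:
        raise ValueError("file does not hold a whole number of records of this length")
    # Keep the leading npoints values of each record; the rest is treated as padding.
    return flat.reshape(-1, stride)[:, :npoints]
```

Comparing the file size with Npoints * 4 bytes per record is a cheap way to tell which case applies before choosing recl_unit_bytes.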
