Reading a Fortran binary file in Python


Question

I'm having trouble reading an unformatted F77 binary file in Python. I've tried scipy.io.FortranFile and numpy.fromfile, both to no avail. I have also read the file in IDL, which works, so I have a benchmark for what the data should look like. I'm hoping that someone can point out a silly mistake on my part; there's nothing better than having an idiot moment and then washing your hands of it...

The data, bcube1, has dimensions 101x101x101x3 and is of type real*8 (r*8). There are 3090903 entries in total. They are written using the following statements (not my code, copied from the source).

open (unit=21, file=bendnm, status='new'
.     ,form='unformatted')
write (21) bcube1
close (unit=21)

I can successfully read it in IDL using the following (also not my code, copied from a colleague):

bcube=dblarr(101,101,101,3)
openr,lun,'bcube.0000000',/get_lun,/f77_unformatted,/swap_if_little_endian
readu,lun,bcube
free_lun,lun

The returned data (bcube) is double precision, with dimensions 101x101x101x3, so the header information for the file appears to be aware of its dimensions (the array is not flattened).

Now I am trying to get the same result in Python, but with no luck. I've tried the following methods.

In [30]: f = scipy.io.FortranFile('bcube.0000000', header_dtype='uint32')
In [31]: b = f.read_record(dtype='float64')

This returns the error Size obtained (3092150529) is not a multiple of the dtypes given (8). Changing the dtype changes the size obtained, but it still is not divisible by 8.

Alternatively, using fromfile results in no errors but returns one more value than there is in the array (a footer perhaps?), and the individual array values are wildly wrong (they should all be of order unity).

In [38]: f = np.fromfile('bcube.0000000')
In [39]: f.shape
Out[39]: (3090904,)
In [42]: f
Out[42]: array([ -3.09179121e-030,   4.97284231e-020,  -1.06514594e+299, ...,
         8.97359707e-029,   6.79921640e-316,  -1.79102266e-037])

I've tried using byteswap to see whether it makes the floating-point values more reasonable, but it does not.

It seems to me that the np.fromfile method is very close to working, but there must be something wrong with the way it reads the header information. Can anyone suggest how I can figure out what should be in the file header that allows IDL to know about the array dimensions and data type? Is there a way to pass header information to fromfile so that it knows how to treat the leading entry?

Recommended answer

I played around with it a bit, and I think I have an idea.

How Fortran stores unformatted data is not standardized, so you have to experiment a bit, but you need three pieces of information:

  1. The format of the data. You suggest it is 64-bit reals, or 'f8' in Python.
  2. The type of the header. It is an unsigned integer, but you need its length in bytes. If unsure, try 4.

     The header usually stores the length of the record in bytes, and is repeated at the end of the record.

     Then again, this is not standardized, so there are no guarantees.
  3. The endianness, little or big.

     Technically this applies to both the header and the values, but I assume they are the same.

     NumPy defaults to the machine's native byte order, which is little endian on most current hardware, so if that were the correct setting for your data, I think you would have already solved it.
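One way to check the header width and byte order is to decode the first four bytes of the file both ways and compare them with the expected record length. The sketch below uses only the standard library; `inspect_record_marker` is a hypothetical helper name, and the 4-byte marker is the assumption suggested above. It also re-reads the size from the question's error message in the opposite byte order: 3092150529 as a little-endian unsigned integer is 24727224 in big endian, which is exactly 3090903 values of 8 bytes. That points to a big-endian file with 4-byte markers.

```python
import struct


def inspect_record_marker(path, expected_bytes):
    """Decode the leading 4-byte record marker in both byte orders.

    Hypothetical helper: assumes a 4-byte unsigned marker, which is
    common but not guaranteed by any standard.
    """
    with open(path, "rb") as fh:
        raw = fh.read(4)
    little = struct.unpack("<I", raw)[0]
    big = struct.unpack(">I", raw)[0]
    if big == expected_bytes:
        order = "big"
    elif little == expected_bytes:
        order = "little"
    else:
        order = None  # marker width or layout is probably different
    return little, big, order


# 101*101*101*3 float64 values occupy this many bytes.
expected = 101 * 101 * 101 * 3 * 8

# The size from the question's error message, re-read in the other byte order.
swapped = struct.unpack(">I", struct.pack("<I", 3092150529))[0]

print(expected, swapped)
```

The same arithmetic would explain the extra element seen with np.fromfile: two 4-byte markers plus 24727224 bytes of data make 24727232 bytes, which is 3090904 eight-byte values, all misaligned by four bytes.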

When you open the file with scipy.io.FortranFile, you need to give the data type of the header. So if the data is stored big endian, and you have a 4-byte unsigned integer header, you need this:

from scipy.io import FortranFile
ff = FortranFile('data.dat', 'r', '>u4')

When you read the data, you need the data type of the values. Again, assuming big endian, you want type >f8:

vals = ff.read_reals('>f8')

These type strings follow NumPy's dtype syntax; see the NumPy documentation on data type objects for a description.
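Putting the two calls together, a minimal sketch of the full read might look like the following. It assumes big-endian data with 4-byte record markers, as discussed above; `read_fortran_cube` is a hypothetical helper name, not part of SciPy. Because read_reals returns a flat array and Fortran stores arrays in column-major order, the result is reshaped with order='F' to recover the original index order.

```python
import numpy as np
from scipy.io import FortranFile


def read_fortran_cube(path, shape, header_dtype=">u4", value_dtype=">f8"):
    """Read one unformatted Fortran record and restore its dimensions.

    Hypothetical helper: the big-endian defaults are assumptions and
    must match how the file was actually written.
    """
    with FortranFile(path, "r", header_dtype) as ff:
        flat = ff.read_reals(value_dtype)  # 1-D array of all values in the record
    # Fortran arrays are column-major, so reshape in Fortran order.
    return flat.reshape(shape, order="F")


# Usage with the file from the question (not run here):
# bcube = read_fortran_cube("bcube.0000000", (101, 101, 101, 3))
```

The file itself carries only the record length, not the array shape, so the dimensions have to be supplied by the reader, just as the IDL code does with dblarr.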

If you have control over the program that writes the data, I strongly suggest you write it as a data stream (stream access, with no record markers), which can be read more easily by Python.
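For a file written that way, a flat read followed by a column-major reshape should be enough. This is a minimal sketch, assuming the writer used stream access (for example access='stream' in Fortran 2003), that the values are float64, and that the byte order passed in matches the writing machine; `read_stream_cube` is a hypothetical helper name.

```python
import numpy as np


def read_stream_cube(path, shape, dtype="<f8"):
    """Read a headerless file of reals into an array with Fortran index order.

    Hypothetical helper: assumes the file holds only raw values,
    with no record markers before or after them.
    """
    flat = np.fromfile(path, dtype=dtype)
    expected = int(np.prod(shape))
    if flat.size != expected:
        # A mismatch often means the file has record markers after all.
        raise ValueError(f"expected {expected} values, found {flat.size}")
    return flat.reshape(shape, order="F")
```

The size check is there to catch the case from the question, where record markers add an extra element to the flat read.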
