Read binary file which has different datatypes


Question

I'm trying to read a binary file produced by Fortran into Python. The file contains some integers, some reals, and some logicals. At the moment I can read the first few numbers correctly with:

x = np.fromfile(filein, dtype=np.int32, count=-1)
firstint= x[1]
...

(np is numpy.) But the next item is a logical, followed later by more integers and then reals. How can I read them?

Answer

Typically, when you're reading in values such as this, they're in a regular pattern (e.g. an array of C-like structs).

Another common case is a short header of various values followed by a bunch of homogeneously typed data.

Let's deal with the first case first.

Reading in Regular Patterns of Data Types

For example, you might have something like:

float, float, int, int, bool, float, float, int, int, bool, ...

If that's the case, you can define a dtype to match the pattern of types. In the case above, it might look like:

dtype=[('a', float), ('b', float), ('c', int), ('d', int), ('e', bool)]

(Note: there are many different ways to define the dtype. For example, np.dtype('f8,f8,i8,i8,?') describes the same layout on most 64-bit platforms, though the fields are then named f0 through f4 instead of a through e. See the documentation for numpy.dtype for more information.)
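Because plain float, int and bool map to platform defaults, it is probably safer to spell out sizes and byte order when the file comes from another program. The sketch below assumes a little-endian file with 8-byte reals, 4-byte integers and a 4-byte logical (a common default for Fortran compilers, but an assumption you should check against how your file was written). The field names are just the placeholders used above.

```python
import numpy as np

# Explicit sizes and byte order, so the layout does not depend on the platform.
record = np.dtype([
    ('a', '<f8'),
    ('b', '<f8'),
    ('c', '<i4'),
    ('d', '<i4'),
    ('e', '<i4'),   # assumed 4-byte logical; nonzero means true
])

# The comma-separated string form gives the same sizes with default field names.
shorthand = np.dtype('f8,f8,i4,i4,i4')

print(record.itemsize)    # bytes occupied by one record
print(record.names)
print(shorthand.names)
```

If the logical really is stored as a single byte, '?' would be the matching field type instead.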

When you read your array in, it will be a structured array with named fields. You can later split it up into individual arrays if you'd prefer. (e.g. series1 = data['a'] with the dtype defined above)

The main advantage of this is that reading in your data from disk will be very fast. Numpy will simply read everything into memory, and then interpret the memory buffer according to the pattern you specified.

The drawback is that structured arrays behave a bit differently than regular arrays. If you're not used to them, they'll probably seem confusing at first. The key part to remember is that each item in the array is one of the patterns that you specified. For example, for what I showed above, data[0] might be something like (4.3, -1.2298, 200, 456, False).
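As a rough illustration of the above, here is a small round trip. It writes a few made-up records to a temporary file (the file name and the values are hypothetical), reads them back with a structured dtype, and then pulls out individual fields.

```python
import os
import tempfile

import numpy as np

# Hypothetical record layout: two doubles, two 32-bit ints, one bool byte.
record = np.dtype([('a', '<f8'), ('b', '<f8'),
                   ('c', '<i4'), ('d', '<i4'), ('e', '?')])

# Write three sample records so the example has something to read.
sample = np.array([(4.3, -1.2298, 200, 456, False),
                   (0.5, 2.0, 1, 2, True),
                   (7.25, 3.5, -3, 9, True)], dtype=record)
path = os.path.join(tempfile.mkdtemp(), 'records.dat')
sample.tofile(path)

# One call reads every record; each element is one full pattern.
data = np.fromfile(path, dtype=record)
print(data[0])      # the first record
print(data['a'])    # the 'a' field of every record

# Fields can be split off into ordinary arrays.
series1 = data['a']
flags = data['e']
```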

Reading in a Header

Another common case is that you have a header with a known format and then a long series of regular data. You can still use np.fromfile for this, but you'll need to parse the header separately.

First, read in the header. You can do this in several different ways (e.g. have a look at the struct module in addition to np.fromfile, though either will probably work well for your purposes).
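For instance, a header could be unpacked with the struct module along these lines. This is only a sketch: the header layout (one 32-bit integer, one 32-bit logical, one 64-bit real, little-endian, no padding) and the parse_header helper are invented for illustration and would need to match your actual file.

```python
import struct

# Hypothetical header: int32, 4-byte logical, float64.
# The '<' prefix means little-endian with no alignment padding.
HEADER_FORMAT = '<iid'
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

def parse_header(raw):
    """Unpack the assumed header fields from the first HEADER_SIZE bytes."""
    count, flag, scale = struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE])
    return count, bool(flag), scale

# Build some sample bytes in place of f.read(HEADER_SIZE) on a real file.
raw = struct.pack(HEADER_FORMAT, 42, 1, 0.5)
print(parse_header(raw))
```

With a real file you would pass f.read(HEADER_SIZE) to the helper, which also leaves the file position just past the header.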

After that, when you pass the file object to fromfile, the file's internal position (i.e. the position controlled by f.seek) will be at the end of the header and the start of the data. If all of the rest of the file is a homogeneously typed array, a single call to np.fromfile(f, dtype) is all you need.

As a quick example, you might have something like the following:

import numpy as np

# Let's say we have a file with a 512 byte header, the 
# first 16 bytes of which are the width and height 
# stored as big-endian 64-bit integers.  The rest of the
# "main" data array is stored as little-endian 32-bit floats

with open('data.dat', 'rb') as f:
    width, height = np.fromfile(f, dtype='>i8', count=2)
    # Seek to the end of the header and ignore the rest of it
    f.seek(512)
    data = np.fromfile(f, dtype=np.float32)

# Presumably we'd want to reshape the data into a 2D array:
data = data.reshape((height, width))
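If you want to try this without a real file, one option is to generate a matching test file first. The sketch below assumes the same made-up layout (512-byte header, big-endian 64-bit width and height, little-endian 32-bit floats) and writes it to a temporary path.

```python
import os
import tempfile

import numpy as np

width, height = 4, 3
path = os.path.join(tempfile.mkdtemp(), 'data.dat')

# Write a file in the assumed layout.
with open(path, 'wb') as f:
    f.write(np.array([width, height], dtype='>i8').tobytes())   # 16 bytes
    f.write(b'\x00' * (512 - 16))                                # pad header to 512 bytes
    f.write(np.arange(width * height, dtype='<f4').tobytes())    # main data array

# Read it back the same way as above.
with open(path, 'rb') as f:
    w, h = np.fromfile(f, dtype='>i8', count=2)
    f.seek(512)
    data = np.fromfile(f, dtype='<f4').reshape((h, w))

print(data.shape)
```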
