Extract required bytes from a file in Python

Question

I have a binary file here: ftp://n5eil01u.ecs.nsidc.org/SAN/GLAS/GLA06.034/2003.02.21/GLA06_634_1102_001_0079_3_01_0001.DAT

I have to extract the following data from that file:

Byte Offset: 176 
Data type: 4-byte (long) integer
Total bytes: 160

I tried the following:

import numpy as np

fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'

with open(fname, 'rb') as fi:
    fi.seek(176, 0)
    data = np.fromfile(fi, dtype='long', count=160)
    print(data)

No success, what's wrong with my idea?

Recommended answer

Using a hard-coded offset is a rather fragile solution. But assuming you know what you are doing:

Byte Offset: 176 
Data type: 4-byte (long) integer
Total bytes: 160

AFAICT, 160 total bytes at 4 bytes per value means 160/4 = 40 values to read (could you confirm that?)
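The arithmetic above can be checked against the element size NumPy reports for the type. This is only a sketch: the variable names are made up for illustration, and it assumes the field really is a run of 4-byte signed integers, as the format description suggests.

```python
import numpy as np

total_bytes = 160              # "Total bytes" from the field description
dt = np.dtype(np.int32)        # assumed element type: 4-byte signed integer

# Number of elements is the byte length divided by the size of one element.
count, remainder = divmod(total_bytes, dt.itemsize)
if remainder != 0:
    raise ValueError("total_bytes is not a multiple of the element size")

print(dt.itemsize)  # 4
print(count)        # 40
```

The value passed as `count` to `np.fromfile` is a number of elements, not a number of bytes, which is why 40 rather than 160 is the value to use here.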

In addition, the type should be one of the types defined by NumPy. Here np.int32 might be the right one:

data = np.fromfile(fi, dtype=np.int32, count=40)

On my computer, this produces the following result:

[1919251297  997485633 1634494218 1936678771 1634885475  825124212
  808333629  808464432  942813232 1818692155 1868526433 1918854003
 1600484449 1702125924  842871086  758329392  841822768 1728723760
 1601397100 1600353135 1702125938 1835627615 1026633317  809119792
  808466992 1668483643 1668509535 1952543327 1026633317  960048688
  960051513  909654073  926037812 1668483643 1668509535 1952543327
 1633967973  825124212  808464957  842018099]


If this is not what you expected, perhaps you have an endianness (byte order) issue.

NumPy has support for custom-defined types (http://students.mimuw.edu.pl/~pbechler/numpy_doc/reference/arrays.dtypes.html#arrays-dtypes-constructin) to solve that problem:

For example:

  • np.dtype('<i4') is a 4-byte signed integer, little-endian
  • np.dtype('>i4') is a 4-byte signed integer, big-endian
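To see what the byte-order prefix changes, the same four bytes can be decoded with both dtypes. The bytes below are a made-up example, not data taken from the GLAS file.

```python
import numpy as np

# Hypothetical 4-byte sequence, chosen so the two interpretations differ clearly.
raw = bytes([0x01, 0x00, 0x00, 0x00])

little = np.frombuffer(raw, dtype='<i4')  # least significant byte first
big = np.frombuffer(raw, dtype='>i4')     # most significant byte first

print(little[0])  # 1
print(big[0])     # 16777216
```

If values read from a file look implausibly large or otherwise wrong, trying the opposite byte order is a quick check.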

In your case, to force the data to be read as little-endian, you might write:

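import numpy as np

fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'
dt = np.dtype('<i4')  # 4-byte signed integer, little-endian

with open(fname, 'rb') as fi:
    fi.seek(176, 0)  # move to byte offset 176 from the start of the file
    data = np.fromfile(fi, dtype=dt, count=40)  # 160 bytes / 4 bytes per value
    print(data)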
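For comparison, the same seek-and-read pattern can be sketched with only the standard library `struct` module instead of NumPy. The helper name `read_int32_block` and the synthetic test file are assumptions made for illustration. The example builds its own file, so it does not depend on the real GLAS data, whose actual byte order is not confirmed here.

```python
import os
import struct
import tempfile

OFFSET = 176   # byte offset from the field description
COUNT = 40     # 160 bytes / 4 bytes per value


def read_int32_block(path, offset=OFFSET, count=COUNT, byte_order='<'):
    """Read `count` 4-byte signed integers starting at `offset` (hypothetical helper)."""
    fmt = f'{byte_order}{count}i'      # e.g. '<40i': 40 little-endian int32 values
    size = struct.calcsize(fmt)        # 160 bytes for 40 values
    with open(path, 'rb') as f:
        f.seek(offset)
        raw = f.read(size)
    if len(raw) != size:
        raise ValueError(f'expected {size} bytes, got {len(raw)}')
    return struct.unpack(fmt, raw)


# Build a synthetic file: 176 filler bytes followed by the integers 0..39.
expected = tuple(range(COUNT))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x00' * OFFSET)
    tmp.write(struct.pack(f'<{COUNT}i', *expected))
    path = tmp.name

values = read_int32_block(path)
os.remove(path)

print(values[:5])  # (0, 1, 2, 3, 4)
```

Passing `byte_order='>'` would decode the same bytes as big-endian. The explicit length check makes a short read, for example from a truncated file, fail with an error instead of returning fewer values without warning.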
