Extract zlib-compressed data from a binary file in Python

Question

My company uses a legacy file format for electromyography data, and that format is no longer in production. There is still some interest in maintaining backward compatibility, so I am studying the possibility of writing a reader for it.

By analyzing some very convoluted legacy source code written in Delphi, I found that the file reader/writer uses ZLIB. In a hex editor, the file seems to start with a header in binary ASCII, with readily readable fields such as "Player" and "Analyzer". The header is followed by a compressed string containing the raw data.
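
A quick way to see where such a readable header ends is to list the runs of printable ASCII bytes in the file, as the Unix `strings` tool does. Below is a minimal Python sketch. The helper `ascii_runs` is hypothetical, and the sample bytes are made up rather than taken from a real .mio file:

```python
import re

def ascii_runs(data: bytes, min_len: int = 4):
    """Yield (offset, text) for every run of at least min_len printable ASCII bytes,
    similar to what the Unix `strings` tool prints."""
    pattern = re.compile(rb'[\x20-\x7e]{%d,}' % min_len)
    for m in pattern.finditer(data):
        yield m.start(), m.group().decode('ascii')

# Made-up bytes standing in for the start of a .mio file (not real data):
sample = b'\x01\x00Player\x00\x02Analyzer\x00x\x9c\x03\x00'
for offset, text in ascii_runs(sample):
    print(offset, text)
```

On a real file you would pass the bytes returned by `open(path, 'rb').read()`. The offset just after the last readable field is a reasonable place to start looking for the compressed stream, though nothing guarantees the stream follows immediately.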

My question is: how should I proceed to identify:

  • whether it is a compressed stream;
  • where the compressed stream starts and where it ends.
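
If a candidate start offset is already known, both questions can be tested directly with Python's `zlib.decompressobj`:

- It raises `zlib.error` when no valid stream begins at that offset.
- It sets `eof` once it has consumed a complete stream.
- It keeps whatever follows the stream in `unused_data`.

Below is a minimal sketch. `stream_end` is a hypothetical helper, and the test bytes are synthetic:

```python
import zlib

def stream_end(data: bytes, start: int):
    """Return the offset just past a zlib stream beginning at `start`,
    or None if no complete zlib stream starts there."""
    d = zlib.decompressobj()
    try:
        d.decompress(data[start:])
    except zlib.error:
        return None  # invalid header or corrupt DEFLATE data
    if not d.eof:
        return None  # input ran out before the end-of-stream marker
    # unused_data holds the bytes that follow the stream's Adler-32 trailer
    return len(data) - len(d.unused_data)

# Synthetic file: 6 bytes of "header", a zlib stream, then unrelated bytes.
blob = b'HEADER' + zlib.compress(b'\x00' * 64) + b'TRAILING'
end = stream_end(blob, 6)
print(6, end, zlib.decompress(blob[6:end]) == b'\x00' * 64)
```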

From Wikipedia:

zlib compressed data is typically written with a gzip or a zlib wrapper. The wrapper encapsulates the raw DEFLATE data by adding a header and trailer. This provides stream identification and error detection.

Is this relevant?
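
For context, the zlib wrapper defined in RFC 1950 is what makes a stream recognizable:

- The stream begins with a 2-byte header. The low four bits of the first byte hold compression method 8 (DEFLATE).
- Read as a big-endian 16-bit number, the two header bytes form a multiple of 31.
- The stream ends with a big-endian Adler-32 checksum of the uncompressed data, which `zlib.decompress` verifies.

gzip members (RFC 1952) instead start with the bytes `1f 8b 08`.

Below is a sketch of a candidate-offset filter built on those header rules. The helper names are hypothetical, and every hit still has to be confirmed by actually decompressing from that offset:

```python
import zlib

def looks_like_zlib_header(b0: int, b1: int) -> bool:
    """Check two bytes (CMF, FLG) against the zlib header rules of RFC 1950."""
    cm = b0 & 0x0F      # compression method; 8 means DEFLATE
    cinfo = b0 >> 4     # log2(window size) - 8; must not exceed 7
    fdict = b1 & 0x20   # preset-dictionary flag; such streams need the dictionary to decode
    return cm == 8 and cinfo <= 7 and (b0 * 256 + b1) % 31 == 0 and not fdict

def looks_like_gzip_header(data: bytes, i: int) -> bool:
    """A gzip member (RFC 1952) starts with 1f 8b followed by method 08 (DEFLATE)."""
    return data[i:i + 3] == b'\x1f\x8b\x08'

def candidate_offsets(data: bytes):
    """All offsets where a zlib header could begin (false positives are expected)."""
    return [i for i in range(len(data) - 1)
            if looks_like_zlib_header(data[i], data[i + 1])]

# Synthetic example: an ASCII field followed by a zlib stream starting at offset 7.
blob = b'Player\x00' + zlib.compress(b'abc')
print(candidate_offsets(blob))
```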

I'll be glad to post more information, but I don't know what would be most relevant.

Thanks for any hints.

I have the working application and can use it to record real data of any duration, producing files smaller than 1 kB if necessary.

Some sample files:

A freshly created file, without a data stream: https://dl.dropbox.com/u/4849855/Mio_File/HeltonEmpty.mio

The same file after a very short (1 second?) data stream has been saved: https://dl.dropbox.com/u/4849855/Mio_File/HeltonFilled.mio

A different file, from a patient named "manco" instead of "Helton", with an even shorter stream (ideal for hex viewing): https://dl.dropbox.com/u/4849855/Mio_File/manco_short.mio
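
Since an empty file and a filled file from the same patient are available, comparing them byte by byte is one way to narrow down where the data stream lives. The sketch below assumes local copies of HeltonEmpty.mio and HeltonFilled.mio, and its helper names are hypothetical. If the header is also updated when data is saved, the changed region may cover more than the stream itself:

```python
import os

def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes shared by a and b."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n

def changed_region(a: bytes, b: bytes):
    """Return (start, end_a, end_b) such that a[start:end_a] became b[start:end_b]."""
    start = common_prefix_len(a, b)
    # the shared suffix may not overlap the shared prefix
    max_suffix = min(len(a), len(b)) - start
    suffix = min(common_prefix_len(a[::-1], b[::-1]), max_suffix)
    return start, len(a) - suffix, len(b) - suffix

# Assumes local copies of the sample files in the current directory.
empty, filled = 'HeltonEmpty.mio', 'HeltonFilled.mio'
if os.path.exists(empty) and os.path.exists(filled):
    with open(empty, 'rb') as fa, open(filled, 'rb') as fb:
        a, b = fa.read(), fb.read()
    start, end_a, end_b = changed_region(a, b)
    print('empty file differs in bytes', start, 'to', end_a)
    print('filled file differs in bytes', start, 'to', end_b)
```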

Note: each file belongs to a single patient. A file stores one or more exams, and each exam consists of one or more time series. The provided files contain only one exam with one data series.

Recommended answer

To start, why not scan the files for all valid zlib streams? That is good enough for small files and for figuring out the format:

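import zlib
from glob import glob

def zipstreams(filename):
    """Yield (offset, decompressed data) for every complete zlib stream in a file."""
    with open(filename, 'rb') as fh:
        data = fh.read()
    i = 0
    while i < len(data):
        zo = zlib.decompressobj()
        try:
            out = zo.decompress(data[i:])
        except zlib.error:
            i += 1  # no valid zlib stream starts at this offset
            continue
        if not zo.eof:
            i += 1  # input ended before the end of the stream: not a complete stream
            continue
        yield i, out
        # everything that is not unused_data belonged to the stream, so jump past it
        i = len(data) - len(zo.unused_data)

for filename in glob('*.mio'):
    print(filename)
    for i, data in zipstreams(filename):
        print(i, len(data))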
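
The scanner above only accepts zlib-wrapped streams. As the Wikipedia quote notes, DEFLATE data can also come with a gzip wrapper. Passing `wbits=32 + zlib.MAX_WBITS` to `zlib.decompressobj` makes zlib auto-detect either header. Below is a variant sketch; the name `wrapped_streams` and the test data are made up. Raw DEFLATE with no wrapper (`wbits=-15`) is a poor fit for this kind of scan. Without a header there is no cheap way to reject most offsets, so such a scan is likely to produce many false positives.

```python
import gzip
import zlib

def wrapped_streams(data: bytes):
    """Yield (offset, decompressed data) for complete zlib- or gzip-wrapped streams."""
    i = 0
    while i < len(data):
        # 32 + MAX_WBITS: accept either a zlib or a gzip header (auto-detect)
        d = zlib.decompressobj(32 + zlib.MAX_WBITS)
        try:
            out = d.decompress(data[i:])
        except zlib.error:
            i += 1  # neither header is valid here
            continue
        if not d.eof:
            i += 1  # incomplete stream; keep scanning
            continue
        yield i, out
        i = len(data) - len(d.unused_data)  # skip past the whole stream

# Synthetic data: a zlib stream and a gzip member separated by filler bytes.
blob = b'hdr' + zlib.compress(b'zlib data') + b'mid' + gzip.compress(b'gzip data')
for offset, out in wrapped_streams(blob):
    print(offset, out)
```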

It looks like the data streams contain little-endian double-precision floating-point data:

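import numpy
from matplotlib import pyplot

for filename in glob('*.mio'):
    for i, data in zipstreams(filename):
        # skip empty streams and any whose length is not a whole number of doubles
        if data and len(data) % 8 == 0:
            a = numpy.frombuffer(data, '<f8')  # little-endian float64 (fromstring is deprecated)
            pyplot.plot(a[1:])  # the first value is left out of the plot
            pyplot.title(filename + ' - %i' % i)
            pyplot.show()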
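
The answer reads each stream as little-endian float64 (`'<f8'`). One hypothetical way to sanity-check that guess on a decompressed stream is to decode it with both byte orders and count how many values look like plausible samples. The cutoffs below are arbitrary assumptions, not properties of the .mio format:

```python
import math
import struct

def plausible_fraction(data: bytes, fmt: str, lo: float = 1e-12, hi: float = 1e6) -> float:
    """Fraction of 8-byte values, decoded with struct format `fmt` ('<d' or '>d'),
    that are exactly zero or finite with lo <= |v| <= hi (arbitrary cutoffs)."""
    n = len(data) // 8
    if n == 0:
        return 0.0
    values = [v for (v,) in struct.iter_unpack(fmt, data[:n * 8])]
    good = sum(1 for v in values if v == 0.0 or (math.isfinite(v) and lo <= abs(v) <= hi))
    return good / n

# Synthetic little-endian samples standing in for a decompressed stream:
samples = struct.pack('<5d', 0.0, 0.25, -1.5, 3.0, 12.75)
print('little-endian:', plausible_fraction(samples, '<d'))
print('big-endian:   ', plausible_fraction(samples, '>d'))
```

On real data, `data` would be one of the byte strings yielded by `zipstreams`. The byte order with the clearly higher fraction is the more likely one. In this synthetic case, the wrong byte order turns most values into tiny subnormal numbers.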
