Decompress and read Dukascopy .bi5 tick files

Views: 159 | Published: 2020/5/24 0:13:53 | Tags: python csv pandas binary lzma

Question

To cut a long story short, I need to open a .bi5 file and read its contents. The problem: I have tens of thousands of .bi5 files containing time-series data that I need to decompress and process (read, then load into pandas).

I ended up installing Python 3 (I normally use 2.7) specifically for the lzma library, because compiling the lzma backports for Python 2.7 was a nightmare. So I conceded and went with Python 3, but still had no success. The problems are too numerous to list here, and no one reads long questions!

I have included one of the .bi5 files. If someone could manage to get it into a pandas DataFrame and show me how they did it, that would be ideal.

P.S. The file is only a few KB, so it will download in a second. Thanks very much in advance.

(File) http://www.filedropper.com/13hticks

Recommended answer

The code below should do the trick. First, it opens the file and decompresses it with lzma, then it uses struct to unpack the binary data.

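import lzma
import struct
import pandas as pd


def bi5_to_df(filename, fmt):
    """Decompress an LZMA-compressed .bi5 file and unpack its fixed-size records."""
    chunk_size = struct.calcsize(fmt)  # bytes per record for the given format
    data = []
    with lzma.open(filename) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of the decompressed stream
                break
            data.append(struct.unpack(fmt, chunk))
    # One row per record, one column per field in fmt
    return pd.DataFrame(data)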

The most important thing is to know the right format. I googled around, tried a few guesses, and found that '>3i2f' (or '>3I2f') works quite well: big-endian, three ints followed by two floats. The format you suggested, 'i4f', doesn't produce sensible floats, regardless of whether it is read as big or little endian. For struct and its format syntax, see the docs.
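To make the format string concrete, here is a minimal sketch that packs and unpacks one synthetic record. The sample values are made up for illustration and are not read from a real file. It also hints at why a wrong guess such as 'i4f' fails silently: it has the same byte length, so unpacking succeeds but the numbers are meaningless.

```python
import struct

FMT = ">3I2f"  # big-endian: three unsigned 32-bit ints, then two 32-bit floats

# Each record is 3*4 + 2*4 = 20 bytes.
record_size = struct.calcsize(FMT)

# Build one synthetic record (illustrative values only) and decode it again.
sample = struct.pack(FMT, 3581, 131966, 131945, 1.5, 1.5)
decoded = struct.unpack(FMT, sample)

# '>i4f' is also 20 bytes long, so it unpacks without raising an error,
# but it reinterprets the second and third integers as floats.
wrong = struct.unpack(">i4f", sample)

print(record_size)
print(decoded)
print(wrong)
```

Because both formats consume the same number of bytes, struct cannot flag the mistake, so checking the decoded values against known prices is the only real test.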

df = bi5_to_df('13h_ticks.bi5', '>3i2f')
df.head()
Out[177]: 
      0       1       2     3     4
0   210  110218  110216  1.87  1.12
1   362  110219  110216  1.00  5.85
2   875  110220  110217  1.00  1.12
3  1408  110220  110218  1.50  1.00
4  1884  110221  110219  3.94  1.00

Update

To compare the output of bi5_to_df with https://github.com/ninety47/dukascopy, I compiled and ran test_read_bi5 from there. The first lines of the output are:

time, bid, bid_vol, ask, ask_vol
2012-Dec-03 01:00:03.581000, 131.945, 1.5, 131.966, 1.5
2012-Dec-03 01:00:05.142000, 131.943, 1.5, 131.964, 1.5
2012-Dec-03 01:00:05.202000, 131.943, 1.5, 131.964, 2.25
2012-Dec-03 01:00:05.321000, 131.944, 1.5, 131.964, 1.5
2012-Dec-03 01:00:05.441000, 131.944, 1.5, 131.964, 1.5

And bi5_to_df on the same input file gives:

bi5_to_df('01h_ticks.bi5', '>3I2f').head()
Out[295]: 
      0       1       2     3    4
0  3581  131966  131945  1.50  1.5
1  5142  131964  131943  1.50  1.5
2  5202  131964  131943  2.25  1.5
3  5321  131964  131944  1.50  1.5
4  5441  131964  131944  1.50  1.5

So everything seems to be fine (ninety47's code reorders the columns, turns the millisecond offset into a timestamp, and shows the integer prices divided by 1000).
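Going by the comparison above, the five raw columns appear to be: milliseconds since the start of the hour, ask, bid, ask volume, bid volume. The sketch below labels them on that basis. The function name label_ticks, the hour_start argument, and the divisor of 1000 are assumptions for illustration: the divisor is inferred from this one file and may differ for other instruments, and the hour has to come from wherever the file was stored.

```python
from datetime import datetime, timedelta


def label_ticks(rows, hour_start, point=1000):
    """Turn raw (ms, ask, bid, ask_vol, bid_vol) tuples into labeled dicts.

    hour_start and point are assumptions supplied by the caller; the file
    itself carries neither the date nor the price scale.
    """
    records = []
    for ms, ask, bid, ask_vol, bid_vol in rows:
        records.append({
            "time": hour_start + timedelta(milliseconds=ms),
            "bid": bid / point,
            "bid_vol": bid_vol,
            "ask": ask / point,
            "ask_vol": ask_vol,
        })
    return records


# First and third rows of the '>3I2f' output shown above.
rows = [(3581, 131966, 131945, 1.5, 1.5), (5202, 131964, 131943, 2.25, 1.5)]
ticks = label_ticks(rows, datetime(2012, 12, 3, 1, 0, 0))
print(ticks[0])
```

Passing the resulting list to pd.DataFrame(ticks) should give a frame with the same column names that test_read_bi5 prints.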

Also, it's probably more accurate to use '>3I2f' instead of '>3i2f' (i.e. unsigned int instead of int).
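Since the question mentions tens of thousands of files, a whole-buffer variant may be worth trying. This is only a sketch, not a measured improvement: decode_bi5_bytes and read_bi5 are hypothetical helper names. It decompresses each file in one call and lets struct.iter_unpack walk the records instead of reading 20 bytes at a time.

```python
import lzma
import struct


def decode_bi5_bytes(raw, fmt=">3I2f"):
    """Decompress the raw bytes of one .bi5 file and return a list of tuples."""
    payload = lzma.decompress(raw)  # container format is auto-detected
    # iter_unpack steps through the buffer in calcsize(fmt)-byte records and
    # raises struct.error if the length is not a multiple of the record size.
    return list(struct.iter_unpack(fmt, payload))


def read_bi5(path, fmt=">3I2f"):
    """Read one file from disk and decode it."""
    with open(path, "rb") as f:
        return decode_bi5_bytes(f.read(), fmt)
```

The list of tuples can be handed to pd.DataFrame in the same way as the data list inside bi5_to_df.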
