Commodity Futures Hierarchical Data Structure

Views: 85 Published: 2020/11/23 5:08:24 Tags: python pandas hierarchical-data trading algorithmic-trading


Question

For the life of me, I can't seem to get the structure I want and have it function properly, so in a fit of rage I come to you guys.

Setup: I have a directory called Futures_Contracts containing about 30 folders, each named after the underlying asset. Inside each folder are the 6 nearest expiration contracts in csv format. Every csv has the same format and contains Date, O, H, L, C, V, OI, Expiration Month.

Note: O, H, L, C, V, OI stand for open, high, low, close, volume, and open interest (for those not familiar). Also assume that close is synonymous with settlement below.

Task: From here, the goal is to load the futures data into a multi-index pandas DataFrame such that the top-level index is the underlying commodity symbol, the mid-level index is the expiration Month-Year, and the lowest level is the OHLC data. The end goal is to have something I can use to start hacking at the zipline module to get it running on futures. So visually: [image not reproduced]

My attempt:

import os

import matplotlib.pyplot as plt
import pandas as pd

plt.rcParams['figure.figsize'] = (16, 8)  # default figure size for the plots below

deliveries = {}
commodities = {}
columns = ['open', 'high', 'low', 'settle', 'volume', 'interest', 'delivery']  # Contract fields
path = os.path.join(os.getcwd(), 'Futures_Contracts')  # Futures path
for sym in os.listdir(path):
    if sym[0] != '.':  # Weed out hidden files
        deliveries[sym] = []
        i = 0
        for contract in os.listdir(os.path.join(path, sym)):
            # Pull in the csv, using its first column (the date) as the index
            temp = pd.read_csv(os.path.join(path, sym, contract),
                               index_col=0, parse_dates=True, names=columns)
            # Add contract to dict in the form MonthCode-YY
            deliveries[sym].append(str(contract[:-4][-1] + contract[:-4][:-1][-2:]))
            commodities[sym] = deliveries[sym]
            commodities[sym][i] = temp  # replaces the label just appended with the DataFrame
            i += 1

This somewhat works, but it is really a nested dict that holds a DataFrame at the end. Slicing is therefore extremely clunky:

commodities['SB2'][0]['settle'].plot()
commodities['SB2'][3]['settle'].plot()
commodities['SB2'][4]['settle'].plot()
commodities['SB2'][3]['settle'].plot()
commodities['SB2'][4]['settle'].plot()
commodities['SB2'][5]['settle'].plot()

and produces: [plot not reproduced]

Optimally, I would be able to slice across each of the indexes so that I can compare data across assets, expirations, dates, and values. I would also like to label what I am looking at: as you can see in the matplotlib chart, everything is simply named 'settle'.

There is surely a way to do this, but I am just not smart enough to figure it out.

Recommended Answer

I think you're going to be much better off getting this into one DataFrame, so consider using a MultiIndex. Here's a toy example, which I think will translate well to your code:

In [11]: dfN13 = pd.DataFrame([[1, 2]], columns=[['N13', 'N13'], ['a', 'b']])

In [12]: dfM13 = pd.DataFrame([[3, 4]], columns=[['M13', 'M13'], ['a', 'b']])

These are like the DataFrames in your example, except that the first level of the columns is just the asset name.

In [13]: df = pd.concat([dfN13, dfM13], axis=1)

In [14]: df
Out[14]:
   N13     M13
     a  b    a  b
0    1  2    3  4
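
As a possible shortcut, the outer column level does not have to be built by hand on each frame. The sketch below assumes plain single-level frames (the variable names `n13` and `m13` are illustrative, not from the original post) and lets the `keys` argument of `pd.concat` add the asset level:

```python
import pandas as pd

# Plain single-level frames, one per contract (toy values from the example above).
n13 = pd.DataFrame([[1, 2]], columns=['a', 'b'])
m13 = pd.DataFrame([[3, 4]], columns=['a', 'b'])

# keys= adds an outer column level, so no hand-built column MultiIndex is needed.
df = pd.concat([n13, m13], axis=1, keys=['N13', 'M13'])

print(df.columns.tolist())
```

The resulting columns are the same (asset, chart) pairs as in the toy example, so the rest of the answer applies unchanged.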

For convenience, we can label the column levels and the index.

In [15]: df.columns.names = ['asset', 'chart']

In [16]: df.index.names = ['date']  # well, not in this toy example

In [17]: df
Out[17]:
asset  N13     M13
chart    a  b    a  b
date
0        1  2    3  4

Note: This looks a lot like your spreadsheet.

And we can grab out a specific chart (e.g. ohlc) using xs:

In [18]: df.xs('a', level='chart', axis=1)
Out[18]:
asset  N13  M13
date
0        1    3

In [19]: df.xs('a', level='chart', axis=1).plot()  # win
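
To connect the toy example back to the directory layout in the question, here is a rough sketch of one way to load everything into a single frame with three column levels (asset, contract, field). Several things are assumptions rather than facts from the question: the file naming (`SB13N.csv`, symbol followed by a two-digit year and a month code), the absence of a header row in the csv files, and the helper names `contract_label` and `load_futures`, which are hypothetical. The sketch writes a tiny fake directory first so that it runs without real data.

```python
import os
import tempfile

import pandas as pd

FIELDS = ['open', 'high', 'low', 'settle', 'volume', 'interest', 'delivery']


def contract_label(filename):
    """'SB13N.csv' -> 'N13' (month code + two-digit year); assumed file naming."""
    stem = os.path.splitext(filename)[0]
    return stem[-1] + stem[-3:-1]


def load_futures(root):
    """Load root/<symbol>/<contract>.csv into one frame with 3 column levels."""
    frames = {}
    for sym in sorted(os.listdir(root)):
        sym_dir = os.path.join(root, sym)
        if sym.startswith('.') or not os.path.isdir(sym_dir):
            continue  # skip hidden files and anything that is not a folder
        for fname in sorted(os.listdir(sym_dir)):
            if not fname.endswith('.csv'):
                continue
            # Assumes no header row; the first csv column is the date.
            frames[(sym, contract_label(fname))] = pd.read_csv(
                os.path.join(sym_dir, fname),
                header=None, names=['date'] + FIELDS,
                index_col='date', parse_dates=True,
            )
    keys = list(frames)
    # Tuple keys become the two outer column levels; the fields stay innermost.
    df = pd.concat([frames[k] for k in keys], axis=1, keys=keys)
    df.columns.names = ['asset', 'contract', 'field']
    return df


# Build a tiny fake directory (made-up prices) so the sketch is runnable.
root = tempfile.mkdtemp()
for sym, fname, base in [('SB', 'SB13N.csv', 17.0), ('SB', 'SB13V.csv', 17.5),
                         ('CL', 'CL13N.csv', 95.0)]:
    os.makedirs(os.path.join(root, sym), exist_ok=True)
    with open(os.path.join(root, sym, fname), 'w') as fh:
        for day in (1, 2, 3):
            p = base + day
            fh.write(f'2013-05-0{day},{p},{p + 1},{p - 1},{p + 0.5},100,1000,{fname[-5]}\n')

futures = load_futures(root)

# One asset, one field, every contract: columns are labelled by contract code.
sb_settle = futures['SB'].xs('settle', level='field', axis=1)
print(sb_settle)
```

Because the columns of `sb_settle` carry the contract codes, calling `sb_settle.plot()` should label each line by contract instead of naming every series 'settle'. With real data the contracts will cover different date ranges, so the combined frame would contain NaN wherever a contract has no row for a given date.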
