Parsing a Multi-Index Excel File in Pandas


Question

I have a time series Excel file with a three-level column MultiIndex that I would like to parse. There are some results on Stack Overflow on how to do this for a row index, but not for the columns, and the header argument of the parse function does not seem to accept a list of rows.

The Excel file looks like the following:

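  • Column A holds the time series dates, starting at A4
  • Column B has top_level1 (B1), mid_level1 (B2), low_level1 (B3), data (B4-B100+)
  • Column C has null (C1), null (C2), low_level2 (C3), data (C4-C100+)
  • Column D has null (D1), mid_level2 (D2), low_level1 (D3), data (D4-D100+)
  • Column E has null (E1), null (E2), low_level2 (E3), data (E4-E100+)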
  • ...

So there are two low_level values, many mid_level values, and a few top_level values. The catch is that most top- and mid-level cells are null and are assumed to take the value to their left. For instance, all of the columns above would have top_level1 as their top-level MultiIndex value.
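The "take the value to the left" rule is a forward fill along each header row. Below is a minimal sketch, assuming a hand-built frame (the `header` variable and its `B`-`E` column labels are hypothetical) that stands in for rows 1-3 of columns B-E in the sheet:

```python
import numpy as np
import pandas as pd

# Hypothetical header block mimicking rows 1-3 of columns B-E.
header = pd.DataFrame(
    [
        ["top_level1", np.nan, np.nan, np.nan],
        ["mid_level1", np.nan, "mid_level2", np.nan],
        ["low_level1", "low_level2", "low_level1", "low_level2"],
    ],
    columns=list("BCDE"),
)

# Each blank cell takes the nearest non-blank value to its left.
filled = header.ffill(axis=1)
print(filled)
```

After the fill, every column carries top_level1 in the first row, and columns C and E inherit mid_level1 and mid_level2 respectively.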

My best idea so far is to use transpose, but that fills in Unnamed: # everywhere and doesn't seem to work. In pandas 0.13, read_csv seems to have a header parameter that can take a list, but this doesn't seem to work with parse.

Recommended answer

You can forward-fill the null header values. I don't have your file, but you can test the following:

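import pandas as pd

# Read every row as data for now; the three header rows arrive as ordinary rows
# (xls_file is the path to your workbook)
df = pd.read_excel(xls_file, sheet_name=0, header=None, index_col=0)

# Forward-fill blank header cells from the left
# (header rows only, so genuine gaps in the data stay NaN)
header = df.iloc[:3].ffill(axis=1)

# Build the MultiIndex column names from the three filled header rows
df.columns = pd.MultiIndex.from_arrays(header.values, names=['top', 'mid', 'low'])

# Name the index
df.index.name = 'Date'

# Drop the three header rows, whose index cells (A1-A3) are blank
df = df[pd.notnull(df.index)]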
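Since the original file is not available, the same steps can be tried on an in-memory frame. This is a sketch under assumptions: `raw` is a hypothetical stand-in for what `pd.read_excel(..., header=None, index_col=0)` would return for the layout in the question, and the dates and numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw read: three header rows with a blank
# index, followed by two dated data rows (one with a genuine gap).
raw = pd.DataFrame(
    [
        ["top_level1", np.nan, np.nan, np.nan],
        ["mid_level1", np.nan, "mid_level2", np.nan],
        ["low_level1", "low_level2", "low_level1", "low_level2"],
        [1.0, 2.0, 3.0, 4.0],
        [5.0, np.nan, 7.0, 8.0],
    ],
    index=[np.nan, np.nan, np.nan,
           pd.Timestamp("2014-01-01"), pd.Timestamp("2014-01-02")],
)

# Forward-fill only the header rows so the data gap is left alone.
header = raw.iloc[:3].ffill(axis=1)

# Keep the data rows and attach the three-level column index.
data = raw.iloc[3:].copy()
data.columns = pd.MultiIndex.from_arrays(
    header.to_numpy().tolist(), names=["top", "mid", "low"]
)
data.index = pd.DatetimeIndex(data.index, name="Date")
data = data.astype(float)

# Select every low_level1 column across the other two levels.
low1 = data.xs("low_level1", axis=1, level="low")
print(low1)
```

Filling only the header rows is a deliberate choice here: forward-filling the whole frame along axis 1 would also copy a neighbouring column's value into any missing data cell.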
