在 Pandas 中解析多索引 Excel 文件 [英] Parsing a Multi-Index Excel File in Pandas

查看:73
本文介绍了在 Pandas 中解析多索引 Excel 文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含三级列 MultiIndex 的时间序列 Excel 文件,如果可能的话,我想成功解析它.有一些关于如何为堆栈溢出索引而不是列执行此操作的结果,并且 parse 函数有一个 header 似乎没有获取行列表.

I have a time series excel file with a tri-level column MultiIndex that I would like to successfully parse if possible. There are some results on how to do this for an index on stack overflow but not the columns and the parse function has a header that does not seem to take a list of rows.

ExcelFile 如下所示:

The ExcelFile looks like is like the following:

  • A 列是从 A4 开始的所有时间序列日期
  • B 列有 top_level1 (B1) mid_level1 (B2) low_level1 (B3) 数据 (B4-B100+)
  • C 列有 null (C1) null (C2) low_level2 (C3) 数据 (C4-C100+)
  • D 列有空 (D1) mid_level2 (D2) low_level1 (D3) 数据 (D4-D100+)
  • E 列有 null (E1) null (E2) low_level2 (E3) 数据 (E4-E100+)
  • ...

所以有两个 low_level 值,许多 mid_level 值和几个 top_level 值,但诀窍是顶层和中级值是 null并假定为左边的值.因此,例如上面的所有列都将 top_level1 作为顶部多索引值.

So there are two low_level values many mid_level values and a few top_level values but the trick is the top and mid level values are null and are assumed to be the values to the left. So, for instance all the columns above would have top_level1 as the top multi-index value.

到目前为止,我最好的想法是使用 transpose,但它会在任何地方填充 Unnamed: # 并且似乎不起作用.在 Pandas 0.13 中,read_csv 似乎有一个可以获取列表的 header 参数,但这似乎不适用于 parse.

My best idea so far is to use transpose, but the it fills Unnamed: # everywhere and doesn't seem to work. In Pandas 0.13 read_csv seems to have a header parameter that can take a list, but this doesn't seem to work with parse.

推荐答案

你可以fillna空值.我没有你的文件,但你可以测试

You can fillna the null values. I don't have your file, but you can test

#Headers as rows for now
df = pd.read_excel(xls_file,0, header=None, index_col=0) 

#fill in Null values in "Headers"
df = df.fillna(method='ffill', axis=1) 

#create multiindex column names
df.columns=pd.MultiIndex.from_arrays(df[:3].values, names=['top','mid','low']) 

#Just name of index
df.index.name='Date' 

#remove 3 rows which are already used as column names
df = df[pd.notnull(df.index)] 

这篇关于在 Pandas 中解析多索引 Excel 文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆