Parsing a Multi-Index Excel File in Pandas
Question
I have a time series Excel file with a three-level column MultiIndex that I would like to parse, if possible. There are some results on Stack Overflow on how to do this for a row index, but not for the columns, and the header parameter of the parse function does not seem to take a list of rows.
The Excel file looks like the following:
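- Column A holds all the time series dates, starting at A4
- Column B has top_level1 (B1), mid_level1 (B2), low_level1 (B3), data (B4-B100+)
- Column C has null (C1), null (C2), low_level2 (C3), data (C4-C100+)
- Column D has null (D1), mid_level2 (D2), low_level1 (D3), data (D4-D100+)
- Column E has null (E1), null (E2), low_level2 (E3), data (E4-E100+)
- ...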
So there are two low_level values, many mid_level values and a few top_level values. The trick is that most top-level and mid-level cells are null and are assumed to take the value to their left. For instance, all the columns above would have top_level1 as the top MultiIndex value.
My best idea so far is to use transpose, but it fills Unnamed: # everywhere and doesn't seem to work. In pandas 0.13, read_csv seems to have a header parameter that can take a list, but this doesn't seem to work with parse.
Recommended answer
You can fillna the null values. I don't have your file, but you can test the following:
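import pandas as pd

# xls_file is the path to your Excel workbook
# Read the three header rows as ordinary data rows for now
df = pd.read_excel(xls_file, sheet_name=0, header=None, index_col=0)
# Forward-fill blank header cells from the left (header rows only, so gaps in the data are untouched)
header = df.iloc[:3].ffill(axis=1)
# Build the MultiIndex column names from the filled header rows
df.columns = pd.MultiIndex.from_arrays(header.values, names=['top', 'mid', 'low'])
# Just name the index
df.index.name = 'Date'
# Remove the 3 header rows, whose index cells (A1-A3) are empty
df = df[pd.notnull(df.index)]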
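The same idea can be tried without an Excel file. The following is a minimal sketch, assuming the raw sheet comes back from read_excel with header=None and index_col=0 in roughly the shape below. The frame raw, its values and its dates are made up for illustration. This version forward-fills only the three header rows, so missing values in the data itself are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw sheet: three header rows with blank
# cells, followed by two dated data rows (all values are made up).
raw = pd.DataFrame(
    [
        ["top_level1", np.nan, np.nan, np.nan],
        ["mid_level1", np.nan, "mid_level2", np.nan],
        ["low_level1", "low_level2", "low_level1", "low_level2"],
        [1.0, 2.0, 3.0, 4.0],
        [5.0, np.nan, 7.0, 8.0],
    ],
    index=[np.nan, np.nan, np.nan,
           pd.Timestamp("2014-01-01"), pd.Timestamp("2014-01-02")],
)

# Fill blank header cells with the value to their left (header rows only).
header = raw.iloc[:3].ffill(axis=1)

# Keep only the rows that have a date, then attach the MultiIndex columns.
clean = raw[pd.notnull(raw.index)].copy()
clean.columns = pd.MultiIndex.from_arrays(header.values, names=["top", "mid", "low"])
clean.index.name = "Date"
clean = clean.astype(float)  # the data rows were read as object dtype

# Select one column by its full (top, mid, low) key.
series = clean[("top_level1", "mid_level2", "low_level1")]
print(clean)
```

Once the columns are a MultiIndex, a partial key such as clean["top_level1", "mid_level2"] selects every low-level column under that pair.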