Pandas, Excel-Import and MultiIndex
Question
I am new to pandas and am currently trying to analyse Excel data laid out in the following schema:
My goal is a visualisation with the index labels XYZ, CDE, EFG, HU on the x-axis and the corresponding Perc values of Yes, ProbYes, X, ProbNo, No stacked on the y-axis.
Currently I'm parsing the Excel data into a pandas DataFrame with this code:
import pandas as pd
path = 'x1.xlsx'
x = pd.ExcelFile(path)
sheets = x.sheet_names
table = x.parse(sheets[0], header=2) # take line 2 as column-names
The resulting MultiIndex of table seems fine:
>>> table.index
MultiIndex(levels=[[u'Individual', u'Summary'], [u'ABC', u'CDE', u'EFG', u'HIJ'], [u'Abs', u'Perc']], labels=[[0, -1, -1, -1, -1, -1, -1, -1, 1, -1], [0, -1, 1, -1, 2, -1, 3, -1, -1, -1], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]])
However, it is neither possible to access a specific row:
>>> table.ix[('Individual', 'CDE')]
KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)'
... nor is it possible to access or filter/remove a column containing the row indices:
>>> table.index.names
FrozenList([None, None, None])
i.e. the index levels have no names, so I guess I cannot access them?
I tried to simplify the data structure with an alternative parse:
>>> table2 = x.parse(sheets[0], header=2, skiprows=2, parse_cols='B,:I')
That, however, didn't really help.
Edit: sorting does help:
>>> table.sort(inplace=True)
>>> table[:4]
This yields a table with which I cannot do my analysis...
Recommended answer
I think that after running
table.sort(inplace=True)
you may be able to access
table.ix[('Individual', 'CDE')]
I see why this happens: your Excel file has merged cells such as A4:A11. When you load it into a pandas DataFrame, the index label Individual appears only in A4, while the index labels in A5:A11 are all NaN.
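The effect can be reproduced without the Excel file. The following is only a minimal sketch: the label lists are hypothetical stand-ins for the merged cells, and it uses the `codes` attribute of current pandas releases (called `labels` in the older version whose output is shown in the question). Missing labels get the code -1, which matches the pattern of -1 entries in the question's MultiIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical labels mimicking merged cells: only the first row of each
# merged range carries the value, the remaining rows are NaN.
outer = ["Individual", np.nan, np.nan, np.nan]
middle = ["ABC", np.nan, "CDE", np.nan]
inner = ["Abs", "Perc", "Abs", "Perc"]

index = pd.MultiIndex.from_arrays([outer, middle, inner])

# A missing label is stored as the code -1 in the corresponding level.
outer_codes = list(index.codes[0])
middle_codes = list(index.codes[1])
print(outer_codes)
print(middle_codes)
```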
One workaround I can think of is:
table = table.reset_index().fillna(method='ffill').set_index(['level_0', 'level_1', 'level_2'])
# reset_index() automatically names the former index columns level_0, level_1, level_2
Then you are good to go:
table.ix[('Individual','CDE')]
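Note that `.ix` and `DataFrame.sort` are no longer available in current pandas releases, so the same idea may need to be written differently there. The following is a rough sketch under stated assumptions, not a drop-in replacement: the small `raw` frame and its values are made up to stand in for the parsed sheet after `reset_index()`, and only the label columns are forward-filled so that genuine gaps in the data columns are left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the parsed sheet after reset_index():
# merged cells appear as NaN in the label columns. All values are made up.
raw = pd.DataFrame({
    "level_0": ["Individual", np.nan, np.nan, np.nan],
    "level_1": ["ABC", np.nan, "CDE", np.nan],
    "level_2": ["Abs", "Perc", "Abs", "Perc"],
    "Yes": [10, 0.5, 4, 0.2],
    "No": [10, 0.5, 16, 0.8],
})

index_cols = ["level_0", "level_1", "level_2"]

# Forward-fill only the label columns, then rebuild and sort the MultiIndex.
raw[index_cols] = raw[index_cols].ffill()
table = raw.set_index(index_cols).sort_index()

# Partial-key lookup with .loc (the successor of .ix for label access).
row = table.loc[("Individual", "CDE")]

# Cross-section keeping only the Perc rows, one row per (level_0, level_1).
perc = table.xs("Perc", level="level_2")
print(row)
print(perc)
```

With matplotlib installed, a frame shaped like `perc` could then be passed to `perc.plot(kind="bar", stacked=True)` to get the stacked bars described in the question.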