Pandas, Excel-Import and MultiIndex

Views: 104 | Published: 2020/5/24 2:54:21 | Tags: python pandas data-analysis


Question

I am new to pandas and am currently trying to analyse Excel data laid out in the following schema:

My goal is a visualisation with the index labels XYZ, CDE, EFG, HU on the x-axis and the corresponding Perc values of Yes, ProbYes, X, ProbNo, No stacked on the y-axis.

Currently I'm parsing the Excel data into a pandas DataFrame with this code:

import pandas as pd
path = 'x1.xlsx'
x = pd.ExcelFile(path)
sheets = x.sheet_names
table = x.parse(sheets[0], header=2)  # use row 2 (0-based) as the column names

The resulting MultiIndex of table seems fine:

>>> table.index 
MultiIndex(levels=[[u'Individual', u'Summary'], [u'ABC', u'CDE', u'EFG', u'HIJ'], [u'Abs', u'Perc']], labels=[[0, -1, -1, -1, -1, -1, -1, -1, 1, -1], [0, -1, 1, -1, 2, -1, 3, -1, -1, -1], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]])

However, it is not possible to access a specific row:

>>> table.ix[('Individual', 'CDE')]
KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)'

... nor is it possible to access or filter/remove a column containing the row indices:

>>> table.index.names
FrozenList([None, None, None])

i.e. the index levels have no names, so I guess I cannot access them?
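As a side note on the unnamed levels: a minimal sketch, assuming a small stand-in table with made-up values, of how unnamed MultiIndex levels can still be addressed by position, or be given names first. The level names group, item and measure are hypothetical and do not come from the original sheet.

```python
import pandas as pd

# Hypothetical stand-in for the parsed sheet: three unnamed index levels.
index = pd.MultiIndex.from_tuples(
    [
        ("Individual", "ABC", "Abs"),
        ("Individual", "ABC", "Perc"),
        ("Individual", "CDE", "Abs"),
        ("Individual", "CDE", "Perc"),
    ]
)
table = pd.DataFrame({"Yes": [1, 10, 2, 20]}, index=index)  # made-up values

# Unnamed levels can be addressed by their position ...
second_level = table.index.get_level_values(1)

# ... or be given (hypothetical) names and addressed by name afterwards.
named = table.copy()
named.index = named.index.set_names(["group", "item", "measure"])
perc_rows = named.xs("Perc", level="measure")
```

Here perc_rows keeps only the Perc rows and drops the measure level, so the remaining index consists of group and item.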

I tried to simplify the data structure with an alternative parse:

>>> table2 = x.parse(sheets[0], header=2, skiprows=2, parse_cols='B,:I')

That, however, didn't really help.

Edit:

Sorting did help:

>>> table.sort(inplace=True)
>>> table[:4]

yields:

and with that table I cannot do my analysis...

Recommended answer

I think that after running

table.sort(inplace=True)

you may be able to access

table.ix[('Individual', 'CDE')]
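For readers on a current pandas release, where DataFrame.sort and the .ix indexer no longer exist, the same idea might look like the sketch below. It assumes a tiny two-level table with made-up values and uses sort_index() and .loc in their place.

```python
import pandas as pd

# Hypothetical unsorted index; the values are made up for illustration.
index = pd.MultiIndex.from_tuples(
    [
        ("Summary", "EFG"),
        ("Individual", "CDE"),
        ("Individual", "ABC"),
    ]
)
table = pd.DataFrame({"Yes": [3, 2, 1]}, index=index)

was_sorted = table.index.is_monotonic_increasing  # False before sorting

# sort_index() returns a new frame whose index is lexicographically sorted.
table = table.sort_index()
is_sorted = table.index.is_monotonic_increasing  # True after sorting

# .loc with a full key tuple selects the single matching row.
row = table.loc[("Individual", "CDE")]
```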

I know why: your Excel file has merged cells such as A4:A11. When you load it into a pandas DataFrame, the index value Individual appears only in A4, while the index values in A5:A11 are all NaN.
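The effect can be reproduced without an Excel file. The sketch below assumes a flat table with made-up values in which each merged-cell label survives only in its first row, and shows how forward-filling restores the missing labels. The column names level_0, level_1 and level_2 are chosen to match what reset_index() produces for an unnamed three-level index.

```python
import numpy as np
import pandas as pd

# Hypothetical flat view of the sheet after parsing: the label of a merged
# cell is present only in its first row, the remaining rows hold NaN.
raw = pd.DataFrame(
    {
        "level_0": ["Individual", np.nan, np.nan, np.nan],
        "level_1": ["ABC", np.nan, "CDE", np.nan],
        "level_2": ["Abs", "Perc", "Abs", "Perc"],
        "Yes": [1, 10, 2, 20],  # made-up values
    }
)

# Count the gaps left behind by the merged cells.
missing = raw[["level_0", "level_1"]].isna().sum()

# Forward-fill copies each label down into the rows its merged cell covered.
filled = raw.copy()
filled[["level_0", "level_1"]] = filled[["level_0", "level_1"]].ffill()
```

Filling only the label columns is a deliberate choice in this sketch: genuine gaps in the data columns are left alone instead of being overwritten by the value above them.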

One workaround I can think of is:

table = table.reset_index().fillna(method='ffill').set_index(['level_0', 'level_1', 'level_2'])
# reset_index() automatically names the former index columns level_0, level_1, level_2

Then you are good to go:

table.ix[('Individual','CDE')]
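Put together for a current pandas release, the workaround might look like the sketch below. It is an illustration under assumptions, not the code from the original answer: the table and its values are made up, ffill() stands in for fillna(method='ffill'), which newer versions deprecate, and .loc stands in for the removed .ix indexer.

```python
import numpy as np
import pandas as pd

# Hypothetical index as parsed from a sheet with merged cells (NaN = gap).
index = pd.MultiIndex.from_arrays(
    [
        ["Individual", np.nan, np.nan, np.nan],
        ["ABC", np.nan, "CDE", np.nan],
        ["Abs", "Perc", "Abs", "Perc"],
    ]
)
table = pd.DataFrame(
    {"Yes": [3, 30.0, 1, 10.0], "No": [7, 70.0, 9, 90.0]},  # made-up values
    index=index,
)

# Same idea as the workaround: flatten, fill the gaps, rebuild, then sort.
levels = ["level_0", "level_1", "level_2"]
flat = table.reset_index()
flat[levels] = flat[levels].ffill()
table = flat.set_index(levels).sort_index()

# Label-based access to the Abs and Perc rows of one item.
cde = table.loc[("Individual", "CDE")]

# Keep only the Perc rows: one row per item, one column per answer.
perc = table.xs("Perc", level="level_2").droplevel("level_0")
```

With matplotlib installed, perc.plot(kind="bar", stacked=True) would then draw one stacked bar per item, which is the kind of chart the question describes.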
