pandas Combine Excel Spreadsheets
Problem description
I have an Excel workbook with many tabs. Each tab has the same set of headers as all others. I want to combine all of the data from each tab into one data frame (without repeating the headers for each tab).
So far, I've tried:
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
df = xl.parse()
Can I pass something as the parse argument that will mean "all spreadsheets"? Or is this the wrong approach?
Thanks in advance!
Update: I tried:
a=xl.sheet_names
b = pd.DataFrame()
for i in a:
    b.append(xl.parse(i))
b
But it's not "working".
This is one way to do it -- load all sheets into a dictionary of dataframes and then concatenate all the values in the dictionary into one dataframe.
import pandas as pd

# Set sheet_name to None in order to load all sheets into a dict of dataframes
# keyed by sheet name (older pandas releases spelled this parameter sheetname)
df = pd.read_excel('tmp.xlsx', sheet_name=None)

# Then concatenate all dataframes, ignoring the per-sheet index
# to avoid overlapping values later (see comment by @bunji)
cdf = pd.concat(df.values(), ignore_index=True)

print(cdf)
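The loop from the question's update can also be made to work. It left b empty because DataFrame.append returned a new frame instead of modifying b in place, and the method was removed entirely in pandas 2.0. The sketch below is one possible variant, not the answer's own code: it collects the parsed sheets in a list and concatenates once at the end. The helper names combine_sheets and combine_workbook and the extra "sheet" column are hypothetical, added only for illustration, and the demo uses two small in-memory frames in place of a real workbook.

```python
import pandas as pd


def combine_sheets(sheets):
    """Stack a mapping of {sheet name: DataFrame} into a single DataFrame."""
    frames = []
    for name, frame in sheets.items():
        # Hypothetical extra column so each row remembers its source tab.
        frames.append(frame.assign(sheet=name))
    # One concat at the end; ignore_index renumbers the rows from 0.
    return pd.concat(frames, ignore_index=True)


def combine_workbook(path):
    """Read every tab of the workbook at `path` and stack them."""
    xl = pd.ExcelFile(path)
    sheets = {name: xl.parse(name) for name in xl.sheet_names}
    return combine_sheets(sheets)


# Small in-memory stand-in for two tabs with identical headers.
demo = {
    "Sheet1": pd.DataFrame({"id": [1, 2], "value": [10, 20]}),
    "Sheet2": pd.DataFrame({"id": [3], "value": [30]}),
}
combined = combine_sheets(demo)
print(combined)
```

With a real file, combine_workbook('file.xlsx') would follow the same path as the question's loop. Building a list and calling pd.concat once is generally preferable to growing a DataFrame inside the loop, since each intermediate concatenation copies the data accumulated so far.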