Iterate through Excel files and sheets and concatenate in Python


Problem description

Say I have a folder containing multiple Excel files with the extension xlsx or xls. They all share the same header columns a, b, c, d, e, except that several files contain some empty sheets.

I want to iterate over all the files and sheets (except the empty sheets) and concatenate them into one sheet of a single file, output.xlsx.

I have iterated through all the Excel files and appended them to one file, but how can I iterate through all the sheets of each file when a file has more than one sheet?

I need to integrate the two blocks of code below into one. Thanks for your help.

import glob
import os

import pandas as pd

path = os.getcwd()
files = os.listdir(path)

# method 1: filter the directory listing by extension
excel_files = [f for f in files if f.endswith(('.xlsx', '.xls'))]

# read_excel only reads the first sheet of each file by default
df = pd.concat([pd.read_excel(f) for f in excel_files], ignore_index=True)

# method 2: collect the files with glob instead
# ("*.xlsx" or "*.xls" evaluates to just "*.xlsx", so glob each pattern separately)
excel_files = glob.glob("*.xlsx") + glob.glob("*.xls")
df = pd.concat([pd.read_excel(f) for f in excel_files], ignore_index=True)

# save the data frame
df.to_excel('output.xlsx', sheet_name='sheet1')

To concatenate multiple sheets from a single file:

file = pd.ExcelFile('file.xlsx')

names = file.sheet_names  # read all sheet names

df = pd.concat([file.parse(name) for name in names])
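
As a possible variant of the single-file case, `pd.read_excel` can return every sheet at once when called with `sheet_name=None`, which yields a dict mapping sheet names to DataFrames. The sketch below filters out empty sheets before concatenating. The helper names `concat_non_empty` and `read_workbook` are hypothetical and not part of the original post; reading real files also assumes an Excel engine such as openpyxl (for xlsx) or xlrd (for xls) is installed.

```python
import pandas as pd


def concat_non_empty(sheets):
    """Concatenate the non-empty frames of a {sheet name: DataFrame} mapping.

    Hypothetical helper: returns an empty DataFrame if every sheet is empty.
    """
    frames = [frame for frame in sheets.values() if not frame.empty]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


def read_workbook(path):
    """Read all sheets of one workbook and stack the non-empty ones."""
    # sheet_name=None makes read_excel return a dict of all sheets
    return concat_non_empty(pd.read_excel(path, sheet_name=None))
```

Calling `read_workbook('file.xlsx')` would then give one DataFrame for the whole workbook, with blank sheets ignored.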

Recommended answer

import os

import pandas as pd

path = os.getcwd()
files = os.listdir(path)

# keep only Excel workbooks (both .xlsx and .xls)
excel_files = [file for file in files if file.endswith(('.xlsx', '.xls'))]

def create_df_from_excel(file_name):
    # open the workbook once and concatenate all of its sheets
    file = pd.ExcelFile(file_name)

    names = file.sheet_names

    return pd.concat([file.parse(name) for name in names])

df = pd.concat(
    [create_df_from_excel(xl) for xl in excel_files]
)

# save the data frame
df.to_excel('output.xlsx', sheet_name='sheet1')
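
The answer can also be sketched as one end-to-end function that explicitly skips empty sheets, as the question asks. This is a minimal sketch under several assumptions, not the answerer's code: the names `find_excel_files` and `combine_folder` are hypothetical, the output file is excluded so that a second run does not read its own result, names starting with `~$` are treated as Excel lock files and ignored, and an engine such as openpyxl or xlrd is assumed to be installed.

```python
from pathlib import Path

import pandas as pd


def find_excel_files(folder, exclude=("output.xlsx",)):
    """Return sorted .xlsx/.xls paths in folder (hypothetical helper).

    Skips the names in `exclude` and Excel's temporary lock files ("~$...").
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() in {".xlsx", ".xls"}
        and p.name not in exclude
        and not p.name.startswith("~$")
    )


def combine_folder(folder, output_name="output.xlsx"):
    """Stack every non-empty sheet of every workbook in folder into one sheet."""
    frames = []
    for path in find_excel_files(folder, exclude=(output_name,)):
        # sheet_name=None returns {sheet name: DataFrame} for every sheet
        for sheet in pd.read_excel(path, sheet_name=None).values():
            if not sheet.empty:  # skip blank sheets
                frames.append(sheet)
    if not frames:
        raise ValueError(f"no non-empty sheets found in {folder}")
    combined = pd.concat(frames, ignore_index=True)
    combined.to_excel(Path(folder) / output_name, sheet_name="sheet1", index=False)
    return combined
```

With this in place, `combine_folder(".")` would write output.xlsx into the current directory. Passing `index=False` is a design choice here so that the row index is not written as an extra column; drop it to match the original output.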
