从多个Excel文件连接一个工作表,同时处理缺少工作表的文件 [英] Concat a single Sheet from Multiple Excel Files whilst handling files with missing sheets

查看:106
本文介绍了从多个Excel文件连接一个工作表,同时处理缺少工作表的文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

晚安,堆叠!

我面临与Python上的pandas库相关的问题.我试图在多个excel文件(具有多个工作表)上自动执行大量的添加/合并.但是,我不知道如何简单地跳过不包含该指定的sheet_name的文件.有任何想法吗?我的代码如下:

I am facing an issue related to pandas library on Python. Im trying to automate a mass append/concat on multiple excel files (with multiple sheets). However, i can not figure out how to simply skip the files which does not contains that especified sheet_name. Any ideas? My code is as below:

PS1:代码读取每个xlsx文件中的每张图纸后,我必须插入一个中断来结束迭代.

PS1: I had to insert a break to end the iteration as soon as the code was reading every sheet in every xlsx file.

PS2:错误是:"XLRDError:没有名为<'CSNSC 2020'>的工作表."

PS2: The error is: "XLRDError: No sheet named <'CSNSC 2020'>".

PS3:我能够找出一种方法:将一个try放置在循环的后面,并为该错误指定一个例外.但是,为此,我需要使其余的代码都能正常工作.

PS3: I was able to figure out a way: by positioning a try right after the loop for, and an exception for the error. However, for this, i need to be pretty the rest of the code works.

    import pandas as pd
    import os

    path = r'C:/Users/Thiago/Desktop/Backup/Python/Files test append xlsx'
    files = os.listdir(path)

    df = pd.DataFrame()
    xlsx_files = [path + '\\' + f for f in files if f[-4:] == 'xlsx']

    for i in xlsx_files:
       xlsx = pd.ExcelFile(i)
          for name in xlsx.sheet_names:
             data = pd.read_excel(i, header = 1, sheet_name = "CSNSC 2020")
             data['File'] = i
             print(i)
             df = df.append(data)
             break

    df = df[['Dt. Ref.','Convênio','Tipo de Atendimento','Venc.']]
    df.head()

    df = df.dropna(subset=['Convênio'])
    df.head()

    df.to_excel(r'C:/Users/Thiago/Desktop/Backup/Python/Files test append xlsx/out.xlsx')

谢谢!

推荐答案

我编写了这个简单的函数来合并excel文件并处理缺少工作表的excel文件.随时根据自己的用例进行调整

I wrote this simple function to concat excel files and handle excel files with missing work-sheets. feel free to adapt it to your own use-case

主要要注意的是tryexcept处理错误.

the main thing to note is the try and except to handle the errors.

import pandas as pd
from pathlib import Path
from xlrd import XLRDError

行动中

concat_excels(src,'Sheet2',trg)
No sheet named <'Sheet2'> in file_0.xlsx, skipping
No sheet named <'Sheet2'> in file_1.xlsx, skipping
No sheet named <'Sheet2'> in file_2.xlsx, skipping
No sheet named <'Sheet2'> in file_3.xlsx, skipping
No sheet named <'Sheet2'> in file_4.xlsx, skipping
No sheet named <'Sheet2'> in file_5.xlsx, skipping
No sheet named <'Sheet2'> in file_6.xlsx, skipping
No sheet named <'Sheet2'> in file_7.xlsx, skipping
No sheet named <'Sheet2'> in file_8.xlsx, skipping
No sheet named <'Sheet2'> in file_9.xlsx, skipping
File Saved to C:\Users\DataNovice\OneDrive\Documents\2020\python\file_io_ops\move_files_test

功能.

def concat_excels(source_path, sheet_name, target_path):

    """ 
    A simple script to find excel files in a target 
    location and merge them into a single file.
    You need Python installed along with Pandas.
    pathlib is available in Python 3.4 + 
    error handling added.
    """

    # create list for excel files.
    excel_files = [file for file in Path(source_path).glob("*.xlsx")]

    # create empty list to store each individual dataframe.
    excel_dataframe = []

    # loop through our file to read each file and append it to our list.

    for file in excel_files:
        try:
            df = pd.read_excel(file, sheet_name=sheet_name)
            df.columns = df.columns.str.lower()  # lowercase all columns
            df.columns = (
                df.columns.str.strip()
            )  # remove any trailing or leading white space.
            excel_dataframe.append(df)
        except XLRDError as err:
            print(f"{err} in {file.name}, skipping")

    try:
        final_dataframe = pd.concat(excel_dataframe, axis=1)
        final_dataframe.to_excel(target_path + "\master_file.xlsx", index=False)

        print(f"File Saved to {target_path}")

    except ValueError as err_2:
        print(
            f"No Sheets Matched in any of your excel files, are you sure {sheet_name} is correct?"
        )
    return excel_dataframe

这篇关于从多个Excel文件连接一个工作表,同时处理缺少工作表的文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆