从多个Excel文件连接一个工作表,同时处理缺少工作表的文件 [英] Concat a single Sheet from Multiple Excel Files whilst handling files with missing sheets
问题描述
晚安,堆叠!
我面临与Python上的pandas库相关的问题.我试图在多个excel文件(具有多个工作表)上自动执行大量的添加/合并.但是,我不知道如何简单地跳过不包含该指定的sheet_name的文件.有任何想法吗?我的代码如下:
I am facing an issue related to pandas library on Python. Im trying to automate a mass append/concat on multiple excel files (with multiple sheets). However, i can not figure out how to simply skip the files which does not contains that especified sheet_name. Any ideas? My code is as below:
PS1:代码读取每个xlsx文件中的每张图纸后,我必须插入一个中断来结束迭代.
PS1: I had to insert a break to end the iteration as soon as the code was reading every sheet in every xlsx file.
PS2:错误是:"XLRDError:没有名为<'CSNSC 2020'>的工作表."
PS2: The error is: "XLRDError: No sheet named <'CSNSC 2020'>".
PS3:我能够找出一种方法:将一个try放置在循环的后面,并为该错误指定一个例外.但是,为此,我需要使其余的代码都能正常工作.
PS3: I was able to figure out a way: by positioning a try right after the loop for, and an exception for the error. However, for this, i need to be pretty the rest of the code works.
import pandas as pd
import os
path = r'C:/Users/Thiago/Desktop/Backup/Python/Files test append xlsx'
files = os.listdir(path)
df = pd.DataFrame()
xlsx_files = [path + '\\' + f for f in files if f[-4:] == 'xlsx']
for i in xlsx_files:
xlsx = pd.ExcelFile(i)
for name in xlsx.sheet_names:
data = pd.read_excel(i, header = 1, sheet_name = "CSNSC 2020")
data['File'] = i
print(i)
df = df.append(data)
break
df = df[['Dt. Ref.','Convênio','Tipo de Atendimento','Venc.']]
df.head()
df = df.dropna(subset=['Convênio'])
df.head()
df.to_excel(r'C:/Users/Thiago/Desktop/Backup/Python/Files test append xlsx/out.xlsx')
谢谢!
推荐答案
我编写了这个简单的函数来合并excel文件并处理缺少工作表的excel文件.随时根据自己的用例进行调整
I wrote this simple function to concat excel files and handle excel files with missing work-sheets. feel free to adapt it to your own use-case
主要要注意的是try
和except
处理错误.
the main thing to note is the try
and except
to handle the errors.
import pandas as pd
from pathlib import Path
from xlrd import XLRDError
行动中
concat_excels(src,'Sheet2',trg)
No sheet named <'Sheet2'> in file_0.xlsx, skipping
No sheet named <'Sheet2'> in file_1.xlsx, skipping
No sheet named <'Sheet2'> in file_2.xlsx, skipping
No sheet named <'Sheet2'> in file_3.xlsx, skipping
No sheet named <'Sheet2'> in file_4.xlsx, skipping
No sheet named <'Sheet2'> in file_5.xlsx, skipping
No sheet named <'Sheet2'> in file_6.xlsx, skipping
No sheet named <'Sheet2'> in file_7.xlsx, skipping
No sheet named <'Sheet2'> in file_8.xlsx, skipping
No sheet named <'Sheet2'> in file_9.xlsx, skipping
File Saved to C:\Users\DataNovice\OneDrive\Documents\2020\python\file_io_ops\move_files_test
功能.
def concat_excels(source_path, sheet_name, target_path):
"""
A simple script to find excel files in a target
location and merge them into a single file.
You need Python installed along with Pandas.
pathlib is available in Python 3.4 +
error handling added.
"""
# create list for excel files.
excel_files = [file for file in Path(source_path).glob("*.xlsx")]
# create empty list to store each individual dataframe.
excel_dataframe = []
# loop through our file to read each file and append it to our list.
for file in excel_files:
try:
df = pd.read_excel(file, sheet_name=sheet_name)
df.columns = df.columns.str.lower() # lowercase all columns
df.columns = (
df.columns.str.strip()
) # remove any trailing or leading white space.
excel_dataframe.append(df)
except XLRDError as err:
print(f"{err} in {file.name}, skipping")
try:
final_dataframe = pd.concat(excel_dataframe, axis=1)
final_dataframe.to_excel(target_path + "\master_file.xlsx", index=False)
print(f"File Saved to {target_path}")
except ValueError as err_2:
print(
f"No Sheets Matched in any of your excel files, are you sure {sheet_name} is correct?"
)
return excel_dataframe
这篇关于从多个Excel文件连接一个工作表,同时处理缺少工作表的文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!