Writing Pandas DataFrame to Excel: How to auto-adjust column widths
I am trying to write a series of pandas DataFrames to an Excel worksheet such that:
1. the existing contents of the worksheet are not overwritten or erased, and
2. the Excel column widths are adjusted to fit the lengths of the column entries (so that I don't have to do this manually in Excel).
For 1), I have found an excellent solution in the form of a helper function written by @MaxU, from the question "How to write to an existing excel file without overwriting data (using pandas)?". For 2), I found what looked like a good solution in the article cited in the code below. But when I try to put these solutions together, the column widths don't change at all. Here's my full code:
import pandas as pd
import os
from openpyxl import load_workbook

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False, startrow=25)

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name,
            startrow=startrow if startrow is not None else 0,
            **to_excel_kwargs)
        return

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # try to open an existing workbook
    writer.book = load_workbook(filename)

    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row

    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)

    # copy existing sheets
    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    """
    Now attempt to adjust the column widths as necessary so that all the cell contents are visible
    in Excel. The code below is taken from https://towardsdatascience.com/how-to-auto-adjust-the-width-of-excel-columns-with-pandas-excelwriter-60cee36e175e.
    """
    for column in df:
        column_width = max(df[column].astype(str).map(len).max(), len(column))
        col_idx = df.columns.get_loc(column)
        writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

    writer.save()
I then tested the function:
df = pd.DataFrame({'A_Very_Long_Column_Name': [10, 20, 30, 20, 15, 30, 45]})
append_df_to_excel("C:/Users/Leonidas/Documents/test.xlsx", df, "Sheet1")
A new Excel workbook named test.xlsx is created along with a sheet named Sheet1, and the contents of df are written to Sheet1, but the column widths are completely unaffected.
And strangely, when I try to execute the function a second time (without changing the arguments), I get an error:
runcell(2, 'C:/Users/Leonidas/Documents/write_to_excel2.py')
Traceback (most recent call last):
  File "C:\Users\Leonidas\Documents\write_to_excel2.py", line 125, in <module>
    append_df_to_excel("C:/Users/Leonidas/Documents/test.xlsx", df,
  File "C:\Users\Leonidas\Documents\write_to_excel2.py", line 100, in append_df_to_excel
    writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)
AttributeError: 'Worksheet' object has no attribute 'set_column'
I'm pretty confused at this point. Any suggestions for how to fix the code would be greatly appreciated.
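A likely explanation, judging only from the code and traceback above: on the first call the file does not exist yet, so the function takes the early `return` right after `df.to_excel(filename, ...)` and never reaches the width loop, which is why the widths stay at their defaults. On the second call the loop does run, but `set_column()` is a method of XlsxWriter worksheets, while the writer here uses `engine='openpyxl'`, whose `Worksheet` objects expose widths through `column_dimensions` instead. A minimal sketch of the openpyxl way, using an in-memory workbook (the `+ 2` padding is an arbitrary choice for illustration):

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

# Build a small in-memory sheet that mirrors the test DataFrame.
wb = Workbook()
ws = wb.active
ws.append(["A_Very_Long_Column_Name"])
for value in [10, 20, 30, 20, 15, 30, 45]:
    ws.append([value])

# openpyxl worksheets do not provide XlsxWriter's set_column().
has_set_column = hasattr(ws, "set_column")

# Set each column's width from its longest cell (header included).
for col_idx, cells in enumerate(ws.iter_cols(), start=1):
    longest = max(len(str(cell.value)) for cell in cells)
    # The padding of 2 characters is an assumption, not a rule.
    ws.column_dimensions[get_column_letter(col_idx)].width = longest + 2
```

Note that column indexes are 1-based here and are mapped to letters with `get_column_letter`, whereas the `set_column()` call in the question used 0-based indexes.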
Try to use this helper function:
import numpy as np
import pandas as pd
from pathlib import Path
from typing import Optional, Union
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter


def append_df_to_excel(
        filename: Union[str, Path],
        df: pd.DataFrame,
        sheet_name: str = 'Sheet1',
        startrow: Optional[int] = None,
        max_col_width: int = 40,
        autofilter: bool = False,
        fmt_int: str = "#,##0",
        fmt_float: str = "#,##0.00",
        fmt_date: str = "yyyy-mm-dd",
        fmt_datetime: str = "yyyy-mm-dd hh:mm",
        truncate_sheet: bool = False,
        **to_excel_kwargs
) -> None:
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param max_col_width: maximum column width in Excel. Default: 40
    @param autofilter: boolean - whether to add an Excel autofilter. Default: False
    @param fmt_int: Excel format for integer numbers
    @param fmt_float: Excel format for float numbers
    @param fmt_date: Excel format for dates
    @param fmt_datetime: Excel format for datetimes
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df, autofilter=True,
                           freeze_panes=(1,0))

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False, startrow=25)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, index=False,
                           fmt_datetime="dd.mm.yyyy hh:mm")

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    def set_column_format(ws, column_letter, fmt):
        for cell in ws[column_letter]:
            cell.number_format = fmt

    filename = Path(filename)
    file_exists = filename.is_file()
    # process parameters
    # (Excel columns are 1-based; the index, if written, occupies the first one)
    first_col = int(to_excel_kwargs.get("index", True)) + 1
    # sheet_name comes from the function argument; it is a named parameter,
    # so it can never appear inside to_excel_kwargs
    # ignore [engine] parameter if it was passed
    to_excel_kwargs.pop('engine', None)

    # to_excel_kwargs are meant for DataFrame.to_excel(), so they are
    # not forwarded to the ExcelWriter constructor
    with pd.ExcelWriter(
        filename.with_suffix(".xlsx"),
        engine="openpyxl",
        mode="a" if file_exists else "w",
        date_format=fmt_date,
        datetime_format=fmt_datetime,
    ) as writer:
        if file_exists:
            # try to open an existing workbook
            # NOTE: assigning writer.book / writer.sheets relies on older
            # pandas behaviour; newer pandas releases load the workbook
            # themselves in append mode and may reject these assignments
            writer.book = load_workbook(filename)
            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
            # truncate sheet
            if truncate_sheet and sheet_name in writer.book.sheetnames:
                # index of [sheet_name] sheet
                idx = writer.book.sheetnames.index(sheet_name)
                # remove [sheet_name]
                writer.book.remove(writer.book.worksheets[idx])
                # create an empty sheet [sheet_name] using old index
                writer.book.create_sheet(sheet_name, idx)
            # copy existing sheets
            writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
            # the sheet does not exist yet: start at the top
            if startrow is None:
                startrow = 0
        else:
            # file doesn't exist, we are creating a new one
            startrow = 0

        # write out the DataFrame to an ExcelWriter
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    **to_excel_kwargs)

        # automatically set columns' width
        worksheet = writer.sheets[sheet_name]
        for xl_col_no, dtyp in enumerate(df.dtypes, first_col):
            col_no = xl_col_no - first_col
            # widest of: longest value, or header length plus some padding
            width = max(df.iloc[:, col_no].astype(str).str.len().max(),
                        len(str(df.columns[col_no])) + 6)
            width = min(max_col_width, width)
            # print(f"column: [{df.columns[col_no]} ({dtyp.name})]\twidth:\t[{width}]")
            column_letter = get_column_letter(xl_col_no)
            worksheet.column_dimensions[column_letter].width = width
            if np.issubdtype(dtyp, np.integer):
                set_column_format(worksheet, column_letter, fmt_int)
            if np.issubdtype(dtyp, np.floating):
                set_column_format(worksheet, column_letter, fmt_float)

        if autofilter:
            worksheet.auto_filter.ref = worksheet.dimensions
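The helper above assigns `writer.book` and `writer.sheets` by hand, which depends on the pandas version in use. As a hedged alternative, here is a shorter sketch that assumes pandas 1.4 or later, where `pd.ExcelWriter` accepts `if_sheet_exists="overlay"` in append mode and loads the existing workbook itself. The function name `append_with_widths`, the fixed `index=False`, and the padding of 2 are illustrative choices, not part of the original answer; the sketch also assumes a non-empty DataFrame.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter


def append_with_widths(path, df, sheet_name="Sheet1", max_col_width=40):
    """Append df below the existing rows of sheet_name and set column widths.

    Hypothetical helper for illustration; assumes pandas >= 1.4 and openpyxl.
    """
    path = Path(path)
    if path.is_file():
        # "overlay" writes into the existing sheet instead of raising an error.
        writer_kwargs = {"mode": "a", "if_sheet_exists": "overlay"}
    else:
        # if_sheet_exists is only accepted in append mode.
        writer_kwargs = {"mode": "w"}

    with pd.ExcelWriter(path, engine="openpyxl", **writer_kwargs) as writer:
        startrow = 0
        if sheet_name in writer.sheets:
            # max_row is 1-based, so it equals the next free 0-based row.
            startrow = writer.sheets[sheet_name].max_row
        # Write the header only when starting a fresh sheet.
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=(startrow == 0))

        ws = writer.sheets[sheet_name]
        for col_no, name in enumerate(df.columns, start=1):
            longest = max(df.iloc[:, col_no - 1].astype(str).str.len().max(),
                          len(str(name)))
            width = min(max_col_width, int(longest) + 2)
            ws.column_dimensions[get_column_letter(col_no)].width = width


# Usage: write the question's test frame twice into a temporary file.
demo_path = Path(tempfile.mkdtemp()) / "test.xlsx"
df = pd.DataFrame({"A_Very_Long_Column_Name": [10, 20, 30, 20, 15, 30, 45]})
append_with_widths(demo_path, df)  # creates the file
append_with_widths(demo_path, df)  # appends below the existing rows
```

The widths are computed from the rows being written in the current call only, so a later call with shorter values can narrow a column that earlier data needed wider; tracking the existing width would avoid that at the cost of a little more code.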