Writing Pandas DataFrame to Excel: How to auto-adjust column widths

Problem description

I am trying to write a series of pandas DataFrames to an Excel worksheet such that:

  1. The existing contents of the worksheet are not overwritten or erased, and
  2. the Excel column widths are adjusted to fit the lengths of the column entries (so that I don't have to manually do this in Excel).

For 1), I found an excellent solution in the form of a helper function written by @MaxU, in the answer to "How to write to an existing excel file without overwriting data (using pandas)?". For 2), I found what looked like a good solution in the article linked in the code comment below. But when I combine the two, the column widths don't change at all. Here's my full code:

import pandas as pd
import os
from openpyxl import load_workbook

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', 
                           index=False, startrow=25)

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name, 
            startrow=startrow if startrow is not None else 0, 
            **to_excel_kwargs)
        return
    
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # try to open an existing workbook
    writer.book = load_workbook(filename)
    
    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row

    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)
    
    # copy existing sheets
    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)


    # Now attempt to adjust the column widths as necessary so that all the
    # cell contents are visible in Excel. The code below is taken from
    # https://towardsdatascience.com/how-to-auto-adjust-the-width-of-excel-columns-with-pandas-excelwriter-60cee36e175e.
    for column in df:
        column_width = max(df[column].astype(str).map(len).max(), len(column))
        col_idx = df.columns.get_loc(column)
        writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

    writer.save()

I then tested the function:

df = pd.DataFrame({'A_Very_Long_Column_Name': [10, 20, 30, 20, 15, 30, 45]})
append_df_to_excel("C:/Users/Leonidas/Documents/test.xlsx", df, "Sheet1")

A new Excel workbook named test.xlsx is created along with a sheet named Sheet1, and the contents of df are written to Sheet1, but the column widths are completely unaffected.

And strangely, when I try to execute the function a second time (without changing the arguments), I get an error:

runcell(2, 'C:/Users/Leonidas/Documents/write_to_excel2.py')
Traceback (most recent call last):

  File "C:\Users\Leonidas\Documents\write_to_excel2.py", line 125, in <module>
    append_df_to_excel("C:/Users/Leonidas/Documents/test.xlsx", df,

  File "C:\Users\Leonidas\Documents\write_to_excel2.py", line 100, in append_df_to_excel
    writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

AttributeError: 'Worksheet' object has no attribute 'set_column'

I'm pretty confused at this point. Any suggestions for how to fix the code would be greatly appreciated.

Solution
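
Both symptoms appear to follow from the code in the question. On the first call the file does not exist yet, so the function returns from the "saving and exiting" branch before the width loop runs, and the widths stay at their defaults. On the second call the loop is reached, but `set_column` is a method of XlsxWriter worksheets, while the writer here uses the openpyxl engine, whose `Worksheet` objects expose widths through `column_dimensions` instead. A minimal sketch of the openpyxl way follows; the helper name `fit_column_widths`, its `first_col` and `max_width` parameters, and the padding of 2 are illustrative assumptions, not part of either library.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def fit_column_widths(worksheet, df, first_col=1, max_width=40):
    """Hypothetical helper: size openpyxl columns from a non-empty DataFrame.

    first_col is the 1-based Excel column holding the first DataFrame column
    (use 2 when the index was written as well).
    """
    for offset, column in enumerate(df.columns):
        # Longest string representation among the cells of this column.
        longest_cell = int(df.iloc[:, offset].astype(str).str.len().max())
        # Take the header into account, pad a little, and cap the result.
        width = min(max_width, max(longest_cell, len(str(column))) + 2)
        letter = get_column_letter(first_col + offset)
        worksheet.column_dimensions[letter].width = width


df = pd.DataFrame({"A_Very_Long_Column_Name": [10, 20, 30]})
wb = Workbook()
ws = wb.active
fit_column_widths(ws, df)
print(ws.column_dimensions["A"].width)
```

The width is expressed in Excel's character units, so counting characters is only an approximation for proportional fonts.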

Try to use this helper function:

import numpy as np
import pandas as pd
from pathlib import Path
from typing import Union, Optional, List, Tuple
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter


def append_df_to_excel(
        filename: Union[str, Path],
        df: pd.DataFrame,
        sheet_name: str = 'Sheet1',
        startrow: Optional[int] = None,
        max_col_width: int = 40,
        autofilter: bool = False,
        fmt_int: str = "#,##0",
        fmt_float: str = "#,##0.00",
        fmt_date: str = "yyyy-mm-dd",
        fmt_datetime: str = "yyyy-mm-dd hh:mm",
        truncate_sheet: bool = False,
        **to_excel_kwargs
) -> None:
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param max_col_width: maximum column width in Excel. Default: 40
    @param autofilter: boolean - whether to add an Excel autofilter or not. Default: False
    @param fmt_int: Excel format for integer numbers
    @param fmt_float: Excel format for float numbers
    @param fmt_date: Excel format for dates
    @param fmt_datetime: Excel format for datetime's
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df, autofilter=True,
                           freeze_panes=(1,0))

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False, startrow=25)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, index=False,
                           fmt_datetime="dd.mm.yyyy hh:mm")

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    def set_column_format(ws, column_letter, fmt):
        # apply the number format to every cell in the column
        for cell in ws[column_letter]:
            cell.number_format = fmt

    filename = Path(filename)
    file_exists = filename.is_file()
    # process parameters
    # (sheet_name is a named parameter, so it is used as passed in)
    first_col = int(to_excel_kwargs.get("index", True)) + 1
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    with pd.ExcelWriter(
        filename.with_suffix(".xlsx"),
        engine="openpyxl",
        mode="a" if file_exists else "w",
        date_format=fmt_date,
        datetime_format=fmt_datetime,
        **to_excel_kwargs
    ) as writer:
        if file_exists:
            # try to open an existing workbook
            writer.book = load_workbook(filename)
            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
            # truncate sheet
            if truncate_sheet and sheet_name in writer.book.sheetnames:
                # index of [sheet_name] sheet
                idx = writer.book.sheetnames.index(sheet_name)
                # remove [sheet_name]
                writer.book.remove(writer.book.worksheets[idx])
                # create an empty sheet [sheet_name] using old index
                writer.book.create_sheet(sheet_name, idx)

            # copy existing sheets
            writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
        else:
            # file doesn't exist, we are creating a new one
            startrow = 0

        # write out the DataFrame to an ExcelWriter
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    **to_excel_kwargs)

        # automatically set columns' width
        worksheet = writer.sheets[sheet_name]
        for xl_col_no, dtyp in enumerate(df.dtypes, first_col):
            col_no = xl_col_no - first_col
            width = max(df.iloc[:, col_no].astype(str).str.len().max(),
                        len(df.columns[col_no]) + 6)
            width = min(max_col_width, width)
            # print(f"column: [{df.columns[col_no]} ({dtyp.name})]\twidth:\t[{width}]")
            column_letter = get_column_letter(xl_col_no)
            worksheet.column_dimensions[column_letter].width = width
            if np.issubdtype(dtyp, np.integer):
                set_column_format(worksheet, column_letter, fmt_int)
            if np.issubdtype(dtyp, np.floating):
                set_column_format(worksheet, column_letter, fmt_float)
        if autofilter:
            worksheet.auto_filter.ref = worksheet.dimensions
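
The helper above assigns to `writer.book` and `writer.sheets`, which recent pandas releases may no longer allow. The following is a shorter sketch of the same idea for those releases, assuming pandas 1.4 or later (for `if_sheet_exists="overlay"`) and a non-empty DataFrame. The name `append_and_fit`, the fixed `index=False`, the choice to skip the header when appending, and the padding of 2 are assumptions made for illustration, not part of the original answer.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter


def append_and_fit(path, df, sheet_name="Sheet1", max_col_width=40):
    """Hypothetical helper: append df below existing rows, then size columns."""
    path = Path(path)
    startrow = 0
    writer_kwargs = {"mode": "w"}
    if path.is_file():
        # "overlay" writes into the existing sheet instead of replacing it.
        writer_kwargs = {"mode": "a", "if_sheet_exists": "overlay"}
        workbook = load_workbook(path)
        if sheet_name in workbook.sheetnames:
            # Continue right below the last used row.
            startrow = workbook[sheet_name].max_row
        workbook.close()

    with pd.ExcelWriter(path, engine="openpyxl", **writer_kwargs) as writer:
        # Write the header only when starting at the top of the sheet.
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=(startrow == 0))
        worksheet = writer.sheets[sheet_name]
        for col_no, column in enumerate(df.columns, start=1):
            longest = int(df.iloc[:, col_no - 1].astype(str).str.len().max())
            width = min(max_col_width, max(longest, len(str(column))) + 2)
            worksheet.column_dimensions[get_column_letter(col_no)].width = width


# Usage: call twice on the same file, then read the result back.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "test.xlsx"
    demo = pd.DataFrame({"A_Very_Long_Column_Name": [10, 20, 30]})
    append_and_fit(target, demo)  # first call creates the file
    append_and_fit(target, demo)  # second call appends below existing rows
    check = load_workbook(target)
    rows_written = check["Sheet1"].max_row
    width_a = check["Sheet1"].column_dimensions["A"].width
    check.close()
```

Unlike the helper above, this sketch bases the widths only on the rows written in the current call and does not apply number formats or an autofilter.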
