pandas to_excel()忽略/允许重复的列名 [英] pandas to_excel() ignore/allow duplicate column names

查看:382
本文介绍了 pandas to_excel()忽略/允许重复的列名的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用pandas的to_excel()函数后,是否可以忽略重复的列名?

Is there way to ignore duplicate column names after using to_excel() function of pandas?

说,我有 old_wb.xlsx :

>> df1 = pd.read_excel('wb1.xlsx')
        ---------------------merged header--------------------
        col1    col2   col3   col1   col4   col1   col2   col5
        test    test   test   test   test   test   test   test

并说我对Excel文件做了一些处理,例如,删除合并的标头并将其保存到另一个Excel文件中:

and say I did some processing to my Excel file, for example, remove the merged header and save it to another Excel file:

>> df1.to_excel('new_wb.xlsx', 'Sheet1', merged_cells=False, header=None, index=False)

new_wb.xlsx 的列名称如下:

        col1    col2   col3   col1.1   col4   col1.2   col2.1   col5
        test    test   test   test     test   test     test     test

它在重复的列名中添加了.1,并且随着重复的列名的增加而递增.

It added .1 to the duplicate column names and it also increments as the duplicate column name goes.

在使用to_excel()之前,我尝试重命名列名,但是它不起作用.似乎在to_excel()中重命名了重复项.

I tried renaming the column name before using to_excel() but it didn't work. It seems the renaming of duplicates happens in to_excel().

>> df1.rename(columns=lambda x: x.replace('.1',''))

搜索后,我发现to_excel()的参数是mangle_dupe_cols=False,不幸的是它返回了:

Upon searching, I found an argument for to_excel() which is mangle_dupe_cols=False, unfortunately it returned:

ValueError: Setting mangle_dupe_cols=False is not supported yet

在保存`to_excel()'时如何忽略重复的列名的任何帮助

Any help on how to ignore duplicate column names in saving `to_excel()'

推荐答案

@Ricky Aguilar 有一个很好的解决方案.我接受了他的解决方案,只是使其变得更加动态.

@Ricky Aguilar has a great solution. I took his solution and just made it more dynamic.

现在,您可以重命名所有重复的标题,甚至不知道它们的值是什么

def dataframe_allowing_duplicate_headers():
    # To Hold All The Possible Duplicate Tags ['.1', '.2', '.3', ...]
    dup_id_range = []

    # Load Your Excel File Using Pandas
    dataframe = pandas.read_excel("path_to_excel_file", sheet_name="sheetname")

    # Generate And Store All The Possible Duplicate Tags ['.1', '.2', '.3', ...]
    for count in range(0, len(dataframe.columns)):
        dup_id_range.append( '.{}'.format(count) )

    # Search And Replace All Duplicate Headers To What It Was Set As Originally
    def rename(dataframe, character_number):
        duplicate_columns_chars = list(
            filter(lambda v: v[(len(v)-character_number):] in dup_id_range,
            dataframe.columns))

        for duplicate_column in duplicate_columns_chars:
            dataframe = dataframe.rename(
                columns={duplicate_column:duplicate_column[:-character_number]})
        return dataframe


    # Replace The Possible Duplicates Respectfully Based On Columns Count
    if len(dup_id_range) > 0:
        dataframe = rename(dataframe, 2)
        if len(dup_id_range) > 9:
            dataframe = rename(dataframe, 3)
            if len(dup_id_range) > 99:
                dataframe = rename(dataframe, 4)
                # If You Have More Than A Thousand Columns (lol)
                #if len(dup_id_range) > 999:
                #    dataframe = rename(dataframe, 5)

    return dataframe

用法:

# This Dataframe Will Have All Your Headers, Allowing Your Duplicates
my_dataframe = dataframe_allowing_duplicate_headers()

这篇关于 pandas to_excel()忽略/允许重复的列名的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆