pandas to_excel()忽略/允许重复的列名 [英] pandas to_excel() ignore/allow duplicate column names
问题描述
使用pandas的to_excel()函数后,是否可以忽略重复的列名?
Is there way to ignore duplicate column names after using to_excel() function of pandas?
说,我有 old_wb.xlsx :
>> df1 = pd.read_excel('wb1.xlsx')
---------------------merged header--------------------
col1 col2 col3 col1 col4 col1 col2 col5
test test test test test test test test
并说我对Excel文件做了一些处理,例如,删除合并的标头并将其保存到另一个Excel文件中:
and say I did some processing to my Excel file, for example, remove the merged header and save it to another Excel file:
>> df1.to_excel('new_wb.xlsx', 'Sheet1', merged_cells=False, header=None, index=False)
new_wb.xlsx 的列名称如下:
col1 col2 col3 col1.1 col4 col1.2 col2.1 col5
test test test test test test test test
它在重复的列名中添加了.1
,并且随着重复的列名的增加而递增.
It added .1
to the duplicate column names and it also increments as the duplicate column name goes.
在使用to_excel()
之前,我尝试重命名列名,但是它不起作用.似乎在to_excel()
中重命名了重复项.
I tried renaming the column name before using to_excel()
but it didn't work. It seems the renaming of duplicates happens in to_excel()
.
>> df1.rename(columns=lambda x: x.replace('.1',''))
搜索后,我发现to_excel()
的参数是mangle_dupe_cols=False
,不幸的是它返回了:
Upon searching, I found an argument for to_excel()
which is mangle_dupe_cols=False
, unfortunately it returned:
ValueError: Setting mangle_dupe_cols=False is not supported yet
在保存`to_excel()'时如何忽略重复的列名的任何帮助
Any help on how to ignore duplicate column names in saving `to_excel()'
推荐答案
@Ricky Aguilar 有一个很好的解决方案.我接受了他的解决方案,只是使其变得更加动态.
@Ricky Aguilar has a great solution. I took his solution and just made it more dynamic.
现在,您可以重命名所有重复的标题,甚至不知道它们的值是什么
def dataframe_allowing_duplicate_headers():
# To Hold All The Possible Duplicate Tags ['.1', '.2', '.3', ...]
dup_id_range = []
# Load Your Excel File Using Pandas
dataframe = pandas.read_excel("path_to_excel_file", sheet_name="sheetname")
# Generate And Store All The Possible Duplicate Tags ['.1', '.2', '.3', ...]
for count in range(0, len(dataframe.columns)):
dup_id_range.append( '.{}'.format(count) )
# Search And Replace All Duplicate Headers To What It Was Set As Originally
def rename(dataframe, character_number):
duplicate_columns_chars = list(
filter(lambda v: v[(len(v)-character_number):] in dup_id_range,
dataframe.columns))
for duplicate_column in duplicate_columns_chars:
dataframe = dataframe.rename(
columns={duplicate_column:duplicate_column[:-character_number]})
return dataframe
# Replace The Possible Duplicates Respectfully Based On Columns Count
if len(dup_id_range) > 0:
dataframe = rename(dataframe, 2)
if len(dup_id_range) > 9:
dataframe = rename(dataframe, 3)
if len(dup_id_range) > 99:
dataframe = rename(dataframe, 4)
# If You Have More Than A Thousand Columns (lol)
#if len(dup_id_range) > 999:
# dataframe = rename(dataframe, 5)
return dataframe
用法:
# This Dataframe Will Have All Your Headers, Allowing Your Duplicates
my_dataframe = dataframe_allowing_duplicate_headers()
这篇关于 pandas to_excel()忽略/允许重复的列名的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!