Merge rows based on value (pandas to Excel - xlsxwriter)


Problem description

I'm trying to output a pandas DataFrame to an Excel file using xlsxwriter. I also want to apply some rule-based formatting, specifically merging cells that have the same value, but I'm having trouble working out how to write the loop. (New to Python here!)

See below for the current output versus the expected output:

(As the image above shows, I'm trying to merge cells under the Name column when they have the same value.)

Here is what I have so far:

# This is how you merge cells in xlsxwriter (just an example)
worksheet.merge_range('A3:A4', 'value you want in merged cells', merge_format)

# Merge Car type loop, thought process...
# 1. Loop through the data frame where row n Name == row n-1 Name
# 2. Get the number of rows that have the same Name
# 3. Based on that length, run xlsxwriter's merge_range function:
#    worksheet.merge_range('range_found_from_loop', 'Name', merge_format)


for row_index in range(1, len(car_report)):
    if car_report.loc[row_index, 'Name'] == car_report.loc[row_index - 1, 'Name']:
        # Find the starting point based off the index, then get the range by adding
        # the number of rows to the starting point. For example, if rows 0-2 are
        # similar I would get 'A0:A2', which I can then put in the code below.
        # From there apply worksheet.merge_range('A0:A2', '[input value]', merge_format)
        pass  # merge logic still to be written

Any help would be greatly appreciated!

Thanks!

Recommended answer

Your logic is almost correct, but I took a slightly different approach to the problem:

1) Sort the column so that all equal values are grouped together.

2) Reset the index (using reset_index(), possibly passing drop=True).

3) Then capture the rows where a new value begins. To do that, create a list and add the first row, 1, because the first group always starts there.

4) Then iterate over that list and check a few conditions:

4a) If a value occupies only one row, merge_range will raise an error because it cannot merge a single cell. In that case, replace merge_range with the write method.

4b) With this algorithm you will get an IndexError when handling the last value of the list, because the code looks up the value at the next index position and the last value has none. So we need to state explicitly that when an IndexError occurs (meaning we are on the last value) we merge or write through to the last row of the dataframe.

4c) Finally, I did not take into account the case where the column contains blank or null cells. In that case the code needs to be adjusted.
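
Steps 1 to 4b can also be sketched without the try/except, by labelling each run of equal values and reading off its first and last row. This is a minimal sketch rather than the answer's own code: the helper name `find_merge_runs` is hypothetical, and it assumes the column has no blank cells and that the sheet is written with a single header row.

```python
import pandas as pd


def find_merge_runs(df, column):
    """Return the sorted frame and a list of (first_row, last_row, value).

    Hypothetical helper. Rows are zero-based worksheet rows, assuming the
    header sits in row 0, so DataFrame position i maps to worksheet row i + 1.
    """
    # Steps 1 and 2: group equal values together, then renumber from 0.
    df = df.sort_values(column, kind="stable").reset_index(drop=True)
    values = df[column]
    # Step 3: a new run starts wherever the value differs from the row above.
    run_id = (values != values.shift()).cumsum()
    runs = []
    for _, group in values.groupby(run_id):
        first_row = int(group.index[0]) + 1
        last_row = int(group.index[-1]) + 1
        runs.append((first_row, last_row, group.iloc[0]))
    return df, runs


cars = pd.DataFrame({"Name": ["Ford", "Tesla", "Ford", "Toyota", "Tesla", "Ford"]})
sorted_cars, runs = find_merge_runs(cars, "Name")
for first_row, last_row, value in runs:
    # Step 4a: a single-row run would be written with write(), not merge_range().
    action = "write" if first_row == last_row else "merge_range"
    print(action, first_row, last_row, value)
```

Because every run carries its own last row, the final group needs no special handling, which is what step 4b works around with the IndexError.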

Lastly, the code might look a bit confusing: keep in mind that in pandas the first data row has index 0 (the headers are separate), while in xlsxwriter the header row has index 0 and the first data row has index 1.
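
The off-by-one bookkeeping can be made explicit with two small helpers. This is only an illustrative sketch: the function names are hypothetical, and it assumes the frame is written with index=False, a single header row, and the Name column in column A.

```python
def df_index_to_sheet_row(df_index, header_rows=1):
    """Zero-based worksheet row for a zero-based DataFrame position."""
    return df_index + header_rows


def sheet_row_to_a1(sheet_row, column_letter="A"):
    """A1-style label; the row numbers Excel displays are one-based."""
    return f"{column_letter}{sheet_row + 1}"


# Two adjacent DataFrame rows (positions 0 and 1) that share a value.
first = df_index_to_sheet_row(0)
last = df_index_to_sheet_row(1)
cell_range = f"{sheet_row_to_a1(first)}:{sheet_row_to_a1(last)}"
print(first, last, cell_range)  # 1 2 A2:A3
```

The zero-based pair (1, 2) is what the row/column form of merge_range expects, while 'A2:A3' is the equivalent string form.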

Here is a working example that does exactly what you want:

import pandas as pd

# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})

# Build the list of worksheet rows where each value appears for the first time:
# start with row 1 (the first data row), then check from the 2nd row to the end of the df
startCells = [1]
for row in range(2, len(df) + 1):
    if df.loc[row-1, 'Name'] != df.loc[row-2, 'Name']:
        startCells.append(row)


writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})


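# Worksheet row of the last data row (the header occupies row 0)
lastRow = len(df)

for row in startCells:
    try:
        # The group ends one row before the next group starts
        endRow = startCells[startCells.index(row)+1]-1
        if row == endRow:
            # A single cell cannot be merged, so write it normally
            worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, endRow, 0, df.loc[row-1,'Name'], merge_format)
    except IndexError:
        # Last group: there is no next start cell, so it runs to the last row
        if row == lastRow:
            worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, lastRow, 0, df.loc[row-1,'Name'], merge_format)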


writer.close()  # close() saves the file; ExcelWriter.save() was removed in pandas 2.0
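
For the blank or null cells mentioned in 4c, one possible adjustment is to treat missing values as never equal to their neighbours, so that they are written individually instead of merged. This is a sketch under that assumption and is not part of the original answer: `runs_skipping_blanks` is a hypothetical name, and whether blanks should be merged at all is a choice the answer leaves open.

```python
import pandas as pd


def runs_skipping_blanks(values):
    """Yield (first_pos, last_pos, value) runs; blanks never join a run.

    Positions are zero-based positions in `values`; add 1 to get the
    worksheet row when a single header row is written above the data.
    """
    values = values.reset_index(drop=True)
    blank = values.isna() | (values == "")
    # A run starts where the value changes, or wherever the cell is blank.
    starts = (values != values.shift()) | blank
    run_id = starts.cumsum()
    for _, group in values.groupby(run_id):
        value = None if blank[group.index[0]] else group.iloc[0]
        yield int(group.index[0]), int(group.index[-1]), value


names = pd.Series(["Tesla", "Tesla", None, None, "Ford", ""])
result = list(runs_skipping_blanks(names))
print(result)
```

Runs whose value comes back as None could then be skipped or written as empty cells, depending on how the sheet should look.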

Output:
