Subsetting a dataframe based on cell color and text color in an Excel sheet

Problem description

I have an Excel sheet with more than 1000 columns and 300 rows. Some of these cells hold normal data, some have a red background, and some have a normal white background but red text. For example, my Excel sheet looks like below:

I am reading this excel sheet into Python (pandas) to use it as a dataframe and perform further actions on it. However, the red text and red cells need to be treated differently than the normal cells.

Therefore, I would like to split the above table into 3 tables, such that: Table 1 has all the cells, but the red-background cells are empty. Table 2 has only those rows and columns where the text is red. Table 3 has only those rows and columns where the background is red.

I guess it cannot be done in Pandas. I tried using StyleFrame but failed.

Can anyone help in this regard? Is there any python package that is helpful in this case?

Recommended answer

This is pretty much the way to achieve this. It is not pretty as StyleFrame wasn't really designed to be used this way.

Read the source Excel file

import numpy as np
# Recent StyleFrame releases name the package in lowercase (`from styleframe import ...`)
from StyleFrame import StyleFrame, utils

sf = StyleFrame.read_excel('test.xlsx', read_style=True, use_openpyxl_styles=False)

1) All cells, with the red-background cells emptied

def empty_red_background_cells(cell):
    if cell.style.bg_color in {utils.colors.red, 'FFFF0000'}:
        cell.value = np.nan
    return cell

sf_1 = StyleFrame(sf.applymap(empty_red_background_cells))    
print(sf_1)
#      C1       C2 C3    C4      C5      C6
# 0    a1      1.0  s   nan  1001.0  1234.0
# 1    a2     12.0  s   nan  1001.0  4322.0
# 2    a3      nan  s   nan  1001.0  4432.0
# 3    a4    232.0  s   nan  1001.0  4432.0
# 4    a5    343.0  s  99.0     nan     nan
# 5    a6      3.0  s  99.0  1001.0  4432.0
# 6    a7     34.0  s  99.0  1001.0  4432.0
# 7    a8      5.0  s   nan  1001.0  4432.0
# 8    a9      6.0  s  99.0  1001.0  4432.0
# 9   a10    565.0  s  99.0     nan  4432.0
# 10  a11   5543.0  s  99.0  1001.0  4432.0
# 11  a12    112.0  s  99.0  1001.0     nan
# 12  a13  34345.0  s  99.0  1001.0  4432.0
# 13  a14      0.0  s  99.0     nan     nan
# 14  a15    453.0  s  99.0  1001.0     nan
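
The color checks above accept two spellings of red. As a minimal sketch, assuming colors arrive as 6-digit RGB or 8-digit ARGB hex strings (an assumption about the input, not something StyleFrame guarantees), a small helper can normalize them before comparing. The names `normalize_hex_color` and `is_red` are hypothetical and not part of StyleFrame.

```python
def normalize_hex_color(color):
    """Return a 6-digit uppercase RGB string, dropping '#' and any alpha prefix."""
    text = str(color).strip().lstrip('#').upper()
    if len(text) == 8:
        # Assumed ARGB layout: the first two digits are the alpha channel.
        text = text[2:]
    if len(text) != 6 or any(ch not in '0123456789ABCDEF' for ch in text):
        raise ValueError(f'not a hex color: {color!r}')
    return text


def is_red(color):
    """True when the color is pure red, whatever its alpha prefix."""
    try:
        return normalize_hex_color(color) == 'FF0000'
    except ValueError:
        # Anything that is not a plain hex string is treated as "not red".
        return False
```

With such a helper, a check like `cell.style.bg_color in {utils.colors.red, 'FFFF0000'}` could be written as `is_red(cell.style.bg_color)`, provided the style attribute really is a hex string in the file being read.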

2) Only cells with red text

def only_cells_with_red_text(cell):
    return cell if cell.style.font_color in {utils.colors.red, 'FFFF0000'} else np.nan

sf_2 = StyleFrame(sf.applymap(only_cells_with_red_text)
                  .dropna(axis=0, how='all')
                  .dropna(axis=1, how='all'))
# dropna is called once per axis: passing a tuple as `axis` was deprecated in
# pandas 0.23.0 and is no longer accepted by newer pandas releases

print(sf_2)
#         C2      C6
# 7     nan   4432.0
# 8     nan   4432.0
# 9    565.0     nan
# 10  5543.0     nan
# 11   112.0     nan

3) Only cells with red background

def only_cells_with_red_background(cell):
    return cell if cell.style.bg_color in {utils.colors.red, 'FFFF0000'} else np.nan

sf_3 = StyleFrame(sf.applymap(only_cells_with_red_background)
                  .dropna(axis=0, how='all')
                  .dropna(axis=1, how='all'))
# dropna is called once per axis: passing a tuple as `axis` was deprecated in
# pandas 0.23.0 and is no longer accepted by newer pandas releases

print(sf_3)
#        C4      C6
# 0    99.0     nan
# 1    99.0     nan
# 2    99.0     nan
# 3    99.0     nan
# 13    nan  4432.0
# 14    nan  4432.0
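
The logic of the three tables can also be illustrated without any Excel file. The sketch below is not the StyleFrame approach itself. It is a plain-Python toy model on a hypothetical grid where each cell is a `(value, background_color, font_color)` tuple, and the names `mask` and `drop_empty` are made up for illustration. It shows the same two steps: blank out the cells that fail a color test, then drop the rows and columns that end up entirely empty.

```python
RED = 'FF0000'
WHITE = 'FFFFFF'
BLACK = '000000'

# Hypothetical data: column name -> list of (value, background, font color).
grid = {
    'C1': [('a1', WHITE, BLACK), ('a2', WHITE, BLACK), ('a3', WHITE, BLACK)],
    'C2': [(1.0, WHITE, BLACK), (12.0, WHITE, RED), (7.0, RED, BLACK)],
    'C3': [(99.0, RED, BLACK), (99.0, WHITE, BLACK), (99.0, WHITE, BLACK)],
}


def mask(grid, keep):
    """Keep the value of cells where keep(cell) is true; use None elsewhere."""
    return {col: [cell[0] if keep(cell) else None for cell in cells]
            for col, cells in grid.items()}


def drop_empty(table):
    """Drop all-None columns, then all-None rows; returns {col: {row: value}}."""
    cols = {c: v for c, v in table.items() if any(x is not None for x in v)}
    n_rows = len(next(iter(table.values()), []))
    rows = [i for i in range(n_rows)
            if any(v[i] is not None for v in cols.values())]
    return {c: {i: v[i] for i in rows} for c, v in cols.items()}


table_1 = mask(grid, lambda cell: cell[1] != RED)              # red backgrounds emptied
table_2 = drop_empty(mask(grid, lambda cell: cell[2] == RED))  # red text only
table_3 = drop_empty(mask(grid, lambda cell: cell[1] == RED))  # red background only
```

Row indices are kept as dictionary keys so that, as in the printed outputs above, the surviving rows can still be traced back to their position in the original sheet.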
