OpenpyXL - How to return BOTH the calculated and formula components of a cell for any given workbook

Problem description

At work we use Excel documents to analyse samples, and one worksheet in each workbook is the 'Top Sheet', where all the important parts of the data are collated into a nice table for reports. I've written an exhaustive Python script that uses OpenPyXL to pull the necessary data out of multiple files. All of this requires setting the data_only=True flag when opening the workbook.

HOWEVER - I now need to pull data from another part of the workbook: another sheet (not the Top Sheet), which is referenced in a formula on the 'Top Sheet'. I've achieved this clunkily by making the script open the workbook twice: once to pull out all the Top Sheet data, and once to pull out the formula (actually a series of cell references).

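import os
from openpyxl import load_workbook

# root and fname come from the surrounding directory loop
wb_value = load_workbook(os.path.join(root, fname), data_only=True)  # cached values
wb_formula = load_workbook(os.path.join(root, fname))  # formula strings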

Is there any way in OpenPyXL to return the data AND the formula without reopening the file? Opening it twice literally doubles the time required for the extraction loop, which, as I alluded to above, is part of a larger structure that loops over entire directories to find suitable .xlsx files.

Any help is greatly appreciated.

PS. I've read Charlie Clark's answer to question 23350581, so I'm not optimistic that there is an easy solution, but hopefully someone can conceive a workaround.

PPS. I can't hardcode the cell references I want to pull out: they have changed over time, and someone has hardcoded the Excel file to track their position (over multiple years).

Cheers

Recommended answer

It is possible by modifying openpyxl's code. Note that I made these changes on openpyxl 2.2.5, which is not the latest version, so the line numbers will probably be different for you.

This is a quick monkey patch and most probably not the best way of doing this, but it should get you going.

Note that these changes won't make openpyxl recalculate the values if you modify a formula (as Charlie Clark said in his answer). They simply cause load_workbook to retrieve both the cells' values and their formulas, so it no longer needs to be called twice.
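
A single pass can work because an .xlsx worksheet stores both pieces side by side: in the sheet XML, a cell element can carry an f child (the formula text, without the leading "=") and a v child (the last cached value). As a rough illustration that does not touch openpyxl at all, the sketch below parses a hand-written sheet fragment with the standard library. The sample XML and the helper name values_and_formulas are made up for this example, and shared strings, shared formulas and styles are ignored.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hand-written stand-in for a worksheet part (hypothetical sample data).
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1"><c r="A1"><f>1+1</f><v>2</v></c></row>
    <row r="2"><c r="A2"><v>2</v></c></row>
  </sheetData>
</worksheet>"""


def values_and_formulas(sheet_xml):
    """Map each cell coordinate to (cached value text, formula or None)."""
    root = ET.fromstring(sheet_xml)
    result = {}
    for c in root.iterfind(".//m:c", NS):
        value = c.findtext("m:v", default=None, namespaces=NS)
        formula = c.findtext("m:f", default=None, namespaces=NS)
        # Prefix "=" so the formula looks the way openpyxl reports it.
        result[c.get("r")] = (value, "=" + formula if formula is not None else None)
    return result


cells = values_and_formulas(SHEET_XML)
print(cells["A1"])
print(cells["A2"])
```

Values come back as raw text here; converting them to numbers, dates or shared strings is the work openpyxl normally does for you.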


\openpyxl\cell\cell.py:

line 84: Add 'formula' to __slots__:

__slots__ = (
    'column',
    'row',
    'coordinate',
    '_value',
    'formula',
    'data_type',
    'parent',
    'xf_index',
    '_hyperlink_rel',
    '_comment')
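
The __slots__ change matters because a class that defines __slots__ (assuming its base classes do as well) has no per-instance __dict__, so assigning an attribute that is not listed raises AttributeError. A minimal sketch with two toy classes, which are hypothetical stand-ins and not openpyxl's Cell:

```python
class SlottedCell:
    """Toy stand-in: 'formula' is NOT listed in __slots__."""
    __slots__ = ('_value',)


class PatchedCell:
    """Toy stand-in: 'formula' has been added to __slots__."""
    __slots__ = ('_value', 'formula')


def can_set_formula(cell):
    """Return True if cell.formula can be assigned, False otherwise."""
    try:
        cell.formula = '=1+1'
    except AttributeError:
        # The attribute is not in __slots__ and there is no __dict__.
        return False
    return True


print(can_set_formula(SlottedCell()))
print(can_set_formula(PatchedCell()))
```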


line 111: Change Cell's __init__ to accept a formula argument with a default value of None:

def __init__(self, worksheet, column, row, value=None, formula=None, fontId=0,
                 fillId=0, borderId=0, alignmentId=0, protectionId=0, numFmtId=0,
                 pivotButton=None, quotePrefix=None, xfId=None):

And initialize it in the __init__ body:

self.formula = formula


\openpyxl\reader\worksheet.py

line 111:

We don't really care about data_only anymore, so change the line from if formula is not None and not self.data_only: to if formula is not None:.

lines 113 - 116:

Change the following lines:

if formula.text:
    value = "=" + formula.text
else:
    value = "="

to:

if formula.text:
    _formula = "=" + formula.text
else:
    _formula = "="

A few lines below these lines you should see:

else:
    cell._value=value
    cell.data_type=data_type

Immediately below these lines add:

try:
    cell.formula = _formula
except UnboundLocalError:
    pass
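
The try/except UnboundLocalError guard works because _formula is only assigned inside the if formula is not None: branch. For a cell with no formula the name is never bound during that call, and reading it raises UnboundLocalError. The toy function below mirrors that control flow; it is a hypothetical simplification, not openpyxl's actual parsing code.

```python
def parse_cell(cached_value, formula_text=None):
    """Toy model of the patched logic: return (value, formula)."""
    if formula_text is not None:
        # Bound only on this branch, as in the patch above.
        _formula = "=" + formula_text if formula_text else "="

    try:
        formula = _formula
    except UnboundLocalError:
        # No formula on this cell: the local name was never assigned.
        formula = None
    return cached_value, formula


print(parse_cell(2, "1+1"))
print(parse_cell(2))
```

Initialising _formula = None before the if would avoid the exception handling entirely and may be the tidier variant.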



That's it! Let's test it:

I created a new xlsx file. Cell A1 has the formula =1+1, and cell A2 has no formula, just the plain value 2.

from openpyxl import load_workbook

wb = load_workbook('test.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')

cell = sheet.cell(row=1, column=1)
print(cell.value)
print(cell.formula)

>> 2
>> =1+1

cell = sheet.cell(row=2, column=1)
print(cell.value)
print(cell.formula)

>> 2
>> None

Note that this will work regardless of the value of data_only that was passed to load_workbook.
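
Since the goal in the question is to follow the cell references contained in the Top Sheet formulas, the formula string still has to be parsed once it is available. A rough sketch using a regular expression follows. The pattern and the helper name sheet_refs are assumptions for illustration, and it only handles simple sheet-qualified references such as 'Raw Data'!B12 or Data!$C$3:$C$9, not every form Excel allows (external workbooks, named ranges, structured references).

```python
import re

REF_PATTERN = re.compile(
    r"(?:'((?:[^']|'')+)'|([A-Za-z_][A-Za-z0-9_.]*))"  # quoted or bare sheet name
    r"!"
    r"(\$?[A-Za-z]{1,3}\$?\d+(?::\$?[A-Za-z]{1,3}\$?\d+)?)"  # A1 or A1:B2, optional $
)


def sheet_refs(formula):
    """Return (sheet name, cell or range) pairs found in a formula string."""
    refs = []
    for quoted, bare, cells in REF_PATTERN.findall(formula):
        # Inside a quoted sheet name, a doubled quote stands for one quote.
        sheet = quoted.replace("''", "'") if quoted else bare
        # Drop the absolute-reference markers so the result is a plain address.
        refs.append((sheet, cells.replace("$", "")))
    return refs


print(sheet_refs("='Raw Data'!B12+Data!$C$3:$C$9"))
```

Each returned pair can then be looked up in the already loaded workbook by sheet name and cell address.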
