Get formula from Excel cell with python xlrd


Question

I have to port an algorithm from an Excel sheet to Python code, but first I have to reverse-engineer the algorithm from the Excel file.

The Excel sheet is quite complicated: it contains many cells with formulas that refer to other cells (which can in turn contain a formula or a constant).

My idea is to analyze the sheet with a Python script, building a sort of table of dependencies between cells, that is:

A1 depends on B4, C5, E7; formula: "=sqrt(B4)+C5*E7"
A2 depends on B5, C6; formula: "=sin(B5)*C6"
...
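
One way to picture such a dependency table is sketched below. The rows are entered by hand from the two example lines above (nothing is read from a workbook), and the names `table` and `dependents` are purely illustrative.

```python
from collections import defaultdict

# Hand-entered rows from the example: cell -> (formula text, referenced cells).
table = {
    "A1": ("=sqrt(B4)+C5*E7", ["B4", "C5", "E7"]),
    "A2": ("=sin(B5)*C6", ["B5", "C6"]),
}


def dependents(table):
    """Invert the table: referenced cell -> sorted list of cells that use it."""
    reverse = defaultdict(set)
    for cell, (_formula, refs) in table.items():
        for ref in refs:
            reverse[ref].add(cell)
    return {ref: sorted(cells) for ref, cells in reverse.items()}
```

The inverted view answers "which formulas change if I touch B4?", which tends to be the question that comes up when porting a sheet cell by cell.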

The xlrd Python module can read an XLS workbook, but at the moment I can only access the value of a cell, not its formula.

For example, with the following code I can simply get the value of a cell:

import xlrd

# Open the .xls file
xlsname = "test.xls"
book = xlrd.open_workbook(xlsname)

# Build a dictionary mapping sheet names to sheets
sd = {s.name: s for s in book.sheets()}

# Obtain sheet "Foglio 1" from the dictionary
sheet = sd["Foglio 1"]

# Print the cell J141 (row and column indexes are 0-based)
print(sheet.cell(140, 9))

Anyway, there seems to be no way to get the formula from the Cell object returned by the .cell(...) method. The documentation says it is possible to get a string version of the formula (in English, because no information about function-name translation is stored in the Excel file). It discusses formulas (expressions) in the Name and Operand classes, but I cannot work out how to get instances of these classes from the Cell instance that must contain them.

Could you suggest a code snippet that gets the formula text from a cell?

Recommended answer

[Dis]claimer: I'm the author/maintainer of xlrd.

The documentation's references to formula text are about "name" formulas; read the section "Named references, constants, formulas, and macros" near the start of the docs. These formulas are associated sheet-wide or book-wide with a name; they are not associated with individual cells. Examples: PI maps to =22/7, SALES maps to =Mktng!$A$2:$Z$99. The name-formula decompiler was written to support inspection of the simpler and/or commonly found usages of defined names.
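
For illustration, a minimal sketch of listing those name formulas follows. It assumes `book` is the object returned by xlrd.open_workbook(), and that the attributes `name_obj_list`, `Name.name`, `Name.scope`, `Name.result` and `Operand.text` behave as described in the xlrd documentation; check them against your installed version. The function name `describe_names` is hypothetical.

```python
def describe_names(book):
    """Return (name, scope, decompiled formula text or None) per defined name.

    Assumption: `book` comes from xlrd.open_workbook(). Per the xlrd docs,
    scope -1 means book-wide, and a value >= 0 is the index of the sheet
    the name is local to.
    """
    rows = []
    for nobj in book.name_obj_list:
        # `result` is expected to be None when the name has no formula
        # or when the formula could not be evaluated.
        result = nobj.result
        text = result.text if result is not None else None
        rows.append((nobj.name, nobj.scope, text))
    return rows
```

This only covers defined names such as PI or SALES; it does not give access to the formula stored in an individual cell.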

Formulas in general are of several kinds: cell, shared, and array (all associated with a cell, directly or indirectly), name, data validation, and conditional formatting.

Decompiling general formulas from bytecode to text is a "work in progress", and progress is slow. Note that even if it were available, you would then need to parse the text formula to extract the cell references. Parsing Excel formulas correctly is not an easy job; as with HTML, using regexes looks easy but doesn't work. It would be better to extract the references directly from the formula bytecode.
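
To make the regex warning concrete, here is a small sketch of a deliberately naive cell-reference pattern and two formulas that fool it. The pattern and the name `naive_refs` are illustrative assumptions, not part of xlrd.

```python
import re

# Naive pattern: optional "$", 1-3 letters, optional "$", then digits.
NAIVE_REF = re.compile(r"\$?[A-Za-z]{1,3}\$?\d+")


def naive_refs(formula):
    """Return everything in `formula` that looks like a cell reference."""
    return NAIVE_REF.findall(formula)


# Works on the simple formula from the question:
print(naive_refs("=sqrt(B4)+C5*E7"))      # ['B4', 'C5', 'E7']
# A function name is mistaken for a reference:
print(naive_refs("=LOG10(A1)"))           # ['LOG10', 'A1']
# Text inside a string literal is mistaken for a reference:
print(naive_refs('=IF(A1="B2","x",C3)'))  # ['A1', 'B2', 'C3']
```

Sheet-qualified references, quoted sheet names, ranges and defined names add further cases that a pattern like this does not handle.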

Also note that cell-based formulas can refer to names, and name formulas can refer both to cells and to other names. So it would be necessary to extract both cell and name references from both cell-based and name formulas. Having information on shared formulas available may also be useful to you; otherwise, having parsed the following:

B2 =A2
B3 =A3+B2
B4 =A4+B3
B5 =A5+B4
...
B60 =A60+B59

you would need to deduce the similarity between the B3:B60 formulas yourself.
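
As a rough sketch of what deducing that similarity could look like, the snippet below rewrites each reference as an offset from the cell that holds the formula, so that formulas following the same pattern produce the same string. It assumes formula text is already available and handles only plain, uppercase, unanchored references like those in the list above (no "$", strings, sheet names or ranges); `col_index` and `relative_form` are hypothetical helper names.

```python
import re

# Plain A1-style reference: column letters followed by a row number.
REF = re.compile(r"([A-Z]+)([0-9]+)")


def col_index(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def relative_form(cell, formula):
    """Rewrite each reference in `formula` as a row/column offset from `cell`."""
    m = REF.fullmatch(cell)
    base_col, base_row = col_index(m.group(1)), int(m.group(2))

    def repl(match):
        dc = col_index(match.group(1)) - base_col
        dr = int(match.group(2)) - base_row
        return "R[%d]C[%d]" % (dr, dc)

    return REF.sub(repl, formula)


print(relative_form("B3", "=A3+B2"))    # =R[0]C[-1]+R[-1]C[0]
print(relative_form("B60", "=A60+B59")) # =R[0]C[-1]+R[-1]C[0]
print(relative_form("B2", "=A2"))       # =R[0]C[-1]
```

Grouping cells by the resulting string would put B3 through B60 in one group and B2 in another. Because it relies on a regex over formula text, it inherits the limitations described earlier in this answer.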

In any case, none of the above is likely to be available any time soon; xlrd priorities lie elsewhere.
