Get formula from Excel cell with python xlrd

Problem description

I have to port an algorithm from an Excel sheet to Python code, but first I have to reverse engineer the algorithm from the Excel file.

The Excel sheet is quite complicated: it contains many cells whose formulas refer to other cells, which can in turn contain a formula or a constant.

My idea is to analyze the sheet with a Python script and build a table of dependencies between cells, that is:

A1 depends on B4,C5,E7 formula: "=sqrt(B4)+C5*E7"
A2 depends on B5,C6 formula: "=sin(B5)*C6"
...
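
As a rough illustration of what such a dependency table could look like once the formulas are known, here is a minimal sketch. The `dependencies` dictionary and the `evaluation_order` helper are hypothetical names; the two entries are typed in by hand from the examples above rather than read from a workbook, and `graphlib` assumes Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical hand-written table: cell -> (formula text, cells it depends on).
# The entries are the two examples from the question, not workbook data.
dependencies = {
    "A1": ("=sqrt(B4)+C5*E7", ["B4", "C5", "E7"]),
    "A2": ("=sin(B5)*C6", ["B5", "C6"]),
}


def evaluation_order(table):
    """Return the cells in an order where every cell comes after its inputs."""
    graph = {cell: set(deps) for cell, (_formula, deps) in table.items()}
    # static_order() raises graphlib.CycleError on circular references.
    return list(TopologicalSorter(graph).static_order())


order = evaluation_order(dependencies)
```

An ordering like this is one way to decide in which sequence the ported Python expressions would have to be computed.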

The xlrd Python module can read an XLS workbook, but at the moment I can only access the value of a cell, not its formula.

For example, with the following code I can get the value of a cell:

import xlrd

# open the .xls file
xlsname = "test.xls"
book = xlrd.open_workbook(xlsname)

# build a dictionary mapping sheet names to sheets
sd = {}
for s in book.sheets():
    sd[s.name] = s

# get the sheet "Foglio 1" from the sheet-name dictionary
sheet = sd["Foglio 1"]

# print cell J141; xlrd row and column indexes are zero-based,
# so J141 is row 140, column 9
print(sheet.cell(140, 9))
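
Because xlrd addresses cells with zero-based row and column indexes, an A1-style address such as J141 has to be converted before it is passed to `sheet.cell(rowx, colx)`. A minimal sketch of that conversion follows; `a1_to_rowcol` is a hypothetical helper, not part of xlrd, and it only accepts plain addresses without `$` signs or sheet names.

```python
import re


def a1_to_rowcol(address):
    """Convert a plain A1-style address to zero-based (row, col) indexes."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", address)
    if match is None:
        raise ValueError("not a simple A1-style address: %r" % address)
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters are a base-26 number with digits A=1 .. Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1
```

With a sheet object like the one above, `sheet.cell(*a1_to_rowcol("J141"))` would then read row 140, column 9.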

However, there seems to be no way to get the formula from the Cell object returned by the .cell(...) method. The documentation says that it is possible to get a string version of the formula (in English, because no information about function-name translation is stored in the Excel file). It talks about formulas (expressions) in the Name and Operand classes, but I cannot work out how to get instances of these classes from the Cell instance that must contain them.

Could you suggest a code snippet that gets the formula text from a cell?

Solution

[Dis]claimer: I'm the author/maintainer of xlrd.

The documentation's references to formula text are about "name" formulas; read the section "Named references, constants, formulas, and macros" near the start of the docs. These formulas are associated with a name, sheet-wide or book-wide; they are not associated with individual cells. Examples: PI maps to =22/7, SALES maps to =Mktng!$A$2:$Z$99. The name-formula decompiler was written to support inspection of the simpler and/or commonly found usages of defined names.

Formulas in general are of several kinds: cell, shared, and array (all associated with a cell, directly or indirectly), name, data validation, and conditional formatting.

Decompiling general formulas from bytecode to text is a slow-moving "work in progress". Note that even if it were available, you would then need to parse the text formula to extract the cell references. Parsing Excel formulas correctly is not an easy job; as with HTML, using regexes looks easy but doesn't work. It would be better to extract the references directly from the formula bytecode.

Also note that cell-based formulas can refer to names, and name formulas can refer both to cells and to other names. So it would be necessary to extract both cell and name references from both cell-based and name formulas. It may be useful to you to have info on shared formulas available; otherwise having parsed the following:

B2 =A2
B3 =A3+B2
B4 =A4+B3
B5 =A5+B4
...
B60 =A60+B59

you would need to deduce the similarity between the B3:B60 formulas yourself.
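
To illustrate what deducing that similarity could involve, here is a minimal sketch that rewrites each reference as an offset from the cell holding the formula (an R1C1-like notation) and groups cells whose rewritten text is identical. It is an assumption-laden toy: it uses a regex, which as noted above is not a correct way to parse Excel formulas in general, and it only copes with plain relative references like A3. Absolute references, sheet names, string literals, and function names that look like cells (LOG10, for instance) are not handled. `relative_form` and `col_index` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Matches only plain relative references such as A3 or B59 (toy assumption).
CELL_REF = re.compile(r"\b([A-Z]{1,3})([0-9]+)\b")


def col_index(letters):
    """Zero-based column index for column letters (A -> 0, B -> 1, ...)."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


def relative_form(cell, formula):
    """Rewrite each reference as a row/column offset from `cell`."""
    home = CELL_REF.fullmatch(cell)
    home_row, home_col = int(home.group(2)) - 1, col_index(home.group(1))

    def repl(match):
        d_row = int(match.group(2)) - 1 - home_row
        d_col = col_index(match.group(1)) - home_col
        return "R[%d]C[%d]" % (d_row, d_col)

    return CELL_REF.sub(repl, formula)


# The formulas listed above, typed in by hand.
formulas = {
    "B2": "=A2",
    "B3": "=A3+B2",
    "B4": "=A4+B3",
    "B5": "=A5+B4",
    "B60": "=A60+B59",
}

# Cells whose formulas are identical in relative form end up in one group.
groups = defaultdict(list)
for cell, text in formulas.items():
    groups[relative_form(cell, text)].append(cell)
```

Here B3, B4, B5 and B60 all share one relative form, while B2 lands in a group of its own, which is the kind of grouping that shared-formula information would otherwise provide directly.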

In any case, none of the above is likely to be available any time soon -- xlrd priorities lie elsewhere.
