Finding Excel cell reference using Python

Question

Here is the Excel file in question:

Context: I am writing a program which can pull values from a PDF and put them in the appropriate cell in an Excel file.

Question: I want to write a function that takes a column value (e.g. 2014) and a row value (e.g. 'COGS') as arguments and returns the cell reference where the two intersect (e.g. 'C3' for 2014 COGS).

def find_correct_cell(year=2014, item='COGS'):
    # do something similar to what the =MATCH function in Excel does
    cell_reference = ...  # placeholder: this is the part I don't know how to write
    return cell_reference  # should return 'C3'

I have tried using openpyxl to make some random empty cells store the following:

    col_num = '=match(2014, A1:E1)'
    row_num = '=match("COGS", A1:A5)'

But I want to get those values without having to write to arbitrary empty cells. Also, even with this method, when I read those cells (F5 and F6) I get the formulas in them rather than their computed value of 3.

Thanks for your help.

Answer

There are a surprising number of details you need to get right to manipulate Excel files this way with openpyxl. First, it's worth knowing that an xlsx file stores two representations of each formula cell: the formula itself, and the value that formula had when Excel last calculated and saved the file. openpyxl can return either; if you want values, pass data_only=True to load_workbook() when you open the file. Also, openpyxl cannot calculate a new value when you change a cell's formula; only Excel itself can do that. A formula written by openpyxl therefore has no stored value until the file is opened and saved in Excel, so inserting a MATCH() worksheet function won't solve your problem.
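A minimal sketch of the formula/cached-value distinction, assuming openpyxl is installed. It writes a throwaway workbook (the name `demo.xlsx` is just for illustration) into a temporary directory and reads the formula cell back both ways. Because openpyxl never evaluates formulas, the file it saves has no cached value, so `data_only=True` gives `None`; this is why the F5/F6 approach cannot produce 3 without Excel in the loop.

```python
import os
import tempfile

import openpyxl

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xlsx")  # hypothetical throwaway file

    # Write a plain value and a formula that refers to it.
    wb = openpyxl.Workbook()
    ws = wb.active
    ws["A1"] = 21
    ws["B1"] = "=A1*2"
    wb.save(path)

    # Default load: formula cells come back as their formula text.
    formula = openpyxl.load_workbook(path).active["B1"].value

    # data_only=True: formula cells come back as the value Excel cached on
    # its last save. openpyxl never computes one, so there is none here.
    cached = openpyxl.load_workbook(path, data_only=True).active["B1"].value

print(formula)  # =A1*2
print(cached)   # None
```

If the same file were opened and saved in Excel first, the `data_only=True` read would return 42 instead.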

The code below does what you want, mostly in Python. It uses the "A1" reference style, and does some calculations to turn column numbers into column letters. This won't hold up well if you go past column Z. In that case, you may want to switch to numbered references to rows and columns. There's some more info on that here and here. But hopefully this will get you on your way.
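To go past column Z, the conversion has to produce multi-letter columns (AA, AB, ...). Excel column letters are a base-26 numbering with no zero digit, which a minimal pure-Python sketch can handle. The helper names `col_letter` and `col_number` are hypothetical; openpyxl also provides `openpyxl.utils.get_column_letter` and `openpyxl.utils.column_index_from_string` for the same job, which you may prefer in practice.

```python
def col_letter(n: int) -> str:
    """Convert a 1-based column number to Excel letters (1 -> 'A', 27 -> 'AA')."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while n > 0:
        # Shift to 0-based before dividing, because there is no "zero" letter.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("A") + rem))
    return "".join(reversed(letters))


def col_number(letters: str) -> int:
    """Inverse of col_letter ('A' -> 1, 'AA' -> 27)."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


print(col_letter(3))    # C
print(col_letter(26))   # Z
print(col_letter(27))   # AA
print(col_letter(703))  # AAA
```

With a helper like this, `col_letter(2 + years.index(year))` could replace the `chr()` arithmetic in the code below, since B is column 2.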

Note: This code assumes you are reading a workbook called 'test.xlsx', and that 'COGS' is in a list of items in 'Sheet1!A2:A5' and 2014 is in a list of years in 'Sheet1!B1:E1'.

import openpyxl

def get_xlsx_region(xlsx_file, sheet, region):
    """ Return a rectangular region from the specified file.
    The data are returned as a list of rows, where each row contains a list
    of cell values"""

    # 'data_only=True' tells openpyxl to return values instead of formulas
    # 'read_only=True' makes openpyxl much faster (fast enough that it
    # doesn't hurt to open the file once for each region).
    wb = openpyxl.load_workbook(xlsx_file, data_only=True, read_only=True)

    reg = wb[sheet][region]
    data = [[cell.value for cell in row] for row in reg]

    # read-only workbooks keep the file open until explicitly closed
    wb.close()
    return data

# cache the lists of years and items
# get the first (only) row of the 'B1:E1' region
years = get_xlsx_region('test.xlsx', 'Sheet1', 'B1:E1')[0]
# get the first (only) column of the 'A2:A5' region
items = [r[0] for r in get_xlsx_region('test.xlsx', 'Sheet1', 'A2:A5')]

def find_correct_cell(year, item):
    # find the column letter for the year and the row number for the item
    year_col = chr(ord('B') + years.index(year))   # only works in A:Z range
    item_row = 2 + items.index(item)

    cell_reference = year_col + str(item_row)

    return cell_reference

print(find_correct_cell(year=2014, item='COGS'))
# C3
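If you'd rather not hardcode the ranges, one possible variation (a sketch, not the answer's code) is to treat row 1 as the header row and column A as the label column, and search them the way MATCH does with match_type 0 (exact match). The function name `locate_cell` is hypothetical, and the workbook is built in memory with made-up numbers so the example runs on its own; in real use you would pass a sheet from `openpyxl.load_workbook('test.xlsx', data_only=True)['Sheet1']`. It uses `get_column_letter`, so it is not limited to columns A:Z.

```python
import openpyxl
from openpyxl.utils import get_column_letter


def locate_cell(ws, year, item):
    """Return the A1-style reference where the `year` column meets the `item` row.

    Assumes years sit in row 1 and item labels in column A. Raises ValueError
    if either is missing, roughly like MATCH returning #N/A.
    """
    # Row 1 as a tuple of values; position 0 is column A.
    header = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
    # Column A as a list of values; position 0 is row 1.
    labels = [row[0] for row in ws.iter_rows(min_col=1, max_col=1, values_only=True)]

    col = header.index(year) + 1  # 1-based column number
    row = labels.index(item) + 1  # 1-based row number
    return get_column_letter(col) + str(row)


# Small in-memory sheet with the question's layout (the numbers are invented).
wb = openpyxl.Workbook()
ws = wb.active
ws.append([None, 2013, 2014, 2015, 2016])
ws.append(["Revenue", 100, 110, 120, 130])
ws.append(["COGS", 60, 65, 70, 75])
ws.append(["Gross profit", 40, 45, 50, 55])

print(locate_cell(ws, 2014, "COGS"))  # C3
```

Scanning the whole header row and label column costs a little more than reading fixed ranges, but it keeps working when years or items are added to the sheet.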
