OpenPyXL比较单元格 [英] Openpyxl compare cells
问题描述
我有2张纸,上面有一些数据(每条18k行),需要检查target.xlsx文件中是否存在source.xlsx的值.源文件中的行应该是唯一的.如果源文件中的单元格存在于目标文件中(在特定列中),则目标文件的下一列中需要填充源文件中某个列中的值.这非常棘手,因此示例如下:
I have 2 sheets with some data (18k rows each) and need to check if value from source.xlsx exists in a target.xlsx file. The rows in the source file should be unique. If the cell from source file exists in the target file (in specific column) then in next column in target file need to fill value from some column which is in the source file. It is quite tricky so example would look like:
target.xlsx
<table><tbody><tr><th>Data</th><th>price</th><th> </th></tr><tr><td>1234grt </td><td> </td><td> </td></tr><tr><td>7686tyug </td><td> </td><td> </td></tr><tr><td>9797tyu </td><td> </td><td> </td></tr><tr><td>9866yyy </td><td> </td><td> </td></tr><tr><td>98845r </td><td> </td><td> </td></tr><tr><td>4567yut </td><td> </td><td> </td></tr><tr><td>1234grt</td><td> </td><td> </td></tr><tr><td>98845r </td><td> </td><td> </td></tr></tbody></table>
source.xls
<table><tbody><tr><th>Data</th><th>price</th><th> </th></tr><tr><td>98845r </td><td>$50</td><td> </td></tr><tr><td>7686tyug </td><td>$67</td><td> </td></tr><tr><td>9797tyu </td><td>$56</td><td> </td></tr><tr><td>4567yut </td><td>$67</td><td> </td></tr><tr><td>9866yyy </td><td>$76</td><td> </td></tr><tr><td>98845r </td><td>$56</td><td> </td></tr><tr><td>1234grt</td><td>$34</td><td> </td></tr></tbody></table>
for i in range(1, source_sheet_max_rows, 1):
print(i)
if source_wb[temp_sheet_name].cell(row=i, column=1).value in target_values:
for j in range(1, target_sheet_max_rows, 1):
if target_wb[temp_sheet_name].cell(row=j, column=1).value == source_wb[temp_sheet_name].cell(row=i,
column=1).value:
target_wb[temp_sheet_name].cell(row=j, column=2).value = source_wb[temp_sheet_name].cell(row=i,
column=2).value
target_wb.save(str(temp_sheet_name))
target_values-包含目标工作表中第1列的值
target_values - contains the values from col 1 in target sheet
上面的代码有效,但是非常繁琐,我认为有一些更好的方法可以做到.这些文件包含超过18k的行,因此比较数据需要花费一些时间.棘手的部分是,我需要知道源文件中我的单元格在目标文件中的哪一行中用相应的值填充列.我正在使用openpyxl,但如果更简单,我可以使用熊猫.
The above code works, but is very heavy and I think there is some better way do it. The files contain more than 18k rows so it would take ages to compare data. The tricky part is that I need to know in which row in the target file my cell from source file is to fill column with corresponding value. I am using openpyxl but if it is easier I could use pandas.
Thx
推荐答案
问题:检查target.xlsx文件中是否存在source.xlsx的值.
Question: check if value from source.xlsx exists in a target.xlsx file.
像下面的示例一样实现它:
文档: OpenPyXl-可以访问许多单元格
Python-映射类型—字典,
Implement it like the following example:
Documentation: OpenPyXl - accessing-many-cells
Python - Mapping Types — dict, Python - object.__init__
class SourceSheet:
def __init__(self, ws):
self.ws = ws
def __iter__(self):
"""
Implement iterRows or iterRange
:return: yield a tuple (value_to_search, value_to_fill)
"""
# Example iterRange
for row in range(1, self.ws.max_rows + 1):
yield (self.ws.cell(row=row, column=1).value, self.ws.cell(row=row, column=2).value)
class TargetSheet:
def __init__(self, ws):
self.ws = ws
"""
Create a 'dict' from all Values in Column A
This allows Random Access the Cell Value to get the Cell Row Index
Dict.key == Cell Value
Dict.value = Cell Row Index
_columnA = {} # {cell.value:cell.row}
"""
self._columnA = dict(((c.value, c.row) for c in ws['A']))
def find(self, value):
"""
Implement either linear Search using iterRows over one Column or
search in dict to find 'value'
:param value: The value to find
:return: The Cell, to write the 'value_to_fill'
"""
# Example using dict
if value in self._columnA:
return self.ws.cell(row=self._columnA[value], column=2)
sourceSheet = SourceSheet(ws1)
targetSheet = TargetSheet(ws2)
for value_to_search, value_to_fill in sourceSheet:
print("SourceSheet:{}".format((value_to_search, value_to_fill)))
targetCell = targetSheet.find(value_to_search)
if targetCell:
print("Match: Write value '{}' to TargetSheet:'{}'".format(value_to_fill, targetCell))
targetCell.value = value_to_fill
else:
print("Value '{}' not fount in TargetSheet!".format(value_to_search))
输出:
SourceSheet:('cell.A1.value', 'cell.B1.value')
Match: Write value 'cell.B1.value' to TargetSheet:'Cell.B1:'
SourceSheet:('cell.A2.value', 'cell.B2.value')
Match: Write value 'cell.B2.value' to TargetSheet:'Cell.B2:'
SourceSheet:('cell.A3.value', 'cell.B3.value')
Match: Write value 'cell.B3.value' to TargetSheet:'Cell.B3:'
使用Python测试:3.5
Tested with Python: 3.5
这篇关于OpenPyXL比较单元格的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!