OpenPyXL比较单元格 [英] Openpyxl compare cells

查看:90
本文介绍了OpenPyXL比较单元格的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有2张纸,上面有一些数据(每条18k行),需要检查target.xlsx文件中是否存在source.xlsx的值.源文件中的行应该是唯一的.如果源文件中的单元格存在于目标文件中(在特定列中),则目标文件的下一列中需要填充源文件中某个列中的值.这非常棘手,因此示例如下:

I have 2 sheets with some data (18k rows each) and need to check if value from source.xlsx exists in a target.xlsx file. The rows in the source file should be unique. If the cell from source file exists in the target file (in specific column) then in next column in target file need to fill value from some column which is in the source file. It is quite tricky so example would look like:

target.xlsx

<table><tbody><tr><th>Data</th><th>price</th><th> </th></tr><tr><td>1234grt   </td><td> </td><td> </td></tr><tr><td>7686tyug  </td><td> </td><td> </td></tr><tr><td>9797tyu   </td><td>   </td><td> </td></tr><tr><td>9866yyy   </td><td> </td><td> </td></tr><tr><td>98845r  </td><td> </td><td> </td></tr><tr><td>4567yut  </td><td> </td><td> </td></tr><tr><td>1234grt</td><td> </td><td> </td></tr><tr><td>98845r </td><td> </td><td> </td></tr></tbody></table>

source.xls

<table><tbody><tr><th>Data</th><th>price</th><th> </th></tr><tr><td>98845r    </td><td>$50</td><td> </td></tr><tr><td>7686tyug  </td><td>$67</td><td> </td></tr><tr><td>9797tyu   </td><td>$56</td><td> </td></tr><tr><td>4567yut   </td><td>$67</td><td> </td></tr><tr><td>9866yyy   </td><td>$76</td><td> </td></tr><tr><td>98845r    </td><td>$56</td><td> </td></tr><tr><td>1234grt</td><td>$34</td><td> </td></tr></tbody></table>

for i in range(1, source_sheet_max_rows, 1):
print(i)
if source_wb[temp_sheet_name].cell(row=i, column=1).value in target_values:
    for j in range(1, target_sheet_max_rows, 1):
        if target_wb[temp_sheet_name].cell(row=j, column=1).value == source_wb[temp_sheet_name].cell(row=i,
                                                                                                           column=1).value:
            target_wb[temp_sheet_name].cell(row=j, column=2).value = source_wb[temp_sheet_name].cell(row=i,
                                                                                                             column=2).value
            target_wb.save(str(temp_sheet_name))

target_values-包含目标工作表中第1列的值

target_values - contains the values from col 1 in target sheet

上面的代码有效,但是非常繁琐,我认为有一些更好的方法可以做到.这些文件包含超过18k的行,因此比较数据需要花费一些时间.棘手的部分是,我需要知道源文件中我的单元格在目标文件中的哪一行中用相应的值填充列.我正在使用openpyxl,但如果更简单,我可以使用熊猫.

The above code works, but is very heavy and I think there is some better way do it. The files contain more than 18k rows so it would take ages to compare data. The tricky part is that I need to know in which row in the target file my cell from source file is to fill column with corresponding value. I am using openpyxl but if it is easier I could use pandas.

Thx

推荐答案

问题:检查target.xlsx文件中是否存在source.xlsx的值.

Question: check if value from source.xlsx exists in a target.xlsx file.

像下面的示例一样实现它:
文档: OpenPyXl-可以访问许多单元格
Python-映射类型—字典

Implement it like the following example:
Documentation: OpenPyXl - accessing-many-cells
Python - Mapping Types — dict, Python - object.__init__

class SourceSheet:
    def __init__(self, ws):
        self.ws = ws

    def __iter__(self):
        """
        Implement iterRows or iterRange
        :return: yield a tuple (value_to_search, value_to_fill)
        """
        # Example iterRange
        for row in range(1, self.ws.max_rows + 1):
            yield (self.ws.cell(row=row, column=1).value, self.ws.cell(row=row, column=2).value)

class TargetSheet:
    def __init__(self, ws):
        self.ws = ws

        """
        Create a 'dict' from all Values in Column A
        This allows Random Access the Cell Value to get the Cell Row Index
        Dict.key == Cell Value
        Dict.value = Cell Row Index
        _columnA = {} # {cell.value:cell.row}
        """
        self._columnA = dict(((c.value, c.row) for c in ws['A']))

    def find(self, value):
        """
        Implement either linear Search using iterRows over one Column or
                         search in dict to find 'value'
        :param value: The value to find
        :return: The Cell, to write the 'value_to_fill'
        """
        # Example using dict
        if value in self._columnA:
            return self.ws.cell(row=self._columnA[value], column=2)


sourceSheet = SourceSheet(ws1)
targetSheet = TargetSheet(ws2)        

for value_to_search, value_to_fill in sourceSheet:
    print("SourceSheet:{}".format((value_to_search, value_to_fill)))
    targetCell = targetSheet.find(value_to_search)

    if targetCell:
        print("Match: Write value '{}' to TargetSheet:'{}'".format(value_to_fill, targetCell))
        targetCell.value = value_to_fill
    else:
        print("Value '{}' not fount in TargetSheet!".format(value_to_search))

输出:

SourceSheet:('cell.A1.value', 'cell.B1.value')
Match: Write value 'cell.B1.value' to TargetSheet:'Cell.B1:'
SourceSheet:('cell.A2.value', 'cell.B2.value')
Match: Write value 'cell.B2.value' to TargetSheet:'Cell.B2:'
SourceSheet:('cell.A3.value', 'cell.B3.value')
Match: Write value 'cell.B3.value' to TargetSheet:'Cell.B3:'

使用Python测试:3.5

Tested with Python: 3.5

这篇关于OpenPyXL比较单元格的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆