Python: make reading an Excel file faster


Problem description

I made a script that reads an Excel document and checks whether the first cell of a row contains "UPDATED". If so, it writes the whole row to another Excel document under the same tab name.

My Excel document has 23 sheets with 1000 rows on each sheet, and the script currently takes more than 15 minutes to finish. Is there a way to speed this up?

I was thinking about multithreading or multiprocessing, but I don't know which one is better.

UPDATE: the 15-minute run time was caused entirely by read-only mode. When I removed it, the program took only 2 seconds to run.

import openpyxl
import os
from datetime import datetime

titles = ["Column1", "Column2", "Column3", "Column4", "Column5","Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16"]


def main():
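    # os.path.join avoids the "\n" in "\newFile.xlsx" being read as a newline
    oldFilePath = os.path.join(os.getcwd(), "oldFile.xlsx")
    newFilePath = os.path.join(os.getcwd(), "newFile.xlsx")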

    wb = openpyxl.load_workbook(filename=oldFilePath, read_only=True)
    wb2 = openpyxl.Workbook()

    sheets = wb.sheetnames

    # remove the default sheet that openpyxl creates in the new workbook
    for sheet in wb2.sheetnames:
        wb2.remove(wb2[sheet])
    sheets2 = wb2.sheetnames

    for tab in sheets:
        print("Sheet: " + str(tab))
        rowCounter = 2

        sheet = wb[tab]
        for row in range(sheet.max_row):
            if sheet.cell(row=row + 1, column=1).value in (None, ""):  # empty cells read as None; stop reading
                break
            elif sheet.cell(row=row + 1, column=1).value == "UPDATED":
                if tab not in sheets2:
                    sheet2 = wb2.create_sheet(title=tab)
                    sheet2.append(titles)

                for x in range(1, 17):
                    sheet2.cell(row=rowCounter, column=x).value = sheet.cell(row=row + 1, column=x).value

                rowCounter += 1

                sheets2 = wb2.sheetnames

    wb2.save(filename=newFilePath)


if __name__ == "__main__":
    startTime = datetime.now()
    main()
    print("Script finished in: " + str(datetime.now() - startTime))

Recommended answer

For such small workbooks there is no need to use read-only mode, and by using it injudiciously you are causing the problem yourself. Every call to ws.cell() will force openpyxl to parse the worksheet again.

So either stop using read-only mode, or use ws.iter_rows() as I advised on your previous question.
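As a rough illustration of the ws.iter_rows() route, the sketch below streams each sheet once instead of calling ws.cell() per cell. The names updated_rows, copy_updated and TITLES are made up for this example, the 16-column limit and the "stop at the first empty cell" rule are taken from the question's script, and values_only=True assumes a reasonably recent openpyxl (2.6 or later). Treat it as one possible shape, not as the answerer's own code.

```python
# Placeholder header row, mirroring the 16 column titles in the question.
TITLES = [f"Column{i}" for i in range(1, 17)]


def updated_rows(rows):
    """Yield rows whose first value is "UPDATED"; stop at the first empty first cell."""
    for row in rows:
        if not row or row[0] in (None, ""):
            break
        if row[0] == "UPDATED":
            yield row


def copy_updated(old_path, new_path):
    """Copy every "UPDATED" row into a sheet of the same name in a new workbook."""
    import openpyxl  # third-party; imported here so updated_rows() works without it

    wb = openpyxl.load_workbook(old_path, read_only=True)
    wb2 = openpyxl.Workbook()
    default_sheet = wb2.active

    for ws in wb.worksheets:
        out = None
        # iter_rows() walks the sheet once; values_only=True yields plain tuples.
        for row in updated_rows(ws.iter_rows(max_col=16, values_only=True)):
            if out is None:
                out = wb2.create_sheet(title=ws.title)
                out.append(TITLES)
            out.append(row)

    # Drop the default sheet only if at least one real sheet was created,
    # because a workbook needs at least one sheet to be saved.
    if len(wb2.worksheets) > 1:
        wb2.remove(default_sheet)

    wb2.save(new_path)
    wb.close()
```

Calling copy_updated("oldFile.xlsx", "newFile.xlsx") would then play the role of main() in the question. Because the rows are consumed in a single pass, this pattern should work with or without read_only=True.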

In general, if you think something is running slowly you should always profile it rather than just trying something out and hoping for the best.
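One way to act on that advice is the standard library's cProfile module. The helper below is a minimal sketch; profile_call is a hypothetical name and the choice to sort by cumulative time is just a common starting point.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)  # show only the top entries
    return result, stream.getvalue()
```

For the script in the question, something like `_, report = profile_call(main)` followed by `print(report)` would list the most expensive calls. If the diagnosis above holds, the worksheet-parsing code reached through ws.cell() should sit near the top of that list.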
