Python: Reading Large Excel Worksheets using Openpyxl


Problem Description


I have an Excel file containing about 400 worksheets, 375 of which I need to save out as CSV files. I've tried a VBA solution, but Excel has issues just opening this workbook.

I've created a python script to do just that. However, it rapidly consumes all available memory and pretty much stops working after 25 sheets are exported. Does anybody have a suggestion on how I might improve this code?

import openpyxl
import csv
import time

print(time.ctime())

importedfile = openpyxl.load_workbook(filename="C:/Users/User/Desktop/Giant Workbook.xlsm", data_only=True, keep_vba=False)

tabnames = importedfile.get_sheet_names()

substring = "Keyword"

for num in tabnames:
    if num.find(substring) > -1:
        sheet = importedfile.get_sheet_by_name(num)
        name = "C:/Users/User/Desktop/Test/" + num + ".csv"
        with open(name, 'w', newline='') as file:
            savefile = csv.writer(file)
            for i in sheet.rows:
                savefile.writerow([cell.value for cell in i])
        file.close()
print(time.ctime())

Any help would be appreciated.

Thanks

EDIT: I'm using Windows 7 and Python 3.4.3. I'm also open to solutions in R, VBA, or SPSS.

Solution

Try using the read_only=True parameter of the load_workbook() function. The worksheets you get back are then IterableWorksheet objects, meaning you can only iterate over them; you cannot access cell values directly by column/row number. According to the documentation, this provides near-constant memory consumption.

Also, you do not need to call file.close(); the with statement handles that for you.

Example -

import openpyxl
import csv
import time

print(time.ctime())

importedfile = openpyxl.load_workbook(filename="C:/Users/User/Desktop/Giant Workbook.xlsm", read_only=True, keep_vba=False)

tabnames = importedfile.get_sheet_names()

substring = "Keyword"

for num in tabnames:
    if num.find(substring) > -1:
        sheet = importedfile.get_sheet_by_name(num)
        name = "C:/Users/User/Desktop/Test/" + num + ".csv"
        with open(name, 'w', newline='') as file:
            savefile = csv.writer(file)
            for i in sheet.rows:
                savefile.writerow([cell.value for cell in i])
print(time.ctime())

From the documentation:

Sometimes, you will need to open or write extremely large XLSX files, and the common routines in openpyxl won’t be able to handle that load. Fortunately, there are two modes that enable you to read and write unlimited amounts of data with (near) constant memory consumption.
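A note for readers on newer openpyxl releases: get_sheet_names() and get_sheet_by_name() have since been deprecated in favor of wb.sheetnames and wb[name], and iter_rows(values_only=True) (openpyxl 2.6+) streams plain values without building cell objects at all. Below is a minimal sketch of the same export using the current API; the file paths and sheet names are made up for illustration, and a tiny workbook is built first so the example is self-contained:

```python
import csv
import os
import tempfile

import openpyxl

# Build a tiny workbook so the sketch runs without the asker's file.
tmpdir = tempfile.mkdtemp()
xlsx_path = os.path.join(tmpdir, "giant.xlsx")
wb = openpyxl.Workbook()
wb.active.title = "Keyword One"
wb.active.append(["a", "b"])
wb.create_sheet("Ignore Me").append([1, 2])
wb.save(xlsx_path)

# Re-open in read-only mode: rows are streamed, so memory stays
# near-constant regardless of workbook size.
wb = openpyxl.load_workbook(xlsx_path, read_only=True)
for name in wb.sheetnames:            # replaces get_sheet_names()
    if "Keyword" in name:             # clearer than .find(substring) > -1
        sheet = wb[name]              # replaces get_sheet_by_name()
        csv_path = os.path.join(tmpdir, name + ".csv")
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            # values_only=True yields tuples of plain values,
            # skipping cell-object creation entirely.
            for row in sheet.iter_rows(values_only=True):
                writer.writerow(row)
wb.close()  # read-only workbooks hold the file handle open until closed
```

Only the sheets whose names contain "Keyword" are exported; the explicit wb.close() matters in read-only mode because the underlying file stays open for streaming.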
