Python : Reading Large Excel Worksheets using Openpyxl
I have an Excel file containing about 400 worksheets, 375 of which I need to save out as CSV files. I've tried a VBA solution, but Excel has issues just opening this workbook.
I've created a python script to do just that. However, it rapidly consumes all available memory and pretty much stops working after 25 sheets are exported. Does anybody have a suggestion on how I might improve this code?
import openpyxl
import csv
import time

print(time.ctime())

importedfile = openpyxl.load_workbook(filename="C:/Users/User/Desktop/Giant Workbook.xlsm", data_only=True, keep_vba=False)

tabnames = importedfile.get_sheet_names()
substring = "Keyword"

for num in tabnames:
    if num.find(substring) > -1:
        sheet = importedfile.get_sheet_by_name(num)
        name = "C:/Users/User/Desktop/Test/" + num + ".csv"
        with open(name, 'w', newline='') as file:
            savefile = csv.writer(file)
            for i in sheet.rows:
                savefile.writerow([cell.value for cell in i])
        file.close()
print(time.ctime())
Any help would be appreciated.
Thanks
EDIT: I'm using Windows 7 and Python 3.4.3. I'm also open to solutions in R, VBA, or SPSS.
Try passing read_only=True to load_workbook(). The worksheets you get back are then IterableWorksheet objects, meaning you can only iterate over them; you cannot access cell values directly by row/column index. According to the documentation, this gives near-constant memory consumption.

Also, you do not need to call file.close(); the with statement handles that for you.
Example -
import openpyxl
import csv
import time

print(time.ctime())

importedfile = openpyxl.load_workbook(filename="C:/Users/User/Desktop/Giant Workbook.xlsm", read_only=True, keep_vba=False)

tabnames = importedfile.get_sheet_names()
substring = "Keyword"

for num in tabnames:
    if num.find(substring) > -1:
        sheet = importedfile.get_sheet_by_name(num)
        name = "C:/Users/User/Desktop/Test/" + num + ".csv"
        with open(name, 'w', newline='') as file:
            savefile = csv.writer(file)
            for i in sheet.rows:
                savefile.writerow([cell.value for cell in i])
print(time.ctime())
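Note that get_sheet_names() and get_sheet_by_name() are deprecated in newer openpyxl releases in favour of wb.sheetnames and wb[name]. A sketch of the same conversion using the current API, wrapped as a reusable function (the function name and paths are illustrative, not from the original post):

```python
import csv
import os
import openpyxl

def export_matching_sheets(path, out_dir, substring="Keyword"):
    """Save every worksheet whose title contains `substring` as a CSV file."""
    # read_only=True streams rows instead of loading the whole sheet;
    # data_only=True returns cached formula results rather than formulas.
    wb = openpyxl.load_workbook(path, read_only=True, data_only=True)
    for title in wb.sheetnames:          # replaces get_sheet_names()
        if substring in title:
            sheet = wb[title]            # replaces get_sheet_by_name()
            out = os.path.join(out_dir, title + ".csv")
            with open(out, 'w', newline='') as f:
                writer = csv.writer(f)
                for row in sheet.rows:
                    writer.writerow([cell.value for cell in row])
```

String concatenation is used instead of f-strings so the sketch still runs on the asker's Python 3.4.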
From the documentation -
Sometimes, you will need to open or write extremely large XLSX files, and the common routines in openpyxl won’t be able to handle that load. Fortunately, there are two modes that enable you to read and write unlimited amounts of data with (near) constant memory consumption.
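The write counterpart mentioned in that quote works the same way: a minimal sketch of openpyxl's write-only mode (the sheet title and output path are illustrative), which streams appended rows to disk instead of keeping the whole worksheet in memory:

```python
from openpyxl import Workbook

# write_only=True streams each appended row straight to disk, so memory
# use stays roughly constant regardless of how many rows are written.
wb = Workbook(write_only=True)
ws = wb.create_sheet(title="Output")
for i in range(1000):
    ws.append([i, i * 2])
wb.save("large_output.xlsx")  # illustrative path
```

A write-only workbook starts with no sheets, so each sheet must be created with create_sheet(), and rows can only be appended in order.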