Writing an array to Excel in Python with win32com


Problem description


I'm writing a Python script that parses data files. The parsed data is then sent to an Excel file. The data can be rather large: I'm looking at 10 to 20 columns, but the number of rows can be well over 100,000.

Writing this amount of data to Excel with win32com takes a considerable amount of time. My first approach was to iterate over every cell in the Excel file, which is very time consuming. After some digging I found out how to write a whole row with a single call, which reduces the time needed drastically.

However, when I need to send 100,000 rows of data to Excel, it still takes a lot of time. I'm pretty sure that I can speed this up further by sending the complete array with a single call, but I have not been able to achieve this so far.

See the code below, which demonstrates the issue and shows the differences in time. The third step, sending the complete array to a range with a single call, does not result in the correct data in Excel. What am I doing wrong?

import win32com.client
import time

# Create the data array: 1,000 rows of 10 values each
data_array = [list(range(10)) for _ in range(1000)]

# Start Excel and get three worksheets to write to
excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = True
excel.ScreenUpdating = False
book = excel.Workbooks.Add()
sh1 = book.Worksheets(1)
sh2 = book.Worksheets(2)
sh3 = book.Worksheets(3)

# Double loop, writing individual cells
print("Writing with a double loop to individual cells.")
start = time.time()
row = 0
for line in data_array:
    row += 1
    col = 0
    for field in line:
        col += 1
        sh1.Cells(row, col).Value = field
print("Processing time: " + str(time.time() - start) + " seconds.")

# Single loop, writing one row to a range per call
print("Writing with a single loop, one row per call.")
start = time.time()
row = 0
for line in data_array:
    row += 1
    sh2.Range(sh2.Cells(row, 1), sh2.Cells(row, len(line))).Value = line
print("Processing time: " + str(time.time() - start) + " seconds.")

# No loop, writing the whole array to a range (this is the step that fails)
print("Writing the whole array to a range with a single call.")
start = time.time()
try:
    sh3.Range(sh3.Cells(row, 1), sh3.Cells(len(data_array), len(data_array[0]))).Value = data_array
    print("Processing time: " + str(time.time() - start) + " seconds.")
except Exception:
    print("Failed to write array to excel file.")

# Restore screen updating and release the COM objects
excel.ScreenUpdating = True
sh1 = sh2 = sh3 = None
book = None
excel.Quit()
excel = None

Solution

I have been doing some research on this and have come to some interesting conclusions.

There are indeed multiple solutions for writing data to Excel from Python. I ended up focusing on three modules.

win32com.client

Works slowly. However, the document can be opened in Excel, so the end result is immediately available for the user to start working on. Not great for large amounts of data.

Writing 10,000 rows with 10 columns takes approximately 70 seconds on my computer (Core i5).

import win32com.client
import time

# Create the data array: 10,000 rows of 10 values each
data_array = [list(range(10)) for _ in range(10000)]

# Start Excel and get the first worksheet
excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = True
excel.ScreenUpdating = False
book = excel.Workbooks.Add()
sheet = book.Worksheets(1)

# Single loop, writing one row to a range per call
print("Writing using win32com.client")
start = time.time()
row = 0
for line in data_array:
    row += 1
    sheet.Range(sheet.Cells(row, 1), sheet.Cells(row, len(line))).Value = line
print("Processing time: " + str(time.time() - start) + " seconds.")

print("Completed: " + str(time.time() - start) + " seconds.")

# Restore screen updating and release the COM objects
excel.ScreenUpdating = True
sheet = None
book = None
excel.Quit()
excel = None
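
A note on the third attempt in the question: when it runs, `row` still holds the value left by the preceding loop (1000), so both corners of the target range sit in row 1000 and the range is only one row high. The following is a minimal sketch of a single-call write that spans the full block instead. It assumes that `Range.Value` accepts a rectangular tuple of tuples through pywin32; the helper names `to_rows` and `write_block` are hypothetical and not part of win32com, and this variant was not part of the timings reported here.

```python
def to_rows(data):
    """Convert a sequence of rows into a rectangular tuple of tuples."""
    rows = tuple(tuple(r) for r in data)
    if not rows:
        raise ValueError("data is empty")
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same length")
    return rows


def write_block(sheet, data, top=1, left=1):
    """Write all of `data` to `sheet` with one Range.Value assignment.

    `sheet` is assumed to behave like an Excel Worksheet COM object,
    i.e. it offers .Cells(row, col) and .Range(cell1, cell2).
    Returns the 1-based (top, left, bottom, right) corners that were used.
    """
    rows = to_rows(data)
    bottom = top + len(rows) - 1
    right = left + len(rows[0]) - 1
    target = sheet.Range(sheet.Cells(top, left), sheet.Cells(bottom, right))
    target.Value = rows  # one COM call for the whole block
    return (top, left, bottom, right)


# Usage with Excel (Windows only, requires pywin32), shown as comments so
# that the sketch itself has no Excel dependency:
#   import win32com.client
#   excel = win32com.client.Dispatch("Excel.Application")
#   sheet = excel.Workbooks.Add().Worksheets(1)
#   write_block(sheet, data_array)
```

Computing the corners from the data itself, rather than from a loop counter, keeps the range and the array the same shape.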

openpyxl

It is a bit faster, but still not great. This module is slow at transferring the data to the in-memory workbook object, but saves it really fast. It creates 10,000 rows with 10 columns in 22.3 seconds and saves the file in another 0.5 seconds. When I test this with 100,000 rows and 10 columns, the data is created in 228.3 seconds and saving the file takes another 2.9 seconds. Fairly slow, but the file saving is fast. Hence openpyxl is probably good for making changes (formatting) to existing data, although I have not tested this yet. Another advantage is that coding with openpyxl is easier than with win32com.client.

import openpyxl
import time

# Create the data array: 10,000 rows of 10 values each
data_array = [list(range(10)) for _ in range(10000)]

# Create a workbook in write-only mode to keep memory use low
# (older openpyxl releases called this flag optimized_write)
book = openpyxl.Workbook(write_only=True)

# Single loop, appending one row per call
start = time.time()
print("Writing with single loop using openpyxl")
sheet = book.create_sheet()
for line in data_array:
    sheet.append(line)
print("Processing time: " + str(time.time() - start) + " seconds.")

# Save the file
book.save('openpyxl.xlsx')
print("Total time: " + str(time.time() - start) + " seconds.")
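
The figures quoted here separate the time spent building the workbook from the time spent saving it. A small helper along the following lines could make that split explicit for any of the three modules. This is only a sketch: `timed` and `benchmark` are hypothetical names, and `build` and `save` stand for whatever callables create and store the workbook.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def benchmark(build, save):
    """Time a build step and a save step separately.

    `build` takes no arguments and returns a workbook-like object;
    `save` takes that object and writes it out.
    """
    results = {}
    with timed("build", results):
        workbook = build()
    with timed("save", results):
        save(workbook)
    results["total"] = results["build"] + results["save"]
    return results
```

With openpyxl, for example, `build` would create the workbook and append the rows, and `save` would be `lambda book: book.save('openpyxl.xlsx')`.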

I am facing another issue with openpyxl. In my real tool, openpyxl has problems saving large amounts of data (> 10,000 rows). I have not figured this out yet, and maybe I won't bother to look into it any further.

PyExcelerate

This one is, first of all, fast. It creates the 10,000 rows and 10 columns in just 0.17 seconds. Saving the file, however, takes 2.2 seconds. Even so, it is by far the fastest option of the three. When I try to save 100,000 rows and 10 columns with this module, the Excel data is created in just 1.8 seconds, but saving the file now takes 21.7 seconds. Hence, this module is really fast, but there is a penalty when writing the file. Overall it is still the fastest of them all. Another advantage of PyExcelerate is that coding is very easy, easier still than with openpyxl.

import pyexcelerate
import sys
import time

# Create the data array: 10,000 rows of 10 values each
data_array = [list(range(10)) for _ in range(10000)]

print(sys.version)

# Create a workbook object
book = pyexcelerate.Workbook()

# No loop needed: hand the whole array to new_sheet in one call
start = time.time()
print("Writing all rows at once using PyExcelerate")
book.new_sheet("data", data=data_array)

print("Processing time: " + str(time.time() - start) + " seconds.")

# Save the file
book.save('pyexcelerate.xlsx')
print("Total time: " + str(time.time() - start) + " seconds.")

So my conclusion is that PyExcelerate is by far the fastest. The advantage of win32com.client is that the created Excel file can be opened in Excel right away, so the data is available for the user to start working with. openpyxl could be interesting for making style changes after the file has been created, although I have not tested this yet. Hence, it could be beneficial to combine win32com.client, openpyxl and PyExcelerate in one application.
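
To put the reported numbers side by side, the short calculation below adds the build and save times and derives rows per second. It uses only the figures quoted in this answer; the function names are illustrative. win32com.client is left out because only a single total (about 70 seconds for 10,000 rows) is given for it.

```python
# (library, rows) -> (build seconds, save seconds), as reported in the text
REPORTED = {
    ("openpyxl", 10_000): (22.3, 0.5),
    ("openpyxl", 100_000): (228.3, 2.9),
    ("PyExcelerate", 10_000): (0.17, 2.2),
    ("PyExcelerate", 100_000): (1.8, 21.7),
}


def total_seconds(library, rows):
    """Build time plus save time for one reported measurement."""
    build, save = REPORTED[(library, rows)]
    return build + save


def rows_per_second(library, rows):
    """Overall throughput including the save step."""
    return rows / total_seconds(library, rows)


def speedup(rows, slow="openpyxl", fast="PyExcelerate"):
    """How many times faster `fast` is than `slow` for a given row count."""
    return total_seconds(slow, rows) / total_seconds(fast, rows)


for (library, rows) in REPORTED:
    print(library, rows,
          round(total_seconds(library, rows), 2), "s total,",
          round(rows_per_second(library, rows)), "rows/s")
```

On these figures PyExcelerate comes out a little under ten times faster than openpyxl end to end at both sizes, even though its save step dominates its own total.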
