Optimizing or speeding up reading from .xy files into excel
Question
I have a few .xy files (2 columns with x and y values). I have been trying to read all of them and paste the "y" values into a single Excel file (the "x" values are the same in all these files). The code I have so far reads the files one by one, but it's extremely slow (about 20 seconds per file). I have quite a few .xy files, so the time adds up considerably. The code I have so far is:
import os, fnmatch, linecache, csv
from openpyxl import Workbook

wb = Workbook()
ws = wb.worksheets[0]
ws.title = "Sheet1"

def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    try:
        for row in xrange(1, row_count):
            data = linecache.getline(file_name, row)
            print data.strip().split()[1]
            print data
            ws.cell("A"+str(row)).value = float(data.strip().split()[0])
            ws.cell("B"+str(row)).value = float(data.strip().split()[1])
            print file_name
            wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
    except IndexError:
        pass

workingdir = "C:\Users\Mine\Desktop\P22_PC"
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        batch_processing(file_name)
Any help is appreciated. Thanks.
Answer
I think your main issue is that you're writing to Excel and saving on every single line in the file, for every single file in the directory. I'm not sure how long it actually takes to write a value to Excel, but just moving the save out of the loop and saving only once everything has been added should cut some time. Also, how large are these files? If they are massive, then linecache may be a good idea, but assuming they aren't overly large, you can probably do without it.
def batch_processing(file_name):
    # Using 'with' is a better way to open files - it ensures they are
    # properly closed, etc. when you leave the code block
    with open(file_name, 'rb') as f:
        # .xy columns are space-separated, so tell csv.reader about it
        reader = csv.reader(f, delimiter=' ')
        # row_count = sum(1 for row in csv.reader(open(file_name)))
        # ^^^ You actually don't need to do this at all (though it is clever :)
        # You are using it now to govern the loop, but the more Pythonic way is
        # to do it as follows
        for line_no, line in enumerate(reader):
            # csv has already split the line into its two values
            val1, val2 = line
            print val1, val2  # You can also remove this - printing takes time too
            ws.cell("A"+str(line_no+1)).value = float(val1)
            ws.cell("B"+str(line_no+1)).value = float(val2)
    # Doing this here will save the file once after you process an entire file.
    # You could save a bit more time and move this to after your walk statement -
    # that way, you are only saving once after everything has completed
    wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
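Since the original goal was one workbook holding the shared x column plus one y column per file, it can also help to separate the fast part (reading the files) from the slow part (talking to openpyxl). A minimal sketch of that reading step, assuming whitespace-separated two-column .xy files as in the question; the helper names `read_xy` and `collect_columns` are illustrative, not from the answer:

```python
def read_xy(path):
    """Return (xs, ys) as two lists of floats from a two-column .xy file."""
    xs, ys = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()  # handles spaces and tabs alike
            if len(parts) < 2:
                continue  # skip blank or malformed lines instead of raising
            xs.append(float(parts[0]))
            ys.append(float(parts[1]))
    return xs, ys

def collect_columns(paths):
    """Return (xs, columns), where columns[i] is the y column of paths[i]."""
    xs, columns = None, []
    for path in paths:
        file_xs, ys = read_xy(path)
        if xs is None:
            xs = file_xs  # the question says x is identical across files
        columns.append(ys)
    return xs, columns
```

With the data collected this way, you only touch the workbook once at the end: write `xs` into column A, each entry of `columns` into its own column, and call `wb.save()` a single time.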