Optimizing or speeding up reading from .xy files into Excel


Problem Description


I have a few .xy files (2 columns with x and y values). I have been trying to read all of them and paste the "y" values into a single Excel file (the "x" values are the same in all these files). The code I have so far reads the files one by one, but it's extremely slow (about 20 seconds per file). I have quite a few .xy files, so the time adds up considerably. The code I have so far is:

import os,fnmatch,linecache,csv
from openpyxl import Workbook

wb = Workbook() 
ws = wb.worksheets[0]
ws.title = "Sheet1"


def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    try:
        for row in xrange(1,row_count):

            data = linecache.getline(file_name, row)
            print data.strip().split()[1]   
            print data
            ws.cell("A"+str(row)).value = float(data.strip().split()[0])
            ws.cell("B"+str(row)).value = float(data.strip().split()[1])

        print file_name
        wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
    except IndexError:
        pass


workingdir = "C:\Users\Mine\Desktop\P22_PC"
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        # Join with root so matches inside subdirectories are found too
        batch_processing(os.path.join(root, file_name))


Any help is appreciated. Thanks.

Answer


I think your main issue is that you're writing to Excel cell by cell and saving a workbook for every single file in the directory. I'm not sure how long it takes to actually write a value to Excel, but just moving the save out of the loop, so you save only once everything has been added, should cut a little time. Also, how large are these files? If they are massive, then linecache may be a good idea, but assuming they aren't overly large you can probably do without it.

def batch_processing(file_name):

    # Using 'with' is a better way to open files - it ensures they are
    # properly closed, etc. when you leave the code block
    with open(file_name, 'rb') as f:
        # The .xy data is whitespace-separated, so tell csv about the delimiter;
        # skipinitialspace swallows runs of spaces between the two columns
        # (use delimiter='\t' instead if your files are tab-separated)
        reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
        # row_count = sum(1 for row in csv.reader(open(file_name)))
        # ^^^You actually don't need to do this at all (though it is clever :)
        # You are using it now to govern the loop, but the more Pythonic way is
        # to do it as follows
        for line_no, line in enumerate(reader):
            # Split the line and create two variables that will hold val1 and val2
            val1, val2 = line
            print val1, val2 # You can also remove this - printing takes time too
            ws.cell("A"+str(line_no+1)).value = float(val1)
            ws.cell("B"+str(line_no+1)).value = float(val2)

    # Doing this here will save the file after you process an entire file.
    # You could save a bit more time and move this to after your walk statement - 
    # that way, you are only saving once after everything has completed
    wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
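
If the real goal is a single workbook with the shared "x" values in column A and one "y" column per file, the two ideas combine naturally: fill in one column per file as you walk the directory, and save exactly once at the very end. Below is a minimal sketch of that, assuming every file really does share identical x values; the col_letter helper is my own, and the string-coordinate ws.cell(...) calls match the openpyxl version used in the question:

import os, fnmatch
from openpyxl import Workbook

def col_letter(n):
    # Convert a 1-based column index to an Excel letter (1 -> 'A', 27 -> 'AA')
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord('A') + rem) + letters
    return letters

wb = Workbook()
ws = wb.worksheets[0]
ws.title = "Sheet1"

workingdir = r"C:\Users\Mine\Desktop\P22_PC"
col = 2          # x goes in column A; each file's y goes in B, C, D, ...
first_file = True
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        with open(os.path.join(root, file_name)) as f:
            for row, line in enumerate(f, start=1):
                parts = line.split()
                if len(parts) < 2:
                    continue  # skip blank or malformed lines
                if first_file:
                    ws.cell("A" + str(row)).value = float(parts[0])
                ws.cell(col_letter(col) + str(row)).value = float(parts[1])
        first_file = False
        col += 1

# One save after the entire walk, instead of one per file
wb.save(os.path.join(workingdir, "combined.xlsx"))

The single save at the end is the important part: wb.save() serializes the whole workbook to disk every time it is called, so calling it once per file repeats that work over and over.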
