Looping through big files takes hours in Python


Problem description

This is my second day working in Python. I worked on this in C++ for a while, but decided to try Python. My program works as expected. However, when I process one file at a time without the glob loop, it takes about half an hour per file. When I include the glob, the loop takes about 12 hours to process 8 files.

My question is this: is there anything in my program that is definitely slowing it down? Is there anything I should be doing to make it faster?

I have a folder of large files. For example:

file1.txt (6 GB), file2.txt (5.5 GB), file3.txt (6 GB)

If it helps, each line of data begins with a character that tells me how the rest of the characters are formatted, which is why I have all of the if/elif statements. A line of data looks like this: T35201 M352 RZNGA AC
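For instance, taking the sample line above, the first character selects the record type and fixed-position slices pull out the fields (the field meaning here is taken from the 'T' branch of the code below):

line = "T35201 M352 RZNGA AC"  # sample record from the question
line[0:1]   # 'T'     -- record type
line[1:6]   # '35201' -- the 'second' field in the 'T' branch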

I am trying to read each file, do some parsing using splits, and then save the file.

The computer has 32 GB of RAM, so my method is to read each file into RAM, loop through it, save the output, and then clear the RAM for the next file.

I've included the code so you can see the approach I am using. I use an if/elif chain with about 10 different elif branches. I have tried a dictionary, but I couldn't figure that out to save my life.

Any answers would be helpful.

import csv
import glob

for filename in glob.glob("/media/3tb/5may/*.txt"):
    f = open(filename, 'r')
    c = csv.writer(open(filename + '.csv', 'wb'))

    second = 0
    mill = 0
    for line in f.readlines():
        # print line
        event = 0
        ticker = 0
        marketCategory = 0
        variable = line[0:1]

        if variable is 'T':
            second = line[1:6]
            mill = 0
        else:
            second = second

        if variable is 'R':
            ticker = line[1:7]
            marketCategory = line[7:8]
        elif variable is ...
        elif variable is ...
        elif ...
        elif ...
        elif ...
        elif ...
        elif ...

        if variable != 'T' and variable != 'M':
            c.writerow([second, mill, event, ...])
    f.close()

UPDATE: Each of the elif statements is nearly identical. The only parts that change are the ways that I split the lines. Here are two of the elif statements (there are 13 in total, and they are almost all identical except for how the line is sliced):

elif variable is 'C':
    order = line[1:10]
    shares = line[10:16]
    match = line[16:25]
    printable = line[25:26]
    price = line[26:36]
elif variable is 'P':
    ticker = line[17:23]
    order = line[1:10]
    buy = line[10:11]
    shares = line[11:17]
    price = line[23:33]
    match = line[33:42]
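For reference, the dictionary approach mentioned above could be sketched roughly as follows, mapping each record-type character to its slice layout. This is only a sketch: the 'C' and 'P' layouts are taken from the two branches above, while the other record types, whose layouts the question does not show, would be added the same way.

# Map each leading record-type character to {field_name: (start, stop)} slices.
LAYOUTS = {
    'C': {'order': (1, 10), 'shares': (10, 16), 'match': (16, 25),
          'printable': (25, 26), 'price': (26, 36)},
    'P': {'order': (1, 10), 'buy': (10, 11), 'shares': (11, 17),
          'ticker': (17, 23), 'price': (23, 33), 'match': (33, 42)},
}

def parse_line(line):
    """Slice one record into named fields; returns None for unknown types."""
    layout = LAYOUTS.get(line[:1])  # line[:1] is safe on empty lines
    if layout is None:
        return None
    return {name: line[start:stop] for name, (start, stop) in layout.items()}

This replaces the whole elif chain with a single lookup, and adding a new record type becomes a one-entry change to the table rather than a new branch.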

UPDATE 2: I have run the code using for file in f two different times. The first time, I ran a single file without for filename in glob.glob("/media/3tb/file.txt"):, manually hard-coding the path for one file, and it took about 30 minutes.

I ran it again with for filename in glob.glob("/media/3tb/*file.txt") and it took an hour just for one file in the folder. Does the glob code add that much time?
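For reference, glob.glob builds the full list of matching paths once, up front, so by itself it should add only negligible overhead compared to hours of per-line work. A quick timing check along these lines (pattern taken from the update) can verify that:

import glob
import time

start = time.perf_counter()
paths = glob.glob("/media/3tb/*.txt")  # pattern from the question
elapsed = time.perf_counter() - start
print(len(paths), "files matched in", round(elapsed, 4), "seconds")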

Answer

Here:

for line in f.readlines():

You should do this:

for line in f:

The former reads the entire file into a list of lines and then iterates over that list. The latter reads the file incrementally, which should drastically reduce the total memory allocated and later freed by your program.
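Put together, the loop might be restructured roughly as follows. This is a sketch only: it uses the paths from the question, Python 3 file handling for the csv output (the question's 'wb' mode is Python 2 style), and elides the per-record parsing.

import csv
import glob

for filename in glob.glob("/media/3tb/5may/*.txt"):
    # 'with' closes both files even if an exception occurs; newline='' is
    # how a csv.writer output file should be opened in Python 3.
    with open(filename, 'r') as f, \
         open(filename + '.csv', 'w', newline='') as out:
        writer = csv.writer(out)
        second = 0
        mill = 0
        for line in f:          # streams one line at a time
            variable = line[:1] # leading character selects the layout
            # ... per-record parsing as in the question ...
            # writer.writerow([second, mill, ...])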
