Fastest way to write huge data in file


Problem description

    I am trying to generate random real numbers, integers, alphanumeric strings, and alphabetic strings, and write them to a file until the file size reaches 10 MB.

    The code is as follows.

    import string
    import random
    import time
    import sys
    
    
    class Generator():
        def __init__(self):
            self.generate_alphabetical_strings()
            self.generate_integers()
            self.generate_alphanumeric()
            self.generate_real_numbers()
    
        def generate_alphabetical_strings(self):
            return ''.join(random.choice(string.ascii_lowercase) for i in range(12))
    
        def generate_integers(self):
            return ''.join(random.choice(string.digits) for i in range(12))
    
        def generate_alphanumeric(self):
            return ''.join(random.choice(self.generate_alphabetical_strings() +
                                         self.generate_integers()) for i in range(12))
    
        def _insert_dot(self, string, index):
            return string[:index].__add__('.').__add__(string[index:])
    
    
        def generate_real_numbers(self):
            rand_int_string = ''.join(random.choice(self.generate_integers()) for i in range(12))
            return self._insert_dot(rand_int_string, random.randint(0, 11))
    
    
    from time import process_time
    import os
    
    a = Generator()
    
    t = process_time()
    inp = open("test.txt", "w")
    lt = 10 * 1000 * 1000
    count = 0
    while count <= lt:
        inp.write(a.generate_alphanumeric())
        count += 39
    inp.close()
    
    elapsed_time = process_time() - t
    print(elapsed_time)
    

    It takes around 225.953125 seconds to complete. How can I improve the speed of this program? Please provide some code insights.

    Solution

    Two major reasons for observed "slowness":

    • Your while loop is slow: it runs hundreds of thousands of iterations (about 10,000,000 / 39 ≈ 256,000).
    • You do not make proper use of I/O buffering. Do not make so many system calls. Currently, you are calling write() once per iteration, i.e. about 256,000 times.
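The number of write() calls is easy to check for yourself. Here is a minimal sketch that runs a loop shaped like the one in the question against an in-memory buffer and counts the calls. `CountingWriter` and `count_writes` are hypothetical helper names, and a fixed 12-character token stands in for `generate_alphanumeric()`.

```python
import io


class CountingWriter(io.StringIO):
    """In-memory text buffer that counts how often write() is called."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, s):
        self.calls += 1
        return super().write(s)


def count_writes(limit=10 * 1000 * 1000, step=39, token='x' * 12):
    """Run the question's loop shape; return (write calls, characters written)."""
    out = CountingWriter()
    count = 0
    while count <= limit:
        out.write(token)  # stand-in for a.generate_alphanumeric()
        count += step
    return out.calls, len(out.getvalue())


if __name__ == '__main__':
    print(count_writes())
```

With the question's values (limit 10,000,000 and step 39) this reports 256,411 calls and 3,076,932 characters. So besides the many calls, the output also falls short of 10 MB, because each 12-character write advances the counter by 39.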

    Create your data in a Python data structure first and call write() only once.

    This is faster:

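    import random
    import string
    import time
    t0 = time.time()
    with open("bla.txt", "w") as f:  # build the whole string first, then write it once
        f.write(''.join(random.choice(string.ascii_lowercase) for i in range(10**7)))
    d = time.time() - t0
    print("duration: %.2f s." % d)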
    

    Output: duration: 7.30 s.

    Now the program spends most of its time generating the data, i.e. in the random module. You can easily see that by replacing random.choice(string.ascii_lowercase) with a constant such as "a". Then the measured time drops to below one second on my machine.
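Since generation dominates, one thing worth trying (my suggestion, not something measured in the answer above) is `random.choices`, available since Python 3.6, which draws all characters in a single call instead of one `random.choice` call per character. The sketch below times both variants; `with_choice`, `with_choices` and `timed` are names made up for this illustration, and the actual timings will depend on your machine.

```python
import random
import string
import time


def with_choice(n):
    """One random.choice call per character, as in the snippet above."""
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(n))


def with_choices(n):
    """A single random.choices call that draws all n characters at once."""
    return ''.join(random.choices(string.ascii_lowercase, k=n))


def timed(func, n):
    """Return (result, elapsed seconds) for one call of func(n)."""
    t0 = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - t0


if __name__ == '__main__':
    n = 10 ** 6  # smaller than 10**7 to keep the comparison short
    for func in (with_choice, with_choices):
        data, seconds = timed(func, n)
        print('%s: %d chars in %.2f s' % (func.__name__, len(data), seconds))
```

Both functions draw from the same alphabet, so the only difference being compared is how many Python-level calls are needed to produce the string.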

    And if you want to get even closer to seeing how fast your machine really is when writing to disk, use Python's fastest (?) way to generate largish data before writing it to disk:

    >>> t0=time.time(); chunk=b"a"*10**7; f=open("bla.txt", "wb"); n=f.write(chunk); f.close(); d=time.time()-t0; print("duration: %.2f s." % d)
    duration: 0.02 s.
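Applied to the four kinds of values from the question, the "build first, write once" idea might look roughly like the sketch below. It is only an illustration under some assumptions: the helper names (`random_token`, `random_real`, `build_payload`, `write_random_file`) are made up, the alphanumeric tokens draw from all lowercase letters and digits rather than from a freshly generated 24-character pool, and the target size is counted in characters, which equals bytes for this ASCII-only data.

```python
import random
import string

ALPHANUMERIC = string.ascii_lowercase + string.digits  # assumed alphabet


def random_token(alphabet, length=12):
    """Return one random token of `length` characters drawn from `alphabet`."""
    return ''.join(random.choice(alphabet) for _ in range(length))


def random_real(length=12):
    """Return `length` random digits with a dot inserted at a random index."""
    digits = random_token(string.digits, length)
    index = random.randint(0, length - 1)
    return digits[:index] + '.' + digits[index:]


def build_payload(target_size):
    """Collect tokens in a list until they total at least `target_size` characters."""
    makers = [
        lambda: random_token(string.ascii_lowercase),
        lambda: random_token(string.digits),
        lambda: random_token(ALPHANUMERIC),
        random_real,
    ]
    parts, size = [], 0
    while size < target_size:
        token = random.choice(makers)()
        parts.append(token)
        size += len(token)  # count what is actually produced
    return ''.join(parts)


def write_random_file(path, target_size=10 * 1000 * 1000):
    """Build the whole payload in memory, then write it with a single call."""
    payload = build_payload(target_size)
    with open(path, 'w') as f:
        f.write(payload)
    return len(payload)
```

Calling `write_random_file("test.txt")` produces a file of at least 10,000,000 characters. Tracking `len(token)` instead of adding a fixed 39 per iteration keeps the size counter in step with what is really written.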
    
