Fastest way to write huge data in file
Problem description
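I am trying to create random real numbers, integers, alphanumeric strings and alphabetical strings, and then write them to a file until the file size reaches 10 MB. The code is as follows.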
import string
import random
import time
import sys
class Generator():
    def __init__(self):
        self.generate_alphabetical_strings()
        self.generate_integers()
        self.generate_alphanumeric()
        self.generate_real_numbers()
    def generate_alphabetical_strings(self):
        return ''.join(random.choice(string.ascii_lowercase) for i in range(12))
    def generate_integers(self):
        return ''.join(random.choice(string.digits) for i in range(12))
    def generate_alphanumeric(self):
        return ''.join(random.choice(self.generate_alphabetical_strings() +
                                     self.generate_integers()) for i in range(12))
    def _insert_dot(self, string, index):
        return string[:index].__add__('.').__add__(string[index:])
    def generate_real_numbers(self):
        rand_int_string = ''.join(random.choice(self.generate_integers()) for i in range(12))
        return self._insert_dot(rand_int_string, random.randint(0, 11))
from time import process_time
import os
a = Generator()
t = process_time()
inp = open("test.txt", "w")
lt = 10 * 1000 * 1000
count = 0
while count <= lt:
    inp.write(a.generate_alphanumeric())
    count += 39
inp.close()
elapsed_time = process_time() - t
print(elapsed_time)
It takes around 225.953125 seconds to complete. How can I improve the speed of this program? Please provide some code insights.
Two major reasons account for the observed "slowness", among them that you are calling write() about one million times.
Create your data in a Python data structure first and call write() only once.
This is faster:
t0 = time.time()
open("bla.txt", "wb").write(''.join(random.choice(string.ascii_lowercase) for i in xrange(10**7)))
d = time.time() - t0
print "duration: %.2f s." % d
Output:
duration: 7.30 s.
Now the program spends most of its time generating the data, i.e. in the random stuff. You can easily see that by replacing random.choice(string.ascii_lowercase) with e.g. "a". Then the measured time drops to below one second on my machine.
And if you want to get even closer to seeing how fast your machine really is when writing to disk, use Python's fastest (?) way to generate largish data before writing it to disk:
>>> t0=time.time(); chunk="a"*10**7; open("bla.txt", "wb").write(chunk); d=time.time()-t0; print "duration: %.2f s." % d
duration: 0.02 s.
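The answer's snippets use Python 2 syntax (xrange, the print statement, and writing a str to a file opened with "wb"). As a minimal sketch of the same "build the data first, call write() once" idea in Python 3, the following may help. The function name write_random_letters and its seed parameter are hypothetical and not part of the original answer, and random.choices (Python 3.6+) is swapped in for the per-character random.choice calls.

```python
import random
import string


def write_random_letters(path, size, seed=None):
    """Build `size` random lowercase letters in memory, then write them in one call.

    Hypothetical helper for illustration; not part of the original answer.
    """
    rng = random.Random(seed)
    # random.choices draws all `size` characters in one call instead of
    # invoking random.choice once per character.
    data = "".join(rng.choices(string.ascii_lowercase, k=size))
    # Text mode with an explicit ASCII encoding; newline="" disables any
    # newline translation, so the file size equals len(data).
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write(data)  # the single write() call
    return len(data)
```

Calling write_random_letters("bla.txt", 10**7) corresponds to the answer's 10 MB experiment. The durations quoted in the answer were measured on its author's machine, so expect different numbers elsewhere.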
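The same advice can also be applied to the question's token-by-token loop. The sketch below, written for Python 3, collects 12-character alphanumeric tokens in a list and writes them with one call once the target size is reached. The names random_token and write_tokens are hypothetical. The token generator is a simplified stand-in that draws from all lowercase letters and digits, which is an assumption and differs from the question's generate_alphanumeric. The size counter adds the length actually written rather than the fixed 39 used in the question.

```python
import random
import string

# Assumed character pool for the stand-in generator (letters plus digits).
ALPHANUMERIC = string.ascii_lowercase + string.digits


def random_token(rng, length=12):
    """Return one random alphanumeric token (stand-in for generate_alphanumeric)."""
    return "".join(rng.choices(ALPHANUMERIC, k=length))


def write_tokens(path, target_size, token_length=12, seed=None):
    """Collect tokens in memory until target_size is reached, then write once."""
    rng = random.Random(seed)
    tokens = []
    size = 0
    while size < target_size:
        token = random_token(rng, token_length)
        tokens.append(token)
        size += len(token)  # count the characters that will really be written
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write("".join(tokens))  # one write() call for the whole file
    return size
```

For example, write_tokens("test.txt", 10 * 1000 * 1000) produces a file of at least 10,000,000 characters. Holding the whole file in memory is fine at this size; for much larger files, writing in a few large chunks would be the usual compromise.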