numpy.loadtxt is way slower than open.....readlines()

Question

When comparing these two ways of doing the same thing:

import time
import numpy as np

start_time = time.time()
for j in range(1000):
    # Method 1: let loadtxt read and parse each file
    bv = np.loadtxt('file%d.dat' % (j + 1))
    if j % 100 == 0:
        print(bv[300, 0])
T1 = time.time() - start_time
print("--- %s seconds ---" % T1)

import time
import numpy as np

start_time = time.time()
for j in range(1000):
    # Method 2: read the lines and parse them by hand
    with open('file%d.dat' % (j + 1), 'r') as a:
        b = a.readlines()
    for i in range(len(b)):
        b[i] = b[i].strip("\n")
        b[i] = b[i].split("\t")
        # list() is needed on Python 3, where map() returns an iterator
        b[i] = list(map(float, b[i]))
    bv = np.asarray(b)
    if j % 100 == 0:
        print(bv[300, 0])
T1 = time.time() - start_time
print("--- %s seconds ---" % T1)

I have noticed that the second one is way faster. Is there any way to have something as concise as the first method and as fast as the second one? Why is loadtxt so slow compared with performing the same task manually?

Recommended answer

With a simple, not too large CSV created with:

In [898]: arr = np.ones((1000,100))
In [899]: np.savetxt('float.csv',arr)

The loadtxt version:

In [900]: timeit data = np.loadtxt('float.csv')
112 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

fromfile can load text, though it doesn't preserve any shape information (and shows no apparent speed advantage):

In [901]: timeit data = np.fromfile('float.csv', dtype=float, sep=' ').reshape(-1,100)
129 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
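
Because fromfile returns a flat array, the column count has to come from somewhere. The call above hard-codes 100; the sketch below instead reads it from the first line of the file. The helper name load_with_fromfile and the demo file are made up for illustration, and the approach assumes a rectangular, whitespace-delimited file of floats.

```python
import os
import tempfile
import numpy as np

def load_with_fromfile(path):
    """Load a whitespace-delimited table of floats via np.fromfile.

    Hypothetical helper: assumes every row has the same number of columns.
    """
    # fromfile discards the row structure, so count columns from the first line.
    with open(path) as f:
        ncols = len(f.readline().split())
    flat = np.fromfile(path, dtype=float, sep=' ')
    return flat.reshape(-1, ncols)

# Small demo file written with savetxt (space-delimited by default).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.csv')
np.savetxt(path, np.arange(12.0).reshape(3, 4))

flat = np.fromfile(path, dtype=float, sep=' ')
table = load_with_fromfile(path)
print(flat.shape)   # 1-D: the row structure is lost
print(table.shape)  # 2-D again after the reshape
```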

The most concise 'manual' version that I can come up with:

In [902]: %%timeit
     ...: with open('float.csv') as f:
     ...:     data = np.array([line.strip().split() for line in f],float)
52.9 ms ± 589 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

This 2x improvement over loadtxt seems typical of variations on this approach.
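
To get something as concise at the call site as loadtxt, the manual parse can be wrapped in a small function. This is only a sketch: load_table is a hypothetical name, the demo files stand in for the file%d.dat files from the question, and the function assumes clean, rectangular numeric data with none of the comment, header, or missing-value handling that loadtxt offers.

```python
import os
import tempfile
import numpy as np

def load_table(path, delimiter=None):
    """Parse a delimited text file of floats into a 2-D array.

    Hypothetical helper. delimiter=None splits on any whitespace,
    which covers both space- and tab-separated files.
    """
    with open(path) as f:
        # Skip blank lines so a trailing newline cannot produce a ragged row.
        rows = [line.split(delimiter) for line in f if line.strip()]
    return np.array(rows, dtype=float)

# Demo: a few small tab-separated files named like those in the question.
tmp_dir = tempfile.mkdtemp()
expected = np.arange(10.0).reshape(5, 2)
for j in range(3):
    np.savetxt(os.path.join(tmp_dir, 'file%d.dat' % (j + 1)), expected, delimiter='\t')

# The loop is now as short as the loadtxt version.
for j in range(3):
    bv = load_table(os.path.join(tmp_dir, 'file%d.dat' % (j + 1)))
    print(bv[4, 0])
```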

pd.read_csv takes about the same time.
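
The exact pandas call is not shown in the answer, so the following is an assumption about what it might look like for a file written by np.savetxt: space-delimited, with no header row. It requires pandas to be installed.

```python
import os
import tempfile
import numpy as np
import pandas as pd

# Recreate the test file from the answer in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'float.csv')
np.savetxt(path, np.ones((1000, 100)))

# savetxt writes space-separated values and no header line,
# so both must be stated explicitly for read_csv.
data = pd.read_csv(path, sep=' ', header=None).to_numpy()
print(data.shape, data.dtype)
```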

genfromtxt is a bit faster than loadtxt:

In [907]: timeit data = np.genfromtxt('float.csv')
98.2 ms ± 4.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
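
For anyone wanting to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The manual function and the loaders dictionary are illustrative names, not part of the original answer, and the results depend on the machine and the NumPy version, so the ratios reported above may not reproduce.

```python
import os
import tempfile
import timeit
import numpy as np

# Same test data as in the answer, written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'float.csv')
np.savetxt(path, np.ones((1000, 100)))

def manual(path):
    """The 'manual' list-comprehension parse from the answer."""
    with open(path) as f:
        return np.array([line.split() for line in f], dtype=float)

loaders = {
    'loadtxt': lambda: np.loadtxt(path),
    'genfromtxt': lambda: np.genfromtxt(path),
    'fromfile': lambda: np.fromfile(path, dtype=float, sep=' ').reshape(-1, 100),
    'manual': lambda: manual(path),
}

results = {}
for name, loader in loaders.items():
    # Check that every loader produces the same shape before timing it.
    assert loader().shape == (1000, 100)
    # Best of three single runs, reported in milliseconds.
    results[name] = min(timeit.repeat(loader, number=1, repeat=3))
    print('%-10s %.1f ms' % (name, results[name] * 1e3))
```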
