The fastest way to read input in Python

Question

I want to read a huge text file that contains a list of lists of integers. Currently I'm doing the following:

G = []
with open("test.txt", 'r') as f:
    for line in f:
        G.append(list(map(int,line.split())))

However, it takes about 17 seconds (measured with timeit). Is there any way to reduce this time? Perhaps there is a way to avoid using map.
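To compare alternatives on equal terms, the loop above can be wrapped in a function and timed with timeit. This is a minimal sketch: the function name read_with_map and the 100,000-line sample file are illustrative assumptions, not the asker's actual data.

```python
import os
import tempfile
import timeit


def read_with_map(path):
    """The question's approach: one list of ints per whitespace-separated line."""
    G = []
    with open(path, "r") as f:
        for line in f:
            G.append(list(map(int, line.split())))
    return G


if __name__ == "__main__":
    # Hypothetical sample data: 100,000 lines with two integers each.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(100_000):
            f.write(f"{i} {i * 2}\n")

    # number=1 times a single run; timeit returns the elapsed seconds as a float.
    elapsed = timeit.timeit(lambda: read_with_map(path), number=1)
    print(f"read_with_map: {elapsed:.3f} s")
    os.remove(path)
```

Timing a single run (number=1) matches how the answer below measures each reader; repeated runs on the same file will mostly hit the OS disk cache.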

Recommended answer

numpy has the functions loadtxt and genfromtxt, but neither is particularly fast. One of the fastest text readers available in a widely distributed library is the read_csv function in pandas (http://pandas.pydata.org/). On my computer, reading 5 million lines containing two integers per line takes about 46 seconds with numpy.loadtxt, 26 seconds with numpy.genfromtxt, and a little over 1 second with pandas.read_csv.
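The benchmark file itself (junk.dat, two integers per line) is not shown. A minimal sketch for producing a similar file follows; it assumes random non-negative integers, and the helper write_int_pairs and its value range are hypothetical.

```python
import random


def write_int_pairs(path, n_lines, seed=0):
    """Hypothetical helper: write n_lines lines of two space-separated random ints."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with open(path, "w") as f:
        for _ in range(n_lines):
            # Assumed value range 0..1,000,000; the original data is unknown.
            f.write(f"{rng.randint(0, 10**6)} {rng.randint(0, 10**6)}\n")
```

Calling write_int_pairs("junk.dat", 5_000_000) produces a file with the same number of lines as in the answer; smaller counts are handy for quick checks.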

Here's the session showing the result. (This is on Linux, Ubuntu 12.04 64 bit. You can't see it here, but after each reading of the file, the disk cache was cleared by running sync; echo 3 > /proc/sys/vm/drop_caches in a separate shell.)

In [1]: import pandas as pd

In [2]: %timeit -n1 -r1 loadtxt('junk.dat')
1 loops, best of 1: 46.4 s per loop

In [3]: %timeit -n1 -r1 genfromtxt('junk.dat')
1 loops, best of 1: 26 s per loop

In [4]: %timeit -n1 -r1 pd.read_csv('junk.dat', sep=' ', header=None)
1 loops, best of 1: 1.12 s per loop
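The session stops at the DataFrame that pd.read_csv returns, while the question builds a list of lists. One way to get the same shape is sketched below. The helper read_with_pandas is hypothetical, and it uses sep=r"\s+" instead of the session's sep=' ' so that runs of whitespace are split the way line.split() splits them. It assumes every line holds the same number of integers.

```python
import pandas as pd


def read_with_pandas(path):
    """Hypothetical helper: read whitespace-separated integers into a list of lists.

    Assumes every line has the same number of integers: read_csv builds a
    rectangular table, so ragged rows would be padded with NaN or rejected.
    """
    # header=None: the first line is data, not column names.
    # sep=r"\s+": split on runs of whitespace, like str.split() in the question.
    df = pd.read_csv(path, sep=r"\s+", header=None)
    # to_numpy() gives a 2-D integer array; tolist() turns it into nested Python lists.
    return df.to_numpy().tolist()
```

The final tolist() call creates one Python list per row, which adds noticeable time for millions of rows. If the rest of the program can work with the array from df.to_numpy() directly, skipping that conversion is likely faster.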
