如何提高读取文件程序的运行时性能 [英] How to improve runtime performance of reading file program

查看:59
本文介绍了如何提高读取文件程序的运行时性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在尝试使用Python读取1.5亿行(从具有生物测序信息的数据文件中).目前,它的读取速度为每秒20,000行,大约需要一个半小时.我必须通读其中的20个文件.鉴于Python是一种非常高级的语言,使用Java代替读取文件会更好,还是时差不足以保证切换到另一种语言?

I'm currently trying to read 150 million lines (from a data file with bio-sequencing information) using Python. Currently, it's reading at 20,000 lines per second which would take about an hour and a half. I have to read through 20 of these files. Given that Python is a very high level language, would it be better to use Java to read the files instead or is the time difference not significant enough to warrant switching to another language?

我正在使用的当前代码是:

The current code I'm using is:

lines_hashed = 0
with open(CUR_FILE) as f:
    for line in f:
       cpg = line.split("\t")
       cpg_dict[cpg[0]] = ....data....
       print lines_hashed
       lined_hashed += 1

print语句只是为了确保程序不会在任何地方停顿.我假设这也会减慢运行时间.没有print语句,有没有办法检查这一点?

The print statement is there only as a sanity that the program didn't stall anywhere. I'm assuming this is also slowing down the running time. Is there a way to check this without the print statement?

谢谢.

推荐答案

  1. 与磁盘读取相比,打印到屏幕的成本很高.如果必须在执行过程中检查性能,则每隔1000行或更多行仅打印一些内容.
  2. 对于使用其他语言,几乎所有语言都调用OS来完成实际工作.

这篇关于如何提高读取文件程序的运行时性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆