Reading lines from a file using Python

Problem description

I have a file with almost 100,000 lines. I want to run a cleaning process on it (lowercasing, removing stopwords, etc.), but it takes a long time.

For example, for 10,000 lines the script needs 15 minutes. For the whole file I expected it to take 150 minutes, but it takes 5 hours.

At the start of the script I open the file like this:

fileinput = open('tweets.txt', 'r')

lines = fileinput.read().lower()  # lowercases everything, but loads the whole file into memory

for line in fileinput:
    lines = line.lower() 

Question: Is there a way to read the first 10,000 lines, clean them, then read the next block of lines, and so on?

Recommended answer

I would highly suggest operating line by line instead of reading in the entire file all at once (in other words, don't use .read()).

with open('tweets.txt', 'r') as fileinput:
    for line in fileinput:
        line = line.lower()
        # ... do something with line ...
        # (for example, write the line to a new file, or print it)

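Iterating over the file object this way automatically takes advantage of Python's built-in buffering, so the whole file is never held in memory at once.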
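If processing in blocks of 10,000 lines is still wanted, as the question asks, one possible approach is to pull fixed-size batches from the file iterator with itertools.islice. The following is only a sketch under assumptions: the names clean_line, read_in_chunks and clean_file, the tiny STOPWORDS set, the output path and the UTF-8 encoding are all hypothetical and do not come from the original post.

```python
from itertools import islice

# Hypothetical stopword list, for illustration only.
# A set keeps each membership test cheap compared with a list.
STOPWORDS = {"the", "a", "an", "and", "of", "to"}


def clean_line(line):
    """Lowercase a line, split on whitespace, and drop stopwords."""
    words = line.lower().split()
    return " ".join(w for w in words if w not in STOPWORDS)


def read_in_chunks(file_obj, chunk_size=10000):
    """Yield lists of up to chunk_size lines until the file is exhausted."""
    while True:
        chunk = list(islice(file_obj, chunk_size))
        if not chunk:
            break
        yield chunk


def clean_file(in_path, out_path, chunk_size=10000):
    """Clean in_path one chunk at a time and write the result to out_path.

    The UTF-8 encoding is an assumption about the input file.
    """
    with open(in_path, "r", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for chunk in read_in_chunks(src, chunk_size):
            for line in chunk:
                dst.write(clean_line(line) + "\n")
```

It could be called as clean_file('tweets.txt', 'tweets_clean.txt'). Only one chunk of lines is in memory at a time. Note that chunking by itself does not make the cleaning step faster; if the run time grows faster than the number of lines, the cause is more likely in the cleaning code (for example, checking stopwords against a list instead of a set, or building up one ever-growing string).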
