Efficient way of reading a large txt file in Python

Problem description

I'm trying to open a txt file with 4605227 rows (305 MB).

My approach so far has been:

import numpy as np
import pandas as pd

# Read the whole file into a string array, skipping the header row
data = np.loadtxt('file.txt', delimiter='\t', dtype=str, skiprows=1)

# Wrap the array in a DataFrame, then convert the numeric columns
df = pd.DataFrame(data, columns=["a", "b", "c", "d", "e", "f", "g", "h", "i"])
df = df.astype(dtype={"a": "int64", "h": "int64", "i": "int64"})

But it uses up most of the available RAM (~10 GB) and never finishes. Is there a faster way of reading in this txt file and creating a pandas DataFrame?

Thanks!

Solved now, thank you. Why is np.loadtxt() so slow?

Recommended answer

Rather than reading it in with numpy, you could read it directly into a pandas DataFrame, e.g. using the pandas.read_csv function, with something like:

# usecols assumes the file's header row actually contains these column names
df = pd.read_csv('file.txt', delimiter='\t', usecols=["a", "b", "c", "d", "e", "f", "g", "h", "i"])
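If memory is still the bottleneck, read_csv can also stream the file in chunks so that only one piece is held in memory at a time. A minimal sketch, not from the original answer: it assumes the same tab-separated file.txt with a header naming the nine columns, declares the integer dtypes up front (making the separate astype step unnecessary), and picks an arbitrary chunk size of 100000 rows:

import pandas as pd

# Stream the file in 100000-row chunks (the chunk size is an arbitrary
# choice); with chunksize set, read_csv returns an iterator of DataFrames
# instead of one big frame
chunks = pd.read_csv(
    'file.txt',
    delimiter='\t',
    usecols=["a", "b", "c", "d", "e", "f", "g", "h", "i"],
    dtype={"a": "int64", "h": "int64", "i": "int64"},
    chunksize=100000,
)

# Concatenate the chunks at the end; alternatively, each chunk could be
# processed and discarded if the full DataFrame is never needed at once
df = pd.concat(chunks, ignore_index=True)

As for the follow-up question: np.loadtxt parses every line in interpreted Python and builds intermediate lists before producing the array, whereas read_csv's default engine is implemented in C, which is why it is typically much faster and lighter on memory for files of this size.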
