Efficient way of reading a large txt file in Python

Problem description

I'm trying to open a txt file with 4605227 rows (305 MB).

My approach so far has been:

import numpy as np
import pandas as pd

# Read the whole file into a string array, skipping the header row
data = np.loadtxt('file.txt', delimiter='\t', dtype=str, skiprows=1)

# Wrap the array in a DataFrame, then convert the numeric columns
df = pd.DataFrame(data, columns=["a", "b", "c", "d", "e", "f", "g", "h", "i"])
df = df.astype(dtype={"a": "int64", "h": "int64", "i": "int64"})

But it uses up most of the available RAM (~10 GB) and never finishes. Is there a faster way of reading in this txt file and creating a pandas DataFrame?

Thanks!

Solved now, thank you. Why is np.loadtxt() so slow?

Recommended answer

Rather than reading it in with numpy, you could read it directly into a pandas DataFrame, e.g. using the pandas.read_csv function, with something like:

# usecols assumes the file's header row actually contains these column names
df = pd.read_csv('file.txt', delimiter='\t', usecols=["a", "b", "c", "d", "e", "f", "g", "h", "i"])
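If memory is still the bottleneck, read_csv can also stream the file in chunks so that only one piece is held in memory at a time. A minimal sketch, not from the original answer: it assumes the same tab-separated file.txt with a header naming the nine columns, declares the integer dtypes up front (making the separate astype step unnecessary), and picks an arbitrary chunk size of 100000 rows:

import pandas as pd

# Stream the file in 100000-row chunks (the chunk size is an arbitrary
# choice); with chunksize set, read_csv returns an iterator of DataFrames
# instead of one big frame
chunks = pd.read_csv(
    'file.txt',
    delimiter='\t',
    usecols=["a", "b", "c", "d", "e", "f", "g", "h", "i"],
    dtype={"a": "int64", "h": "int64", "i": "int64"},
    chunksize=100000,
)

# Concatenate the chunks at the end; alternatively, each chunk could be
# processed and discarded if the full DataFrame is never needed at once
df = pd.concat(chunks, ignore_index=True)

As for the follow-up question: np.loadtxt parses every line in interpreted Python and builds intermediate lists before producing the array, whereas read_csv's default engine is implemented in C, which is why it is typically much faster and lighter on memory for files of this size.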
