Efficient way of reading large txt file in python
Problem description
I'm trying to open a txt file with 4,605,227 rows (305 MB).
My previous approach was:
import numpy as np
import pandas as pd

data = np.loadtxt('file.txt', delimiter='\t', dtype=str, skiprows=1)
df = pd.DataFrame(data, columns=["a", "b", "c", "d", "e", "f", "g", "h", "i"])
df = df.astype(dtype={"a": "int64", "h": "int64", "i": "int64"})
But it uses up most of the available RAM (~10 GB) and never finishes. Is there a faster way to read in this txt file and create a pandas DataFrame?
Thanks!
Solved now, thank you. Why is np.loadtxt() so slow?
Answer
Rather than reading it in with numpy, you could read it directly into a pandas DataFrame. pandas.read_csv uses an optimized C parser, whereas np.loadtxt parses each line in Python-level loops, which is far slower and allocates much more intermediate memory. For example:
df = pd.read_csv('file.txt', delimiter='\t', usecols=["a", "b", "c", "d", "e", "f", "g", "h", "i"])
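A minimal, runnable sketch of this approach. The tiny sample file written below is a synthetic stand-in for the question's 305 MB 'file.txt' (which isn't available here); the column names a..i and the tab delimiter come from the question. Passing dtype= up front tells the parser which columns are integers, so the separate .astype() step from the question is no longer needed:

```python
import pandas as pd

# Synthetic stand-in for the question's 'file.txt' (assumption: the real
# file has a header row naming the columns a..i, tab-separated).
sample = (
    "a\tb\tc\td\te\tf\tg\th\ti\n"
    "1\tu\tv\tw\tx\ty\tz\t2\t3\n"
    "4\tu\tv\tw\tx\ty\tz\t5\t6\n"
)
with open("file.txt", "w") as fh:
    fh.write(sample)

# Declaring the integer columns via dtype= avoids a second type-inference
# pass and the later .astype() conversion.
df = pd.read_csv(
    "file.txt",
    sep="\t",
    usecols=["a", "b", "c", "d", "e", "f", "g", "h", "i"],
    dtype={"a": "int64", "h": "int64", "i": "int64"},
)
```

If even read_csv runs out of memory, the chunksize parameter streams the file in pieces (e.g. `pd.read_csv("file.txt", sep="\t", chunksize=1_000_000)`), letting you process or concatenate the chunks incrementally.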