Large, persistent DataFrame in pandas
Question
As a long-time SAS user, I am exploring switching to Python and pandas.
However, when running some tests today, I was surprised that Python ran out of memory when trying to pandas.read_csv() a 128 MB csv file. It had about 200,000 rows and 200 columns of mostly numeric data.
With SAS, I can import a csv file into a SAS dataset, and it can be as large as my hard drive.
Is there something analogous in pandas?
I regularly work with large files and do not have access to a distributed computing network.
Accepted Answer
In principle it shouldn't run out of memory, but there are currently memory problems with read_csv on large files caused by some complex Python internal issues (this is vague, but it has been known for a long time: http://github.com/pydata/pandas/issues/407).
At the moment there isn't a perfect solution (here's a tedious one: you could transcribe the file row by row into a pre-allocated NumPy array or memory-mapped file--np.memmap), but it's one I'll be working on in the near future. Another solution is to read the file in smaller pieces (use iterator=True, chunksize=1000), then concatenate them with pd.concat. The problem comes in when you pull the entire text file into memory in one big slurp.