How to estimate how much memory a pandas DataFrame will need?
Question
I have been wondering: if I read, say, a 400MB csv file into a pandas DataFrame (using read_csv or read_table), is there any way to guesstimate how much memory it will need? I am just trying to get a better feel for DataFrames and memory.
Recommended answer
df.memory_usage() will return how many bytes each column occupies:
>>> df.memory_usage()
Row_ID 20906600
Household_ID 20906600
Vehicle 20906600
Calendar_Year 20906600
Model_Year 20906600
...
To include the index, pass index=True (recent pandas versions already do this by default).
So to get the overall memory consumption in bytes:
>>> df.memory_usage(index=True).sum()
731731000
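As a minimal, self-contained sketch of the two calls above: the frame below and its column names `a` and `b` are hypothetical, chosen so that the per-column figure is easy to check by hand (1,000 values of 8 bytes each).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 1,000 rows, one int64 and one float64 column.
df = pd.DataFrame({
    "a": np.arange(1000, dtype="int64"),
    "b": np.zeros(1000, dtype="float64"),
})

# Bytes per column, with the index left out of the report.
per_column = df.memory_usage(index=False)

# Total bytes for all columns plus the index.
total_bytes = df.memory_usage(index=True).sum()

print(per_column)
print(f"total: {total_bytes / 1024:.1f} KiB")
```

Each fixed-width numeric column costs roughly the number of rows times the item size of its dtype, so the total is dominated by the columns rather than the index here.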
Also, passing deep=True will enable a more accurate memory usage report that accounts for the full usage of the contained objects.
This is because, with deep=False (the default), the reported memory usage does not include memory consumed by elements that are not components of the array, such as the Python strings that an object column points to.
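The difference can be seen with a small sketch. The frame and the column names `n` and `s` are hypothetical, and the string column is forced to the object dtype on the assumption that this is how the text columns in question are stored (other string dtypes may report differently).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one int64 column and one object column of Python strings.
df = pd.DataFrame({
    "n": np.arange(1000, dtype="int64"),
    "s": pd.Series(["some text value"] * 1000, dtype=object),
})

# Shallow report: an object column is counted as 8-byte pointers only.
shallow = df.memory_usage(index=False, deep=False)

# Deep report: also adds the size of each Python object the pointers refer to.
deep = df.memory_usage(index=False, deep=True)

print(shallow)
print(deep)
```

The numeric column reports the same size either way, while the object column grows under deep=True, so string-heavy data is where the shallow figure underestimates the most.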