Memory efficient way to store bool and NaN values in pandas

Views: 74 Published: 2020/5/8 19:56:56 Tags: python python-3.x pandas memory nan


Problem description

I am working with quite a large dataset (over 4 GB), which I imported into pandas. Quite a few columns in this dataset are simple True/False indicators, and naturally the most memory-efficient way to store them would be a bool dtype. However, these columns also contain some NaN values that I want to preserve. Right now, this leads to the columns having dtype float (with values 1.0, 0.0 and np.nan) or object, and both use far too much memory.

For example:

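import numpy as np
import pandas as pd

df = pd.DataFrame([[True, True, True], [False, False, False],
                   [np.nan, np.nan, np.nan]])
df[1] = df[1].astype(bool)   # note: NaN is cast to True here, so the missing value is lost
df[2] = df[2].astype(float)  # True/False become 1.0/0.0, NaN is kept
print(df)
print(df.memory_usage(index=False, deep=True))   # deep=True also counts the Python objects in column 0
print(df.memory_usage(index=False, deep=False))  # deep=False only counts the object pointers in column 0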

produces

       0      1    2
0   True   True  1.0
1  False  False  0.0
2    NaN   True  NaN

0       100
1         3
2        24
dtype: int64

0        24
1         3
2        24
dtype: int64

What would be the most efficient way to store these kinds of values, knowing they can only take on three different values: True, False and <undefined>?
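As a rough sketch of options that might fit this three-state case (not taken from the original post): pandas 1.0 and later provide a nullable `"boolean"` extension dtype, which stores a values array plus a missing-value mask. A categorical column or a plain `int8` column with a sentinel value are possible alternatives. The sample data, the variable names, and the choice of `-1` as the sentinel below are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

n = 1_000  # hypothetical column length, for illustration only

# Hypothetical sample column in the float representation from the question:
# 1.0 / 0.0 for True / False, NaN for missing.
rng = np.random.default_rng(0)
as_float = pd.Series(rng.choice([1.0, 0.0, np.nan], size=n))

# Option 1: nullable boolean extension dtype (requires pandas >= 1.0).
# Missing entries become pd.NA instead of NaN.
as_boolean = as_float.astype("boolean")

# Option 2: categorical with the two categories False/True;
# missing entries are kept as NaN (stored internally as code -1).
as_category = as_float.map({1.0: True, 0.0: False}).astype("category")

# Option 3: int8 with an assumed sentinel of -1 for "undefined".
# pandas no longer knows these are missing, so isna() will not flag them.
as_int8 = as_float.fillna(-1).astype("int8")

candidates = {
    "float64": as_float,
    "boolean": as_boolean,
    "category": as_category,
    "int8 sentinel": as_int8,
}
for name, series in candidates.items():
    print(f"{name:>14}: {series.memory_usage(index=False, deep=True)} bytes")
```

The nullable boolean dtype keeps proper missing-value semantics (`isna()`, `fillna()` and so on still work), at a cost of roughly two bytes per element. The `int8` sentinel is the most compact of the three at one byte per element, but the meaning of `-1` has to be tracked by hand.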
