Python MemoryError when replacing NaN values in a large pandas DataFrame

Views: 50 | Published: 2020/10/17 2:01:09 | Tags: python, pandas, memory, dataframe

Problem description

I have a very large pandas dataframe: ~300,000 columns and ~17,520 rows. The dataframe is called result_full. I am attempting to replace every occurrence of the string "NaN" with numpy.nan:

result_full.replace(["NaN"], np.nan, inplace=True)

This is where I get a MemoryError. Is there a memory-efficient way to get rid of these strings in my dataframe? I tried result_full.dropna(), but it didn't work because the values are technically strings that read "NaN", not real missing values.
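To illustrate why dropna() has no effect here, the following minimal sketch uses a tiny made-up frame (the column names and values are assumptions, not the asker's data). It shows that pandas treats the string "NaN" as an ordinary value until it is replaced with np.nan.

```python
import numpy as np
import pandas as pd

# Toy stand-in for result_full (hypothetical data, not from the question).
df = pd.DataFrame({"a": ["1.5", "NaN", "2.0"], "b": ["NaN", "3.0", "4.0"]})

# The string "NaN" is an ordinary str, so pandas does not count it as missing.
string_missing = int(df.isna().sum().sum())

# dropna() therefore keeps every row.
rows_after_dropna = len(df.dropna())

# After replacing the strings with np.nan, the cells are real missing values.
replaced = df.replace("NaN", np.nan)
real_missing = int(replaced.isna().sum().sum())

print(string_missing, rows_after_dropna, real_missing)
```

On a frame this small the replace is cheap; the question is about the same call running out of memory at full size.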

Recommended answer

One possible cause is running on a 32-bit machine (or a 32-bit Python build), since a 32-bit process can typically address only about 2 GB of memory, and at most 4 GB depending on the operating system. If possible, move to a 64-bit machine to avoid this kind of problem in the future.
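As a rough back-of-the-envelope check, the sketch below estimates the size of the frame from the dimensions given in the question. It assumes 8 bytes per cell (one float64 value, or one pointer in an object column on a 64-bit build); string objects themselves would add to this, so treat the figure as a lower bound rather than a measurement.

```python
rows = 17_520
cols = 300_000

cells = rows * cols                 # total number of cells in the frame
bytes_per_cell = 8                  # assumption: float64 value or 64-bit object pointer
total_gib = cells * bytes_per_cell / 2**30

print(f"{cells:,} cells, about {total_gib:.1f} GiB")
```

Under that assumption the data alone is far beyond what a 32-bit process can address, and any operation that needs a temporary copy would require a similar amount again, even on a 64-bit machine.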

In the meantime, there is a workaround. Write the dataframe to CSV with df.to_csv(). Once that's done, if you look at the pandas documentation for pd.read_csv(), you will notice this parameter:

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.

So when the file is read back, read_csv will recognize the string 'NaN' as np.nan, and your problem should be solved.
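The round trip described above could look like the following minimal sketch. The frame is a tiny made-up stand-in for result_full, and an in-memory buffer replaces the file on disk so the example is self-contained; a real run would pass a file path to both calls.

```python
import io

import pandas as pd

# Toy stand-in for result_full (hypothetical data).
result_full = pd.DataFrame({"a": ["1.5", "NaN", "2.0"], "b": ["NaN", "3.0", "4.0"]})

# Write to CSV. An in-memory buffer is used here instead of a file path.
buffer = io.StringIO()
result_full.to_csv(buffer, index=False)
buffer.seek(0)

# "NaN" is one of read_csv's default NA strings, so it is parsed as missing.
reloaded = pd.read_csv(buffer)

print(reloaded.isna().sum())
print(reloaded.dtypes)
```

A side effect of the round trip is that the columns come back as numeric dtypes instead of strings. Writing and re-reading a frame of the size in the question may be slow, so this is a workaround rather than a tuned solution.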

Also, if you are creating this dataframe directly from a CSV file in the first place, you can use this parameter at read time and avoid the memory problem altogether. Hope it helps. Cheers!
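If the data does come from a CSV, a sketch of handling the missing markers at read time might look like this. The CSV content and the extra marker "missing" are invented for illustration; only na_values itself comes from the documentation quoted above.

```python
import io

import pandas as pd

# Hypothetical CSV content; "missing" is an extra marker made up for this example.
csv_text = "a,b\n1.5,NaN\nmissing,3.0\n2.0,4.0\n"

# na_values adds "missing" to the default NA strings, which already include "NaN".
df = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])

print(df.isna().sum())
```

Because the markers are converted while parsing, no separate replace() pass over the full frame is needed afterwards.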
