Python: Unstacked DataFrame is too big, causing int32 overflow


Problem description

I have a big dataset and when I try to run this code I get a memory error.

user_by_movie = user_items.groupby(['user_id', 'movie_id'])['rating'].max().unstack()

Here is the error:

ValueError: Unstacked DataFrame is too big, causing int32 overflow

I ran it on another machine and it worked fine. How can I fix this error?

Solution

As it turns out, this was not an issue on pandas 0.21. I am using a Jupyter notebook, and I need the latest version of pandas for the rest of the code. So I did this:

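# Temporarily downgrade to pandas 0.21, where unstack() did not raise this error
!pip install pandas==0.21
import pandas as pd
user_by_movie = user_items.groupby(['user_id', 'movie_id'])['rating'].max().unstack()
# Restore the latest pandas for the rest of the notebook
!pip install pandas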

This code works in a Jupyter notebook. First, it downgrades pandas to 0.21 and runs the code. Once the required dataset has been built, it updates pandas to the latest version. Check the issue raised on GitHub here. This post was also helpful for increasing the memory of the Jupyter notebook.
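
To see in advance whether an unstack is likely to hit this error, it may help to estimate how many cells the dense result would hold. The sketch below assumes (this is not confirmed in the post) that the check compares rows times columns of the unstacked table against the largest signed 32-bit integer. The helper names and the user and movie counts are hypothetical, chosen only for illustration.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


def unstacked_cell_count(n_rows: int, n_cols: int) -> int:
    """Number of cells in the dense table an unstack would build."""
    return n_rows * n_cols


def exceeds_int32(n_rows: int, n_cols: int) -> bool:
    """True if the dense table has more cells than int32 can count.

    Assumption: the pandas check is based on rows * columns.
    """
    return unstacked_cell_count(n_rows, n_cols) > INT32_MAX


# Hypothetical sizes. With the real data these could come from
# user_items['user_id'].nunique() and user_items['movie_id'].nunique().
n_users, n_movies = 60_000, 40_000

print(unstacked_cell_count(n_users, n_movies))  # 2400000000
print(exceeds_int32(n_users, n_movies))         # True
```

If the estimate is above the limit, the dense user-by-movie table is probably too large to build in one step regardless of available memory, which would be consistent with the same code working on a machine that had a different pandas version.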
