Python: Unstacked DataFrame is too big, causing int32 overflow
Problem description
I have a large dataset, and when I try to run this code I get an error.
user_by_movie = user_items.groupby(['user_id', 'movie_id'])['rating'].max().unstack()
Here is the error:
ValueError: Unstacked DataFrame is too big, causing int32 overflow
I ran it on another machine and it worked fine! How can I fix this error?
As it turns out, this was not an issue in pandas 0.21. I am using a Jupyter notebook, and I need the latest version of pandas for the rest of the code, so I did this:
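# Temporarily downgrade to the pandas version where unstack() worked
!pip install pandas==0.21
import pandas as pd
user_by_movie = user_items.groupby(['user_id', 'movie_id'])['rating'].max().unstack()
# Upgrade again; --upgrade is needed because a plain install keeps the installed 0.21
!pip install --upgrade pandas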
This works in a Jupyter notebook. It first downgrades pandas to 0.21 and runs the code; once the required dataset has been built, it upgrades pandas back to the latest version. See the issue raised on GitHub here. This post was also helpful for increasing the memory available to the Jupyter notebook.
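The error message suggests that the unstacked table has more cells (distinct user_id values times distinct movie_id values) than a signed 32-bit integer can count. Below is a minimal sketch for estimating that cell count before calling unstack(). It assumes this product is what the check is based on, which the answer above does not confirm. The helper names unstacked_cell_count and fits_in_int32 are hypothetical and are not part of pandas.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def unstacked_cell_count(row_keys, column_keys):
    """Return rows * columns of the wide table that unstack() would build."""
    n_rows = len(set(row_keys))      # one row per distinct row key
    n_cols = len(set(column_keys))   # one column per distinct column key
    return n_rows * n_cols


def fits_in_int32(row_keys, column_keys):
    """True if the cell count of the unstacked table stays within int32."""
    return unstacked_cell_count(row_keys, column_keys) <= INT32_MAX


# Toy data: 3 distinct users and 2 distinct movies give a 3 x 2 table.
users = [1, 1, 2, 3]
movies = [10, 20, 10, 20]
print(unstacked_cell_count(users, movies))  # 6
print(fits_in_int32(users, movies))         # True

# Hypothetical sizes: 100,000 users x 30,000 movies is 3,000,000,000 cells,
# which is larger than INT32_MAX (2,147,483,647).
print(fits_in_int32(range(100_000), range(30_000)))  # False
```

With the data from the question, the same check could be run as fits_in_int32(user_items['user_id'], user_items['movie_id']), since iterating over a column yields its values.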