ValueError: negative dimensions are not allowed when merging

Question

I am merging 2 dataframes together. They are originally .csv files which are only 7 megabytes each (2 columns and 290,000 rows). I am merging like this:

merge = pd.merge(df1, df2, on=['POINTID'], how='outer')

In 32-bit Anaconda I get:

ValueError: negative dimensions are not allowed

On 64-bit Anaconda, I get a memory error instead.

I have 12 gigabytes of RAM and only 30% of it is in use, so this should not be a memory issue. I tried on another computer and got the same result.

Recommended answer

On a 32-bit build, the default NumPy integer dtype is int32. On a 64-bit build, it is usually int64 (on 64-bit Windows, NumPy versions before 2.0 still defaulted to int32).
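
To see which integer widths apply to a particular interpreter, a quick check along the following lines may help. This is only a sketch and not part of the original answer; the helper name `describe_platform_ints` is made up for illustration.

```python
import struct

import numpy as np


def describe_platform_ints():
    """Report the pointer width and NumPy's default and indexing integer dtypes."""
    return {
        # Size of a C pointer in bits: 32 on a 32-bit Python build, 64 on a 64-bit one.
        "pointer_bits": struct.calcsize("P") * 8,
        # The dtype NumPy chooses for a plain Python int.
        "default_int": np.array([1]).dtype.name,
        # The dtype NumPy uses for indexing (np.intp), which follows the pointer width.
        "index_int": np.dtype(np.intp).name,
    }


print(describe_platform_ints())
```

If `default_int` or `index_int` reports int32, any row count above 2**31-1 cannot be represented in that dtype.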

The largest integers representable by an int32 and int64 are:

In [88]: np.iinfo('int32').max
Out[88]: 2147483647

In [87]: np.iinfo('int64').max
Out[87]: 9223372036854775807

So the integer index created by pd.merge will support a maximum of 2147483647 = 2**31-1 rows on a 32-bit machine, and 9223372036854775807 = 2**63-1 rows on a 64-bit machine.

In theory, two 290000-row DataFrames merged with an outer join may have as many as 290000**2 = 84100000000 rows (the worst case, in which every row of both DataFrames shares the same POINTID value). Since

In [89]: 290000**2 > np.iinfo('int32').max
Out[89]: True

the 32-bit machine may not be able to generate an integer index big enough to index the merged result.

And although the 64-bit machine can in theory generate an integer index big enough to accommodate the result, you may not have enough memory to build an 84-billion-row DataFrame.

Now, of course, the merged DataFrame may have fewer than 84 billion rows (the exact number depends on how many duplicate values appear in df1['POINTID'] and df2['POINTID']), but the back-of-the-envelope calculation above shows that the behavior you are seeing is consistent with having a lot of duplicates.
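
One way to check how many rows the merge would actually produce, without performing it, is to count the key values on each side. The sketch below is an illustration rather than part of the original answer: `estimate_outer_merge_rows` is a hypothetical helper, the two small DataFrames are made-up stand-ins for df1 and df2, and it assumes the key column has no missing values.

```python
import pandas as pd


def estimate_outer_merge_rows(left, right, key):
    """Row count an outer merge on `key` would produce, without doing the merge.

    Assumes `key` contains no missing values.
    """
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Align on the union of keys; a key absent from one side gets a count of 0.
    l, r = left_counts.align(right_counts, join="outer", fill_value=0)
    # A key present on both sides contributes l * r rows (every pairing).
    matched = l * r
    # A key present on only one side contributes its own rows once.
    unmatched = (l + r).where(matched == 0, 0)
    return int(matched.sum() + unmatched.sum())


# Made-up stand-ins for df1 and df2, with duplicated POINTID values.
df1 = pd.DataFrame({"POINTID": [1, 1, 2, 3], "a": range(4)})
df2 = pd.DataFrame({"POINTID": [1, 1, 1, 4], "b": range(4)})

estimated_rows = estimate_outer_merge_rows(df1, df2, "POINTID")
actual_rows = len(pd.merge(df1, df2, on=["POINTID"], how="outer"))
print(estimated_rows, actual_rows)
```

If the estimate for the real data is far larger than 290,000, the duplicates in POINTID are the likely culprit, and it may be worth checking whether the key is supposed to be unique in each file.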

PS. You can get negative values when adding or multiplying positive integers in NumPy arrays if there is arithmetic overflow:

In [92]: np.int32(290000)*np.int32(290000)
Out[92]: -1799345920

My guess is that this is the reason for the exception:

ValueError: negative dimensions are not allowed
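
The following sketch shows how a wrapped-around size could lead to that error. It only demonstrates the NumPy behavior; whether pd.merge reaches the allocation through exactly this path is a guess, as stated above. The one-element int32 arrays are an assumption standing in for a row count computed in 32-bit integers.

```python
import numpy as np

# One-element int32 arrays stand in for a row count computed in 32-bit integers.
n = np.array([290000], dtype=np.int32)

wrapped = int((n * n)[0])  # wraps around: 290000**2 does not fit in int32
exact = 290000 ** 2        # plain Python ints do not overflow

try:
    np.empty(wrapped)      # NumPy refuses to allocate an array of negative size
    error = None
except ValueError as exc:
    error = exc

print(wrapped, exact, error)
```

The wrapped value differs from the exact product by a multiple of 2**32, which is why a large positive size can turn up as a negative one.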
