Instability of pandas dataframe calculations

Problem description

I'm wondering whether anyone has seen this problem with Pandas before. Basically, I'm trying to add, multiply, and divide DataFrames element-by-element (all the frames have identical indexes and columns), but Pandas is spitting out different results for the same calculation performed successively.

An image of some example output is shown below. I've used .values in the code below for display purposes, but the instability can also happen when using .add(), .mul(), or .div(). For example, if I repeatedly enter N11.add(N00), I usually get the correct answer, but occasionally (every 4th or 5th time) I get a DataFrame filled with 0s.
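One way to confirm a symptom like this is to repeat the same calculation many times and count how often the result differs from the first run. The sketch below is a minimal, stdlib-only helper; the name `count_mismatches` and its parameters are hypothetical, not part of pandas. It takes the computation and an equality test as callables, so it works for any object type.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def count_mismatches(compute: Callable[[], T],
                     same: Callable[[T, T], bool],
                     trials: int = 20) -> int:
    """Run `compute` `trials` times and count results that differ from the first.

    The first run is used as the reference. If that first run happens to be
    the bad result, most later runs will be counted as mismatches instead;
    either way, a nonzero count means the computation is not repeatable.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    reference = compute()
    mismatches = 0
    for _ in range(trials - 1):
        # Compare each fresh result against the reference with the caller's test.
        if not same(reference, compute()):
            mismatches += 1
    return mismatches
```

With the frames from the question, a call might look like `count_mismatches(lambda: N11.add(N00), lambda a, b: a.equals(b))`. `DataFrame.equals` is a reasonable choice for the equality test because it treats NaNs in matching positions as equal, whereas `==` on DataFrames returns an element-wise frame rather than a single boolean.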

If it matters, I'm on Windows 10 using an Anaconda distribution of Pandas 0.17.0 (with Python 2.7.10 on Spyder 2.3.7). The frames that I am working with are large (6856 by 12511). Has anyone else encountered this problem? Is this a known issue or am I doing something wrong?
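When reporting a problem like this, the version of numexpr matters as much as the pandas version, because pandas can hand large element-wise operations to numexpr when it is installed. pandas itself provides `pd.show_versions()`, which prints this kind of information. The sketch below is a stdlib-only alternative that returns the versions as a dict. It assumes Python 3.8 or later (for `importlib.metadata`, which the Python 2.7 setup in the question does not have), and the function name `environment_report` is hypothetical.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def environment_report(
    packages: Iterable[str] = ("pandas", "numpy", "numexpr"),
) -> Dict[str, Optional[str]]:
    """Return the Python version plus the installed version of each package.

    A package that is not installed maps to None instead of raising.
    """
    report: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Print one "name: version" line per entry, suitable for pasting into a bug report.
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed'}")
```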

Recommended answer

I encountered a similar issue today and it was caused by a bug in numexpr 2.4.4. It seems to be biting other pandas users in various ways, as reported in this pandas ticket and others linked to it.

Upgrading numexpr to 2.4.6 solved the problem for us, but it looks like any version that's not 2.4.4 should be fine!
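A quick way to act on this answer is to check whether the installed numexpr is the 2.4.4 release it identifies. The sketch below is a minimal check under that assumption; the names `release_tuple`, `is_affected_numexpr` and `BUGGY_NUMEXPR` are hypothetical. It flags only 2.4.4 and does not prove that any other release is free of the bug, which the answer offers only as its impression.

```python
import re
from typing import Tuple

# The single release the answer blames for the wrong results.
BUGGY_NUMEXPR = (2, 4, 4)


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. "2.4.6" -> (2, 4, 6).

    Anything after the numeric part (such as "rc1" or ".dev0") is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected_numexpr(version: str) -> bool:
    """Return True if `version` is the numexpr 2.4.4 release."""
    return release_tuple(version)[:3] == BUGGY_NUMEXPR


if __name__ == "__main__":
    from importlib import metadata  # Python 3.8+

    try:
        installed = metadata.version("numexpr")
    except metadata.PackageNotFoundError:
        print("numexpr is not installed")
    else:
        status = "affected, consider upgrading" if is_affected_numexpr(installed) else "not 2.4.4"
        print(f"numexpr {installed}: {status}")
```

If the check flags 2.4.4, upgrading with the package manager you already use (for example `conda update numexpr` in an Anaconda environment, or `pip install --upgrade numexpr`) follows the answer's fix.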