pandas much slower than numpy?


Problem description

The code below suggests that pandas may be much slower than numpy, at least in the specific case of the function clip(). What is surprising is that a round trip from pandas to numpy and back to pandas, with the calculation done in numpy, is still much faster than doing the same thing in pandas.

Shouldn't the pandas function have been implemented in this roundabout way?

In [49]: arr = np.random.randn(1000, 1000)

In [50]: df=pd.DataFrame(arr)

In [51]: %timeit np.clip(arr, 0, None)
100 loops, best of 3: 8.18 ms per loop

In [52]: %timeit df.clip_lower(0)
1 loops, best of 3: 344 ms per loop

In [53]: %timeit pd.DataFrame(np.clip(df.values, 0, None))
100 loops, best of 3: 8.4 ms per loop

Solution

In master/0.13 (to be released very shortly), this is much faster (still slightly slower than native numpy because of the handling of alignment/dtype/nans).

In 0.12 the clip was applied column by column, so it was a relatively expensive operation.
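To illustrate why a per-column approach costs more, here is a rough sketch that contrasts clipping one column at a time with clipping the whole underlying array in one call. This is not the pandas 0.12 source; `clip_per_column` and `clip_whole_block` are hypothetical helpers written only to show the two patterns, and `clip(lower=...)` assumes a recent pandas.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((200, 50)))


def clip_per_column(frame, lower):
    # Hypothetical helper: one Series.clip call per column,
    # so the per-call overhead is paid once for every column.
    clipped = {name: frame[name].clip(lower=lower) for name in frame.columns}
    return pd.DataFrame(clipped, index=frame.index, columns=frame.columns)


def clip_whole_block(frame, lower):
    # Hypothetical helper: a single vectorized call on the 2-D array.
    values = np.clip(frame.to_numpy(), lower, None)
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)


per_column = clip_per_column(df, 0)
whole_block = clip_whole_block(df, 0)
```

Both helpers return the same values; the difference is that the first pays a fixed Python-level cost per column, which adds up on a frame with 1000 columns.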

In [4]: arr = np.random.randn(1000, 1000)

In [5]: df=pd.DataFrame(arr)

In [6]: %timeit np.clip(arr, 0, None)
100 loops, best of 3: 6.62 ms per loop

In [7]: %timeit df.clip_lower(0)
100 loops, best of 3: 12.9 ms per loop
