pandas performance issue - need help to optimize

Question

I wrote some Python code that makes heavy use of the pandas library. The code seems a bit slow, so I ran it through cProfile to see where the bottlenecks are. According to the cProfile results, one of the bottlenecks is the call to pandas.lib.scalar_compare:

1604  262.301    0.164  262.301    0.164 {pandas.lib.scalar_compare}

My question is this: under what circumstances does this get called? I assume it's when I select part of a DataFrame. Here is what my code looks like:

if var == '9999':
    dataTable = resultTable.ix[(resultTable['col1'] == var1)
                               & (resultTable['col2'] == var2)].copy()
else:
    dataTable = resultTable.ix[(resultTable['col1'] == var1)
                               & (resultTable['col2'] == var2)
                               & (resultTable['col3'] == int(val3))].copy()

I have the following questions:

  1. Is that the code snippet that eventually calls the code that causes the bottleneck?
  2. If so, is there any way to optimize this? The version of pandas I am currently using is pandas-0.8.

Any help on this would be greatly appreciated.

Recommended answer

My code was spending a ton of time in pandas.lib.scalar_compare, and I was able to increase the speed by 10x by converting the datatype of string-based columns to 'category'.

For example:

    df['ResourceName'] = df['ResourceName'].astype('category')

For more information, see https://www.continuum.io/content/pandas-categoricals
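As a rough sketch of how this could apply to the selection in the question: the snippet below builds a made-up table using the question's column names (col1, col2, col3), converts the two string columns to 'category', and checks that the filter selects the same rows as before. The data, the values 'a' and 'x', and the table size are assumptions for illustration only. It also assumes a pandas release recent enough to have the 'category' dtype and `.loc` (not the 0.8 release mentioned in the question, and `.ix` is no longer available in current releases). Any speedup will depend on the data, so it is worth profiling again after the change.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names follow the question, values are made up.
rng = np.random.default_rng(0)
n = 100_000
resultTable = pd.DataFrame({
    "col1": rng.choice(["a", "b", "c"], size=n),
    "col2": rng.choice(["x", "y"], size=n),
    "col3": rng.integers(0, 5, size=n),
})

# Selection mask built on the original string columns.
mask_str = (resultTable["col1"] == "a") & (resultTable["col2"] == "x")

# Same mask after converting the string columns to 'category'.
cat = resultTable.copy()
for col in ["col1", "col2"]:
    cat[col] = cat[col].astype("category")
mask_cat = (cat["col1"] == "a") & (cat["col2"] == "x")

# Both masks should select exactly the same rows.
assert (mask_str == mask_cat).all()

# .loc with a boolean mask stands in for the .ix call from the question.
dataTable = cat.loc[mask_cat].copy()
```

The conversion has a one-time cost, so it tends to pay off when the same columns are filtered repeatedly, as in the 1604 calls shown in the profile.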
