pandas performance issue - need help to optimize
Question
I wrote some Python code that makes heavy use of the pandas library. The code seems to be a bit slow, so I ran it through cProfile to see where the bottlenecks are. According to the cProfile results, one of the bottlenecks is the call to pandas.lib.scalar_compare:
1604 262.301 0.164 262.301 0.164 {pandas.lib.scalar_compare}
My question is this - under what circumstances does this get called? I assume it's when I select part of a DataFrame. Here is what my code looks like:
if var == '9999':
    dataTable = resultTable.ix[(resultTable['col1'] == var1)
                               & (resultTable['col2'] == var2)].copy()
else:
    dataTable = resultTable.ix[(resultTable['col1'] == var1)
                               & (resultTable['col2'] == var2)
                               & (resultTable['col3'] == int(val3))].copy()
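(Editorial note: `.ix` was later deprecated and removed from pandas; on a modern pandas the same selection is written with a boolean mask and `.loc`. A minimal runnable sketch - the column names follow the snippet above, but the data and variable values are made up for illustration:)

```python
import pandas as pd

# Hypothetical data standing in for resultTable.
resultTable = pd.DataFrame({
    'col1': ['a', 'b', 'a', 'c'],
    'col2': ['x', 'x', 'y', 'x'],
    'col3': [1, 2, 1, 3],
})

var, var1, var2, val3 = '9999', 'a', 'x', '1'

# Build the shared mask once; add the third condition only when needed.
mask = (resultTable['col1'] == var1) & (resultTable['col2'] == var2)
if var != '9999':
    mask &= resultTable['col3'] == int(val3)

dataTable = resultTable.loc[mask].copy()
print(dataTable)
```

Building the mask once also avoids duplicating the two shared comparisons across the if/else branches.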
I have the following questions:
- Is that the code snippet that eventually calls the code that causes the bottleneck?
- If so, is there any way to optimize this? The version of pandas I am currently using is pandas-0.8.
Any help on this would be greatly appreciated.
Answer
My code was spending a ton of time in pandas.lib.scalar_compare, and I was able to increase the speed by 10x by converting the dtype of string-based columns to 'category'.
For example:
df['ResourceName'] = df['ResourceName'].astype('category')
For more information, see https://www.continuum.io/content/pandas-categoricals
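(Editorial note: a small self-contained sketch of the suggestion above, with made-up data and the `ResourceName` column name from the answer. A categorical column stores integer codes plus a small lookup table of unique values, so equality comparisons run on integers instead of Python strings - that is where the speedup comes from. Note the `category` dtype was introduced in pandas 0.15, so it is not available on the pandas-0.8 mentioned in the question:)

```python
import pandas as pd

df = pd.DataFrame({'ResourceName': ['cpu', 'disk', 'cpu', 'net'] * 1000})

# Before: an object-dtype column compares Python strings one by one.
print(df['ResourceName'].dtype)

# After: the column holds integer codes plus a categories lookup table.
df['ResourceName'] = df['ResourceName'].astype('category')
print(df['ResourceName'].dtype)

# Filtering works exactly as before, but compares integer codes internally.
subset = df[df['ResourceName'] == 'cpu']
print(len(subset))
```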