Faster way to group data than pandas groupby
Question
I am implementing a genetic algorithm. The algorithm runs for a number of iterations (between 100 and 500), and in each iteration all 100 individuals are evaluated for their 'fitness'. To this end, I have written an evaluate function. However, evaluating the fitness of the 100 individuals already takes 13 seconds for a single iteration. I have to speed this up massively in order to implement an efficient algorithm.
The evaluate function takes two arguments and performs some calculations. I will share only the first part of the function, since a similar form of calculation is repeated after it. Specifically, I perform a groupby on a dataframe called df_demand, and then take the sum of a list comprehension that uses the grouped result together with another dataframe called df_distance. A snippet of df_demand looks as follows, but in reality it has larger dimensions (the index is just 0, 1, 2, ...):
date        customer  deliveries  warehouse
2020-10-21  A         30          1
2020-10-21  A         47          1
2020-10-21  A         59          2
2020-10-21  B         130         3
2020-10-21  B         102         3
2020-10-21  B         95          2
2020-10-22  A         55          1
2020-10-22  A         46          4
2020-10-22  A         57          4
2020-10-22  B         89          3
2020-10-22  B         104         3
2020-10-22  B         106         4
and a snippet of df_distance is (where the columns are the warehouses):
index     1      2      3       4
A      30.2   54.3   76.3    30.9
B      96.2   34.2   87.7   102.4
C      57.0   99.5   76.4    34.5
Next, I want to group df_demand such that each combination of (date, customer, warehouse) appears once, with all deliveries for that combination summed. Finally, I want to calculate the total cost. Currently I do the following, but it is too slow:
def evaluate(df_demand, df_distance):
    costs = df_demand.groupby(["date", "customer", "warehouse"]).sum().reset_index()
    cost = sum([math.ceil(costs.iat[i, 3] / 20) * df_distance.loc[costs.iat[i, 1], costs.iat[i, 2]]
                for i in range(len(costs))])
    # etc...
    return cost
Since I have to do many iterations, and considering that the dimensions of my data are considerably larger in reality, my question is: what is the fastest way to do this operation?
Answer
Try this:
import numpy as np

def get_cost(df, df2):
    '''
    df: deliveries data (df_demand)
    df2: distance data (df_distance)
    '''
    # One trip per 20 deliveries, per (customer, warehouse) pair and date.
    pivot = np.ceil(df.pivot_table(index=['customer', 'warehouse'], columns=['date'],
                                   values='deliveries', aggfunc='sum', fill_value=0)
                      .div(20))
    # Align each (customer, warehouse) row with its distance and sum everything.
    return pivot.mul(df2.rename_axis(index='customer', columns='warehouse').stack(),
                     axis='rows').sum().sum()
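A quick sanity check, with the question's sample frames reconstructed inline (the get_cost definition is repeated so the snippet runs standalone; the warehouse columns of df_distance are assumed to be integers so they align with the warehouse values in df_demand):

```python
import numpy as np
import pandas as pd

def get_cost(df, df2):
    # repeated from the answer above so this snippet is self-contained
    pivot = np.ceil(df.pivot_table(index=['customer', 'warehouse'], columns=['date'],
                                   values='deliveries', aggfunc='sum', fill_value=0)
                      .div(20))
    return pivot.mul(df2.rename_axis(index='customer', columns='warehouse').stack(),
                     axis='rows').sum().sum()

df_demand = pd.DataFrame({
    "date": ["2020-10-21"] * 6 + ["2020-10-22"] * 6,
    "customer": ["A", "A", "A", "B", "B", "B"] * 2,
    "deliveries": [30, 47, 59, 130, 102, 95, 55, 46, 57, 89, 104, 106],
    "warehouse": [1, 1, 2, 3, 3, 2, 1, 4, 4, 3, 3, 4],
})
df_distance = pd.DataFrame(
    [[30.2, 54.3, 76.3, 30.9], [96.2, 34.2, 87.7, 102.4], [57.0, 99.5, 76.4, 34.5]],
    index=["A", "B", "C"], columns=[1, 2, 3, 4],
)

print(get_cost(df_demand, df_distance))  # same value as the original evaluate on this sample
```

Note that (customer, warehouse) pairs that never appear in df_demand (here customer C entirely, and A with warehouse 3) simply drop out of the total: the multiplication aligns on the pivot's MultiIndex, unmatched rows become NaN, and sum() skips them.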