pandas.core.groupby.DataFrameGroupBy.idxmin() is very slow, how can I make my code faster?


Problem description

I am trying to do the same thing as a SQL GROUP BY that takes the minimum value:

select id,min(value) ,other_fields...
from table
group by ('id')

I tried:

dfg = df.groupby('id', sort=False)
idx = dfg['value'].idxmin()
df = df.loc[idx, list(df.columns.values)]

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.idxmin.html

However, the second line, the idxmin() call, takes more than half an hour on ~4M columns in df, whereas the groupby itself takes less than 1 second. What am I missing? Is it supposed to take that long? How can I make this faster? Would it be faster in pure SQL?

Recommended answer
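One commonly used alternative, given here as a minimal sketch and not as a definitive answer: sort the frame once by value, then keep the first row for each id. This avoids the per-group idxmin() call, which in some pandas versions may run at Python level for every group. The sample data, the column name other, and the sizes are hypothetical; the sketch also assumes value contains no NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'id' and 'value' follow the question,
# 'other' stands in for the unspecified other fields.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "id": rng.integers(0, 1000, size=n),
    "value": rng.random(n),
    "other": np.arange(n),
})

# Approach from the question: idxmin per group, then a label lookup.
idx = df.groupby("id", sort=False)["value"].idxmin()
slow = df.loc[idx]

# Alternative: one stable sort by value, then keep the first row seen per id.
# With a stable sort, ties resolve to the earliest original row, like idxmin.
fast = df.sort_values("value", kind="stable").drop_duplicates("id", keep="first")

# Both approaches select the same rows; only the row order may differ.
assert slow.sort_index().equals(fast.sort_index())
```

If only the minimum itself is needed and not the other fields, df.groupby('id')['value'].min() is enough. Timings depend on the pandas version and the number of groups, so it is worth measuring both variants on the real data.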
