pandas.core.groupby.DataFrameGroupBy.idxmin() is very slow; how can I make my code faster?
Question
I am trying to do the same thing as a SQL GROUP BY that takes the minimum value:
select id,min(value) ,other_fields...
from table
group by id
I tried:
dfg = df.groupby('id', sort=False)
idx = dfg['value'].idxmin()
df = df.loc[idx, list(df.columns.values)]
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.idxmin.html
However, the second line, idxmin(), takes more than half an hour on a df with ~4M rows, whereas the group by itself takes less than 1 second. What am I missing? Is it supposed to take that long? How can I make this process faster? Would it be faster in pure SQL?
Recommended answer
df1 = df.sort_values(by=['value']).drop_duplicates('id', keep='first')
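The one-liner works because, once the rows are sorted by value in ascending order, the first row seen for each id is that id's minimum, so dropping later duplicates of id leaves one minimal row per group. Below is a minimal sketch comparing it with the idxmin() approach from the question. The toy data and the "other" column are hypothetical, and kind="stable" is an addition not present in the answer: it makes tied minimum values resolve to the earliest row, which matches what idxmin() picks.

```python
import pandas as pd

# Hypothetical toy data; "id" and "value" follow the question,
# "other" stands in for the remaining columns.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "value": [5, 3, 7, 7, 1],
    "other": ["a", "b", "c", "d", "e"],
})

# Approach from the question: index label of the minimum value per group.
idx = df.groupby("id", sort=False)["value"].idxmin()
via_idxmin = df.loc[idx]

# Approach from the answer: sort by value, keep the first row per id.
# kind="stable" (an assumption added here) keeps tied rows in original order.
via_sort = df.sort_values(by=["value"], kind="stable").drop_duplicates("id", keep="first")

# Compare which original rows each approach selected.
rows_idxmin = sorted(via_idxmin.index)
rows_sort = sorted(via_sort.index)
print(rows_idxmin == rows_sort)
```

The result of the sort-based version is ordered by value rather than by first appearance of id, so call sort_index() on it if the original row order matters. Rows with a missing value are sorted last by default, so the two approaches can differ for groups that contain only missing values.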