Pandas groupby + resample first is really slow - since version 0.22
Problem description
I have a piece of code that groups a dataframe and runs resample('1D').first() for each group. Since I upgraded to 0.22.0, it runs much slower.
Setup code:
import pandas as pd
import numpy as np
import datetime as dt
import string
# set up some data
DATE_U = 50
STR_LEN = 10
STR_U = 50
N = 500
letters = list(string.ascii_lowercase)
def get_rand_string():
    return ''.join(np.random.choice(letters, size=STR_LEN))
dates = np.random.randint(0, 100000000, size=DATE_U)
strings = [get_rand_string() for _ in range(STR_U)]
df = pd.DataFrame({
    'date': np.random.choice(dates, N),
    'string': np.random.choice(strings, N),
})
df['date'] = pd.to_datetime(df['date'], unit='s')
df = df.set_index('date')
print('Shape: {}'.format(df.shape))
print(df.head())
print('\nUnique strings: {}'.format(df['string'].nunique()))
print('Unique dates: {}'.format(df.index.nunique()))
Output:
Shape: (500, 1)
                         string
date
1973-02-07 19:57:43  wafadvlvty
1973-02-27 03:43:02  shofwwdhtu
1972-04-25 18:11:20  xwbbpwtsfj
1970-09-03 18:00:59  zkxwnqgrqp
1971-03-18 10:09:44  ofsaxqprdx
Unique strings: 50
Unique dates: 50
Test groupby and resample first:
%%timeit -n 3 -r 3
def __apply(g):
    g = g.resample('1D').first()
    return g
print('Pandas version: {}'.format(pd.__version__))
dfg = df.groupby('string').apply(__apply)
For Pandas 0.21.0:
Pandas version: 0.21.0
118 ms ± 1.63 ms per loop (mean ± std. dev. of 3 runs, 3 loops each)
For Pandas 0.22.0:
Pandas version: 0.22.0
3 loops, best of 3: 2.3 s per loop
That is about 20 times slower. My question is: why is that? And is there a way to make this equally fast in 0.22.0?
Recommended answer
Use .head(1) instead of .first(), as follows:
g = g.resample('1D').head(1)
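As a possible alternative (not part of the answer above), the per-group apply can sometimes be avoided entirely by grouping on the key and a daily pd.Grouper in a single pass. The sketch below is only an illustration under stated assumptions: the toy frame, the numeric 'value' column and the names fast and slow are made up for the example and do not come from the question. Note also that this form returns only the (string, day) bins that actually contain data, whereas resample('1D') also emits the empty days in between, so the two are compared after dropping those empty bins.

```python
import numpy as np
import pandas as pd

# Toy data (assumption): sorted, unique timestamps, a string key and a
# hypothetical numeric 'value' column that the question's frame does not have.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame(
    {
        "string": rng.choice(["a", "b", "c"], size=n),
        "value": rng.integers(0, 100, size=n),
    },
    index=pd.date_range("2020-01-01", periods=n, freq="7h", name="date"),
)

# Single pass: group by the key and by calendar day at the same time.
fast = df.groupby(["string", pd.Grouper(freq="1D")])["value"].first()

# Per-group resample as in the question; dropna() discards the empty
# daily bins that resample creates and the Grouper version never emits.
slow = (
    df.groupby("string")["value"]
    .apply(lambda s: s.resample("1D").first())
    .dropna()
)

print(len(fast), len(slow))
```

Whether this is actually faster on a given pandas version should be checked with %%timeit on the real data; the sketch only shows that the two approaches agree on the non-empty bins.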