Faster way to transform group with mean value in Pandas

Question

I have a Pandas DataFrame in which I am trying to replace the values in each group with the mean of that group. On my machine, the line df["signal"].groupby(g).transform(np.mean) takes about 10 seconds to run with N and N_TRANSITIONS set to the values below.

Is there a faster way to achieve the same result?

import pandas as pd
import numpy as np
from time import time

np.random.seed(0)

N = 120000
N_TRANSITIONS = 1400

# generate groups
transition_points = np.random.permutation(np.arange(N))[:N_TRANSITIONS]
transition_points.sort()
transitions = np.zeros((N,), dtype=bool)  # the np.bool alias was removed from NumPy; the builtin bool works
transitions[transition_points] = True
g = transitions.cumsum()

df = pd.DataFrame({ "signal" : np.random.rand(N)})

# here is my bottleneck for large N
tic = time()
result = df["signal"].groupby(g).transform(np.mean)
toc = time()
print(toc - tic)

Recommended answer

Inspired by Jeff's answer, this is the fastest method on my machine:

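grp = df["signal"].groupby(g)  # grp is not defined in the answer; assumed to be the groupby from the question
result = pd.Series(np.repeat(grp.mean().values, grp.count().values))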
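
The repeat approach above can be checked against the built-in transform with a small self-contained sketch. The sizes, the random generator, and the names expected and fast are assumptions made for illustration; they are not part of the original question or answer. Note that the trick relies on the group labels being contiguous and non-decreasing, as they are here because g is a cumulative sum. The sketch uses size() rather than count(), since count() skips NaN values and would then repeat a mean too few times.

```python
import numpy as np
import pandas as pd

# Smaller sizes than in the question so the check runs quickly (assumed values).
N = 12000
N_TRANSITIONS = 140

rng = np.random.default_rng(0)

# Contiguous, non-decreasing group labels, built the same way as in the question.
transitions = np.zeros(N, dtype=bool)
transitions[rng.choice(N, size=N_TRANSITIONS, replace=False)] = True
g = transitions.cumsum()

df = pd.DataFrame({"signal": rng.random(N)})
grp = df["signal"].groupby(g)

# Baseline: the built-in transform, using the string alias for the mean.
expected = grp.transform("mean")

# Repeat approach: one mean per group, repeated once per row of that group.
# size() counts every row; count() would exclude NaN values.
fast = pd.Series(
    np.repeat(grp.mean().to_numpy(), grp.size().to_numpy()),
    index=df.index,
)

# Both results should agree element by element.
assert np.allclose(expected.to_numpy(), fast.to_numpy())
```

If the labels were not sorted and contiguous, the repeated means would no longer line up with the original rows, so this shortcut would not be a drop-in replacement for transform in that case.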
