pandas groupby可以将DataFrame转换为Series吗? [英] Can pandas groupby transform a DataFrame into a Series?
问题描述
我想使用pandas和statsmodels在数据框的子集上拟合线性模型并返回预测值.但是,我在找出正确的熊猫习惯用法时遇到了麻烦.这是我正在尝试做的事情:
I would like to use pandas and statsmodels to fit a linear model on subsets of a dataframe and return the predicted values. However, I am having trouble figuring out the right pandas idiom to use. Here is what I am trying to do:
import pandas as pd
import statsmodels.formula.api as sm
import seaborn as sns
tips = sns.load_dataset("tips")
def fit_predict(df):
m = sm.ols("tip ~ total_bill", df).fit()
return pd.Series(m.predict(df), index=df.index)
tips["predicted_tip"] = tips.groupby("day").transform(fit_predict)
这会引发以下错误:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-139-b3d2575e2def> in <module>()
----> 1 tips["predicted_tip"] = tips.groupby("day").transform(fit_predict)
/Users/mwaskom/anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in transform(self, func, *args, **kwargs)
3033 return self._transform_general(func, *args, **kwargs)
3034 except:
-> 3035 return self._transform_general(func, *args, **kwargs)
3036
3037 # a reduction transform
/Users/mwaskom/anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in _transform_general(self, func, *args, **kwargs)
2988 group.T.values[:] = res
2989 else:
-> 2990 group.values[:] = res
2991
2992 applied.append(group)
ValueError: could not broadcast input array from shape (62) into shape (62,6)
该错误是有意义的,因为我认为.transform
想要将DataFrame映射到DataFrame.但是,是否有一种方法可以对DataFrame进行分组操作,将每个块传递给一个将其缩减为Series(具有相同索引)的函数,然后将生成的Series组合成可以插入到原始Dataframe中的东西?
The error makes sense in that I think .transform
wants to map a DataFrame to a DataFrame. But is there a way to do a groupby operation on a DataFrame, pass each chunk into a function that reduces it to a Series (with the same index), and then combine the resulting Series into something that can be inserted into the original dataframe?
推荐答案
这里的顶部是相同的,我只是使用玩具数据集b/c,位于防火墙后面.
The top part here is the same, I'm just using a toy dataset b/c I'm behind a firewall.
tips = pd.DataFrame({ 'day':list('MMMFFF'), 'tip':range(6),
'total_bill':[10,40,20,80,50,40] })
def fit_predict(df):
m = sm.ols("tip ~ total_bill", df).fit()
return pd.Series(m.predict(df), index=df.index)
如果将转换"更改为应用",则会得到:
If you change 'transform' to 'apply', you'll get:
tips.groupby("day").apply(fit_predict)
day
F 3 2.923077
4 4.307692
5 4.769231
M 0 0.714286
1 1.357143
2 0.928571
这不是您想要的,但是如果您将level降为0,则可以根据需要进行操作:
That's not quite what you want, but if you drop level=0, you can proceed as desired:
tips['predicted'] = tips.groupby("day").apply(fit_predict).reset_index(level=0,drop=True)
day tip total_bill predicted
0 M 0 10 0.714286
1 M 1 40 1.357143
2 M 2 20 0.928571
3 F 3 80 2.923077
4 F 4 50 4.307692
5 F 5 40 4.769231
这篇关于 pandas groupby可以将DataFrame转换为Series吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!