Python Pandas: How to assign groupby operation results back to columns in parent dataframe?

Views: 48 | Published: 2021/12/27 0:00:18 | Tags: python, group-by, dataframe, pandas


Problem description

I have the following data frame in IPython, where each row is a single stock:

In [261]: bdata
Out[261]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 21210 entries, 0 to 21209
Data columns:
BloombergTicker      21206  non-null values
Company              21210  non-null values
Country              21210  non-null values
MarketCap            21210  non-null values
PriceReturn          21210  non-null values
SEDOL                21210  non-null values
yearmonth            21210  non-null values
dtypes: float64(2), int64(1), object(4)

I want to apply a groupby operation that computes the cap-weighted average return across all stocks for each date in the "yearmonth" column.

This works as expected:

In [262]: bdata.groupby("yearmonth").apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
Out[262]:
yearmonth
201204      -0.109444
201205      -0.290546
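
To make the calculation concrete, here is a minimal sketch of the same cap-weighted average on a toy frame. The column names mirror the question, but the frame `toy` and all of its values are made up for illustration. It sums the weighted returns and the market caps per date and divides the two, which is algebraically the same as the lambda above.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
toy = pd.DataFrame({
    "yearmonth":   [201204, 201204, 201205, 201205],
    "MarketCap":   [100.0, 300.0, 200.0, 200.0],
    "PriceReturn": [0.10, -0.10, 0.02, 0.04],
})

# Weighted numerator, computed row by row.
weighted = toy["PriceReturn"] * toy["MarketCap"]

# Sum numerator and denominator within each yearmonth, then divide.
# The result is a Series indexed by yearmonth, one value per date.
cap_weighted = (
    weighted.groupby(toy["yearmonth"]).sum()
    / toy["MarketCap"].groupby(toy["yearmonth"]).sum()
)
print(cap_weighted)
```

For 201204 this works out to (0.10*100 - 0.10*300) / 400, roughly -0.05, and for 201205 to (0.02*200 + 0.04*200) / 400, roughly 0.03.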

But then I want to "broadcast" these values back to the rows of the original data frame and save them as a column that is constant wherever the dates match.

In [263]: dateGrps = bdata.groupby("yearmonth")

In [264]: dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/mnt/bos-devrnd04/usr6/home/espears/ws/Research/Projects/python-util/src/util/<ipython-input-264-4a68c8782426> in <module>()
----> 1 dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

TypeError: 'DataFrameGroupBy' object does not support item assignment

I realize this naive assignment should not work. But what is the "right" Pandas idiom for assigning the result of a groupby operation to a new column on the parent dataframe?

In the end, I want a column called "MarketReturn" that holds a repeated constant value for all rows whose date matches the corresponding entry in the output of the groupby operation.

One hack to achieve this would be the following:

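# Per-date cap-weighted return (one value per yearmonth)
marketRetsByDate = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

# Start with an all-NaN column, then fill it one date at a time
bdata["MarketReturn"] = np.nan

for elem in marketRetsByDate.index.values:
    # .loc replaces the removed .ix indexer and avoids chained assignment
    bdata.loc[bdata["yearmonth"] == elem, "MarketReturn"] = marketRetsByDate.loc[elem]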

But this is slow, bad, and unPythonic.
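
For reference, two commonly used ways to get a row-aligned result without a Python loop are `groupby(...).transform(...)`, which returns values aligned to the original index, and `Series.map`, which looks up each row's key in a per-group Series. The sketch below is not taken from the original thread. It assumes the same column names as the question and uses a made-up frame `toy`. The extra column `MarketReturnMapped` is hypothetical and exists only to compare the two approaches.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
toy = pd.DataFrame({
    "yearmonth":   [201204, 201204, 201205, 201205],
    "MarketCap":   [100.0, 300.0, 200.0, 200.0],
    "PriceReturn": [0.10, -0.10, 0.02, 0.04],
})

weighted = toy["PriceReturn"] * toy["MarketCap"]

# Option 1: transform("sum") broadcasts each group's sum back to every row
# of that group, so the division is already aligned with the parent frame.
toy["MarketReturn"] = (
    weighted.groupby(toy["yearmonth"]).transform("sum")
    / toy.groupby("yearmonth")["MarketCap"].transform("sum")
)

# Option 2: build one value per yearmonth, then map it through the key column.
market_rets = (
    weighted.groupby(toy["yearmonth"]).sum()
    / toy["MarketCap"].groupby(toy["yearmonth"]).sum()
)
toy["MarketReturnMapped"] = toy["yearmonth"].map(market_rets)

print(toy)
```

Both columns should hold the same repeated value for every row sharing a `yearmonth`. Option 1 avoids the intermediate Series, while Option 2 is convenient when the per-date values are already available from an earlier computation.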
