Python Pandas: How to assign groupby operation results back to columns in parent dataframe?

Views: 1218 | Posted: 2017/3/25 23:43:13 | Tags: python, group-by, dataframe, pandas

Problem description

I have the following data frame in IPython, where each row is a single stock:
In [261]: bdata
Out[261]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 21210 entries, 0 to 21209
Data columns:
BloombergTicker      21206  non-null values
Company              21210  non-null values
Country              21210  non-null values
MarketCap            21210  non-null values
PriceReturn          21210  non-null values
SEDOL                21210  non-null values
yearmonth            21210  non-null values
dtypes: float64(2), int64(1), object(4)
I want to apply a groupby operation that computes the cap-weighted average return across everything, for each date in the "yearmonth" column.

This works as expected:
In [262]: bdata.groupby("yearmonth").apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
Out[262]:
yearmonth
201204      -0.109444
201205      -0.290546
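The per-date figure above is a cap-weighted mean: sum(PriceReturn * MarketCap) / sum(MarketCap) within each yearmonth. Below is a minimal sketch of the same computation without `groupby.apply`. It assumes a small hypothetical stand-in for `bdata`; the values and the helper name `cap_weighted_return` are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data with the question's column names (values invented).
bdata = pd.DataFrame({
    "yearmonth":   [201204, 201204, 201205, 201205, 201205],
    "MarketCap":   [100.0, 300.0, 50.0, 150.0, 200.0],
    "PriceReturn": [0.10, -0.20, 0.05, -0.10, 0.00],
})

def cap_weighted_return(df: pd.DataFrame) -> pd.Series:
    """Cap-weighted average PriceReturn per yearmonth (hypothetical helper)."""
    # Numerator: sum of return * cap within each month.
    weighted = df["PriceReturn"] * df["MarketCap"]
    num = weighted.groupby(df["yearmonth"]).sum()
    # Denominator: total market cap within each month.
    den = df["MarketCap"].groupby(df["yearmonth"]).sum()
    # Both Series are indexed by yearmonth, so division aligns month by month.
    return num / den

market_rets = cap_weighted_return(bdata)
```

Using two built-in `sum` aggregations instead of a Python lambda per group is usually faster on large frames. The result is still one value per month, so it must be broadcast back to the rows separately.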
But then I want to sort of "broadcast" these values back to the indices in the original data frame, and save them as constant columns where the dates match.
In [263]: dateGrps = bdata.groupby("yearmonth")

In [264]: dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/mnt/bos-devrnd04/usr6/home/espears/ws/Research/Projects/python-util/src/util/<ipython-input-264-4a68c8782426> in <module>()
----> 1 dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

TypeError: 'DataFrameGroupBy' object does not support item assignment
I realize this naive assignment should not work. But what is the "right" Pandas idiom for assigning the result of a groupby operation into a new column on the parent dataframe?

In the end, I want a column called "MarketReturn" that holds a repeated constant value for all indices that have a matching date in the output of the groupby operation.
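One common idiom for this is `groupby(...).transform(...)`. Unlike `apply` or a plain aggregation, `transform` returns a result with the same index as the original frame, so it can be assigned directly into a new column. The sketch below assumes a small hypothetical `bdata` with the question's column names. It is offered as an alternative to joining an aggregated Series back, not as the accepted answer's method.

```python
import pandas as pd

# Hypothetical toy data with the question's column names (values invented).
bdata = pd.DataFrame({
    "yearmonth":   [201204, 201204, 201205, 201205, 201205],
    "MarketCap":   [100.0, 300.0, 50.0, 150.0, 200.0],
    "PriceReturn": [0.10, -0.20, 0.05, -0.10, 0.00],
})

# Total market cap of each row's month, repeated on every row of that month.
month_cap = bdata.groupby("yearmonth")["MarketCap"].transform("sum")

# Each row's contribution to its month's cap-weighted return; summing the
# contributions per month with transform broadcasts the total back onto
# every row, aligned to bdata's index.
bdata["MarketReturn"] = (
    (bdata["PriceReturn"] * bdata["MarketCap"] / month_cap)
    .groupby(bdata["yearmonth"])
    .transform("sum")
)
```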

One hack to achieve this would be the following:
marketRetsByDate  = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

bdata["MarketReturn"] = np.repeat(np.nan, len(bdata))

# .loc with a row mask and a column label avoids chained assignment (and the removed .ix)
for elem in marketRetsByDate.index.values:
    bdata.loc[bdata["yearmonth"] == elem, "MarketReturn"] = marketRetsByDate.loc[elem]
But this is slow, bad, and unPythonic.
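The loop can usually be replaced by a single vectorized lookup with `Series.map`. This method looks up each row's `yearmonth` in the index of the per-date Series. The sketch below assumes hypothetical toy data and uses a hard-coded stand-in for `marketRetsByDate`.

```python
import pandas as pd

# Hypothetical toy data; 201206 deliberately has no per-date value.
bdata = pd.DataFrame({
    "yearmonth":   [201204, 201204, 201205, 201206],
    "MarketCap":   [100.0, 300.0, 50.0, 10.0],
    "PriceReturn": [0.10, -0.20, 0.05, 0.01],
})

# Stand-in for the per-date Series produced by the groupby (index = yearmonth).
marketRetsByDate = pd.Series({201204: -0.125, 201205: -0.03125})

# Each row gets the value whose index matches its yearmonth; dates absent
# from marketRetsByDate become NaN, like the loop's NaN initialisation.
bdata["MarketReturn"] = bdata["yearmonth"].map(marketRetsByDate)
```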
Solution
In [97]: df = pandas.DataFrame({'month': np.random.randint(0,11, 100), 'A': np.random.randn(100), 'B': np.random.randn(100)})

In [98]: df.join(df.groupby('month')['A'].sum(), on='month', rsuffix='_r')
Out[98]:
           A         B  month       A_r
0  -0.040710  0.182269      0 -0.331816
1  -0.004867  0.642243      1  2.448232
2  -0.162191  0.442338      4  2.045909
3  -0.979875  1.367018      5 -2.736399
4  -1.126198  0.338946      5 -2.736399
5  -0.992209 -1.343258      1  2.448232
6  -1.450310  0.021290      0 -0.331816
7  -0.675345 -1.359915      9  2.722156
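Applied to the question's columns, the same `join` pattern might look like the sketch below. The toy data is hypothetical. Naming the aggregated Series `MarketReturn` before joining gives the new column its final name, so `rsuffix` is not needed. `on="yearmonth"` matches each row's date against the Series index.

```python
import pandas as pd

# Hypothetical toy data with the question's column names (values invented).
bdata = pd.DataFrame({
    "yearmonth":   [201204, 201204, 201205, 201205, 201205],
    "MarketCap":   [100.0, 300.0, 50.0, 150.0, 200.0],
    "PriceReturn": [0.10, -0.20, 0.05, -0.10, 0.00],
})

# One cap-weighted return per month, indexed by yearmonth.
weighted = bdata["PriceReturn"] * bdata["MarketCap"]
market_ret = (
    weighted.groupby(bdata["yearmonth"]).sum()
    / bdata["MarketCap"].groupby(bdata["yearmonth"]).sum()
).rename("MarketReturn")

# Left join: bdata's "yearmonth" column is matched against market_ret's index,
# so every row receives its month's value and bdata's row order is kept.
result = bdata.join(market_ret, on="yearmonth")
```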

