Python Pandas: How to assign groupby operation results back to columns in parent dataframe?
Problem description
I have the following data frame in IPython, where each row is a single stock:
    In [261]: bdata
    Out[261]:
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 21210 entries, 0 to 21209
    Data columns:
    BloombergTicker    21206  non-null values
    Company            21210  non-null values
    Country            21210  non-null values
    MarketCap          21210  non-null values
    PriceReturn        21210  non-null values
    SEDOL              21210  non-null values
    yearmonth          21210  non-null values
    dtypes: float64(2), int64(1), object(4)
I want to apply a groupby operation that computes the cap-weighted average return across all stocks for each date in the "yearmonth" column.
This works as expected:
    In [262]: bdata.groupby("yearmonth").apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
    Out[262]:
    yearmonth
    201204      -0.109444
    201205      -0.290546
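To make the arithmetic concrete, here is a minimal sketch of the same per-month computation on a tiny made-up frame. The column names follow the question, but the values and the helper name `cap_weighted_return` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data: two stocks in each of two months (values are made up).
toy = pd.DataFrame({
    "yearmonth":   [201204, 201204, 201205, 201205],
    "MarketCap":   [100.0, 300.0, 200.0, 200.0],
    "PriceReturn": [0.10, -0.10, 0.02, 0.04],
})

def cap_weighted_return(group):
    """Weight each return by the stock's share of the group's total market cap."""
    weights = group["MarketCap"] / group["MarketCap"].sum()
    return (group["PriceReturn"] * weights).sum()

# One value per yearmonth; the result is a Series indexed by the group key.
per_month = (
    toy.groupby("yearmonth")[["PriceReturn", "MarketCap"]]
    .apply(cap_weighted_return)
)
print(per_month)
```

For 201204 the weights are 0.25 and 0.75, so the result is 0.10 * 0.25 + (-0.10) * 0.75 = -0.05; for 201205 both weights are 0.5, giving 0.03. The result has one row per month, not one row per stock.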
But then I want to sort of "broadcast" these values back to the indices in the original data frame, and save them as constant columns where the dates match.
    In [263]: dateGrps = bdata.groupby("yearmonth")

    In [264]: dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /mnt/bos-devrnd04/usr6/home/espears/ws/Research/Projects/python-util/src/util/<ipython-input-264-4a68c8782426> in <module>()
    ----> 1 dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

    TypeError: 'DataFrameGroupBy' object does not support item assignment
I realize this naive assignment should not work. But what is the "right" Pandas idiom for assigning the result of a groupby operation into a new column on the parent dataframe?
In the end, I want a column called "MarketReturn" that will be a repeated constant value for all indices that have a matching date with the output of the groupby operation.
One hack to achieve this would be the following:
    marketRetsByDate = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

    bdata["MarketReturn"] = np.repeat(np.NaN, len(bdata))

    for elem in marketRetsByDate.index.values:
        bdata["MarketReturn"][bdata["yearmonth"]==elem] = marketRetsByDate.ix[elem]
But this is slow, bad, and unPythonic.
Solution

    In [97]: df = pandas.DataFrame({'month': np.random.randint(0,11, 100), 'A': np.random.randn(100), 'B': np.random.randn(100)})

    In [98]: df.join(df.groupby('month')['A'].sum(), on='month', rsuffix='_r')
    Out[98]:
               A         B  month       A_r
    0  -0.040710  0.182269      0 -0.331816
    1  -0.004867  0.642243      1  2.448232
    2  -0.162191  0.442338      4  2.045909
    3  -0.979875  1.367018      5 -2.736399
    4  -1.126198  0.338946      5 -2.736399
    5  -0.992209 -1.343258      1  2.448232
    6  -1.450310  0.021290      0 -0.331816
    7  -0.675345 -1.359915      9  2.722156
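The answer above uses a plain sum on random data. As a rough sketch of how the same join idea might carry over to the cap-weighted return from the question, the example below builds a small made-up `bdata` (the values and the helper name `cap_weighted_return` are assumptions, not from the original post). It also shows two alternatives that are not part of the original answer: `Series.map` on the key column and `groupby(...).transform("sum")`.

```python
import pandas as pd

# Hypothetical stand-in for the question's bdata; values are made up.
bdata = pd.DataFrame({
    "yearmonth":   [201204, 201205, 201204, 201205],
    "MarketCap":   [100.0, 200.0, 300.0, 200.0],
    "PriceReturn": [0.10, 0.02, -0.10, 0.04],
})

def cap_weighted_return(group):
    """Cap-weighted average return for one yearmonth group."""
    weights = group["MarketCap"] / group["MarketCap"].sum()
    return (group["PriceReturn"] * weights).sum()

# One value per yearmonth, named so it becomes a column when joined.
market_ret = (
    bdata.groupby("yearmonth")[["PriceReturn", "MarketCap"]]
    .apply(cap_weighted_return)
    .rename("MarketReturn")
)

# Option 1 (the answer's approach): join the per-group Series on the key column.
joined = bdata.join(market_ret, on="yearmonth")

# Option 2 (alternative): use the key column to look up each row's group value.
mapped = bdata["yearmonth"].map(market_ret)

# Option 3 (alternative): transform returns a result aligned to the original rows,
# so sum(return * cap) / sum(cap) can be computed without any join.
weighted_sum = (bdata["PriceReturn"] * bdata["MarketCap"]).groupby(bdata["yearmonth"]).transform("sum")
cap_sum = bdata.groupby("yearmonth")["MarketCap"].transform("sum")
via_transform = weighted_sum / cap_sum

bdata["MarketReturn"] = mapped
print(bdata)
```

All three options give every row the value for its own yearmonth (-0.05 for 201204 and 0.03 for 201205 here), which is the repeated constant column the question asks for, without a Python-level loop over the groups.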