How do I perform a moving average in pandas with a column that needs to be unique?
Problem description
I have a data frame like the one below:
index Player Team Matchup Game_Date WL Min PTS FGM FGA FG% 3PM 3PA 3P% FTM FTA FT% OREB DREB REB AST STL BLK TOV PF Plus_Minus Triple_Double Double_Double FPT 2PA 2PM 2P% Home_Away
276100 1 John Long TOR TOR @ BOS 04/20/1997 W 6.0 0.0 0.0 3.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 1.50 2.0 0.0 0.000000 Away
276101 2 Walt Williams TOR TOR @ BOS 04/20/1997 W 29.0 7.0 3.0 9.0 33.3 1.0 2.0 50.0 0.0 0.0 0 3.0 3.0 3.0 2.0 2.0 1.0 1.0 3.0 20.0 0.0 0.0 19.75 7.0 2.0 28.571429 Away
276102 3 Todd Day BOS BOS vs. TOR 04/20/1997 L 36.0 22.0 8.0 17.0 47.1 4.0 8.0 50.0 2.0 2.0 100 8.0 8.0 6.0 4.0 0.0 0.0 3.0 8.0 -21.0 0.0 0.0 36.00 9.0 4.0 44.444444 Home
276103 4 Doug Christie TOR TOR @ BOS 04/20/1997 W 39.0 27.0 8.0 19.0 42.1 3.0 9.0 33.3 8.0 8.0 100 8.0 8.0 1.0 5.0 3.0 1.0 0.0 8.0 30.0 0.0 0.0 45.25 10.0 5.0 50.000000 Away
276104 5 Brett Szabo BOS BOS vs. TOR 04/20/1997 L 25.0 5.0 1.0 4.0 25.0 0.0 0.0 0 3.0 4.0 75.0 1.0 1.0 3.0 1.0 0.0 0.0 0.0 1.0 -11.0 0.0 0.0 10.25 4.0 1.0 25.000000 Home
I would like to add a new column for each of the existing columns that gives its x-day moving average. However, the moving average must be computed separately for each unique player. For example, John Long could play several hundred games, each on a unique date, and I want his moving average to reflect only his own performances. I've looked at the df.rolling() function in pandas, but I don't know how to make it treat each player individually. Any help would be appreciated.
   Name         Date    Points  MA
0  Joe Smith    1-1-19  10      NA
1  Sam Simmons  1-1-19  20      NA
2  Joe Smith    1-2-19  30      20
3  Sam Simmons  1-2-19  40      30
Recommended answer
Drawing inspiration from @jezrael's answer above, as well as the answer to another question here, here's a solution for a running (cumulative) average by player, without the date window size constraint.
# Get the running count of Names, sorted by Date, Name
df['NameCount'] = df.sort_values(['Date','Name'], ascending=True).groupby('Name').cumcount() + 1
# Running sum of points, in the same order as above (important)
df['PointSum'] = df.sort_values(['Name','NameCount'], ascending=True).groupby('Name')['Points'].cumsum()
# Running average = running sum / running count, both computed per player
df['MA'] = df['PointSum'] / df['NameCount']
# Drop the unneeded columns
df = df.drop(['NameCount', 'PointSum'], axis=1)
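For comparison, the same per-player running average can likely be written more compactly with pandas' built-in groupby(...).expanding().mean(). The sketch below is not part of the original answer. The sample frame is copied from the simplified example in the question, and parsing Date with a month-day-year format is an assumption, added so that sorting is chronological rather than lexicographic.

```python
import pandas as pd

# Sample data copied from the simplified example in the question.
df = pd.DataFrame({
    "Name": ["Joe Smith", "Sam Simmons", "Joe Smith", "Sam Simmons"],
    "Date": ["1-1-19", "1-1-19", "1-2-19", "1-2-19"],
    "Points": [10, 20, 30, 40],
})

# Assumption: dates are month-day-year; parse them so they sort chronologically.
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%y")
df = df.sort_values(["Name", "Date"])

# Expanding (cumulative) mean within each player's games.
# The result carries a (Name, original index) MultiIndex; dropping the
# Name level lets the assignment align on the original row index.
df["MA"] = (
    df.groupby("Name")["Points"]
      .expanding()
      .mean()
      .reset_index(level=0, drop=True)
)

# Restore the original row order.
df = df.sort_index()
print(df)
```

On this sample the values should match those of the cumcount()/cumsum() approach, with no helper columns to drop afterwards.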
The cumcount() approach, provided by @MaxU here, emulates SQL's ROW_NUMBER() OVER (PARTITION BY ...).
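If a fixed window is wanted instead of a cumulative average, as the question originally asked, one possible sketch is to combine groupby() with rolling(). This is an illustration rather than the answerer's method. The window size of 2 is hypothetical and counts games, not calendar days, and the month-day-year date format is again an assumption.

```python
import pandas as pd

# Sample data copied from the simplified example in the question.
df = pd.DataFrame({
    "Name": ["Joe Smith", "Sam Simmons", "Joe Smith", "Sam Simmons"],
    "Date": ["1-1-19", "1-1-19", "1-2-19", "1-2-19"],
    "Points": [10, 20, 30, 40],
})

# Assumption: dates are month-day-year.
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%y")

window = 2  # hypothetical window size, measured in games per player

# Sort so each player's games are in chronological order before rolling.
df = df.sort_values(["Name", "Date"])

# Rolling mean within each player; rows with fewer than `window`
# games so far are left as NaN.
df["MA"] = (
    df.groupby("Name")["Points"]
      .rolling(window, min_periods=window)
      .mean()
      .reset_index(level=0, drop=True)
)

# Restore the original row order.
df = df.sort_index()
print(df)
```

With this window, each player's first game has no average yet, which corresponds to the NA entries in the question's desired output. Lowering min_periods to 1 would fill those rows with partial-window averages instead.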