Average values in last n days pandas
Problem description
I've got a dataframe of golfers and their rounds in various tournaments (see the dict of the df head posted below). I need a fast way of computing, for each round a player plays, his average 'strokes gained' (SG) over the previous n days, where n is any value I choose. I know how to do this by converting the dataframe into a list of lists and iterating through it, but that would be very slow. Ideally I want an extra column in the pandas df titled 'Player's average SG in last 100 days'.
This is what we're working with (dict of the dataframe head):
{'Avg SG Player': {0: 0.4564491861877877,
1: -0.170952417298073,
2: 1.509033309098962,
3: -1.7298114700775877,
4: 1.7856746598995106},
'Avg Score': {0: 69.53846153846153,
1: 69.53846153846153,
2: 69.53846153846153,
3: 69.53846153846153,
4: 69.53846153846153},
'Date': {0: Timestamp('2003-01-23 00:00:00'),
1: Timestamp('2003-01-23 00:00:00'),
2: Timestamp('2003-01-23 00:00:00'),
3: Timestamp('2003-01-23 00:00:00'),
4: Timestamp('2003-01-23 00:00:00')},
'Field Strength': {0: 0.08871540761770776,
1: 0.08871540761770776,
2: 0.08871540761770776,
3: 0.08871540761770776,
4: 0.08871540761770776},
'Ind': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
'Overall SG': {0: 7.627176946079241,
1: 5.627176946079241,
2: 5.627176946079241,
3: 4.627176946079241,
4: 4.627176946079241},
'Player': {0: 'Harrison Frazar',
1: 'John Huston',
2: 'David Toms',
3: 'James H. McLean',
4: 'Luke Donald'},
'Round': {0: 'R1', 1: 'R1', 2: 'R1', 3: 'R1', 4: 'R1'},
'Rounds Played': {0: 270, 1: 209, 2: 228, 3: 28, 4: 221},
'SG on Field': {0: 7.538461538461533,
1: 5.538461538461533,
2: 5.538461538461533,
3: 4.538461538461533,
4: 4.538461538461533},
'Score': {0: 62, 1: 64, 2: 64, 3: 65, 4: 65},
'Tourn-Round': {0: '2003 Phoenix OpenR1',
1: '2003 Phoenix OpenR1',
2: '2003 Phoenix OpenR1',
3: '2003 Phoenix OpenR1',
4: '2003 Phoenix OpenR1'},
'Tournament': {0: '2003 Phoenix Open',
1: '2003 Phoenix Open',
2: '2003 Phoenix Open',
3: '2003 Phoenix Open',
4: '2003 Phoenix Open'}}
Edited
The dataframe is essentially this:
Player - Date of Round - Strokes Gained (on that day)
T Woods - 01-01-2010 - 5.4
R McIlroy - 01-01-2010 - 3.8
T Woods - 02-01-2010 - 0.4
etc.
There are 350,000 rows. What I require is an extra column giving the player's average strokes gained over the n (say 100) days prior to the date of his current round.
So if the next row is:
Player - Date - Strokes Gained (on that day)
T Woods - 20-01-2018 - 3.2
I would want the fourth (new) column, call it '100 Day Average', to be 2.9 ((5.4+0.4)/2), because that is the average of Tiger's two previous rounds that fall within the defined timespan.
Thanks,
Tom
Recommended answer
This should work (it returns one average per player over the last n days, counted back from today):
import pandas as pd

n = 10000  # look-back window in days
# Earliest date to include, counted back from today
start_date = pd.to_datetime('today') - pd.Timedelta(n, unit='D')
df[df['Date'] >= start_date].groupby('Player')['Avg SG Player'].mean()
If you want to enter a start date and end date:
start_date = pd.to_datetime('2005-12-01')
end_date = pd.to_datetime('2015-12-01')
df[(df['Date'] >= start_date) & (df['Date'] <= end_date)].groupby('Player')['Avg SG Player'].mean()
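The snippets above return a single average per player for one fixed date range. If the goal is the per-round column described in the question (each round's average over the n days before it), one possible approach is a time-based rolling window per player. The following is only a sketch under assumptions: the helper name add_trailing_average, the column name 'SG' for the per-round strokes gained, and the sample rows are hypothetical, and the last date is set to 2010 so that it falls within 100 days of the earlier rounds. It also assumes a reasonably recent pandas version, where closed='left' is honoured for grouped, offset-based windows.

```python
import pandas as pd

def add_trailing_average(df, n_days=100, value_col="SG", out_col="100 Day Average"):
    """Add a column with each player's mean of value_col over the n_days
    before each round, excluding the round itself (hypothetical helper)."""
    # Sort so rows are grouped by player and in date order within each player;
    # the rolling result below comes back in exactly this order.
    df = df.sort_values(["Player", "Date"]).reset_index(drop=True)
    rolled = (
        df.set_index("Date")
          .groupby("Player")[value_col]
          # Offset window [t - n_days, t): closed='left' drops the current date
          .rolling(f"{n_days}D", closed="left")
          .mean()
    )
    # Positional assignment is safe because both share the same sort order
    df[out_col] = rolled.to_numpy()
    return df

# Hypothetical sample data modelled on the rows in the question
rounds = pd.DataFrame({
    "Player": ["T Woods", "R McIlroy", "T Woods", "T Woods"],
    "Date": pd.to_datetime(["2010-01-01", "2010-01-01", "2010-01-02", "2010-01-20"]),
    "SG": [5.4, 3.8, 0.4, 3.2],
})

result = add_trailing_average(rounds, n_days=100)
print(result)
```

A player's first round has no earlier rounds in the window, so its value is NaN. Rounds played by the same player on the same date are excluded from each other's window, because the window stops just before the current date.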