如何在Pandas Dataframe中迭代计数 [英] How to iteratively count in Pandas Dataframe
问题描述
我想一直计数一个值,直到pandas/python中的当前位置为止.必须考虑一个条件(得分"列中必须存在数字,才能将游戏计为已玩游戏;如果我读取的excel文件中没有任何值,则显示为NaN).
I want to count all times a value occurred up to the present position in pandas/python. A condition must be accounted for (numbers must be present in 'score' column to count a game as played; shows up as NaN if no value in excel file I read from).
下面的代码是我所在的位置:
Code below is where i'm at:
import pandas as pd
df = pd.read_excel('G:\Project\SOQ1.xlsx')
df['date'] = pd.to_datetime(df['date'])
df = df.sort(columns='date')
df = df.set_index('date')
def calc_all_count(team_name):
home_count = df['home'].value_counts().get(team_name, 0)
away_count = df['away'].value_counts().get(team_name, 0)
all_count = home_count + away_count
return all_count
def calc_home_count(team_name):
home_count = df['home'].value_counts().get(team_name, 0)
return home_count
def calc_away_count(team_name):
away_count = df['away'].value_counts().get(team_name, 0)
return away_count
df['hag'] = df['home'].map(calc_all_count)
df['aag'] = df['away'].map(calc_all_count)
df['hahg'] = df['home'].map(calc_home_count)
df['aaag'] = df['away'].map(calc_away_count)
print df
league home away hscore ascore hag aag hahg aaag
date
2015-01-03 03:02:00 MLB Cle Tex 9 6 3 15 2 7
2015-05-10 03:03:00 MLB Bos Cle 6 7 16 3 7 1
2015-10-15 03:00:00 MLB Tex Bos 5 2 15 16 8 9
2015-10-15 03:30:00 MLB Tex Bos 1 6 15 16 8 9
2015-10-16 00:00:00 MLB Tex Bos 4 4 15 16 8 9
2015-10-17 03:30:00 MLB Bos Tex 2 8 16 15 7 7
2015-10-18 00:00:00 MLB Tex Bos 9 10 15 16 8 9
2015-10-20 00:00:00 MLB Bos Tex 2 3 16 15 7 7
2015-10-21 00:00:00 MLB Tex Bos 5 1 15 16 8 9
2015-10-22 03:00:00 MLB Tex Bos 5 3 15 16 8 9
2015-10-23 00:00:00 MLB Bos Tex 3 4 16 15 7 7
2015-10-25 23:00:00 MLB Bos Tex 6 6 16 15 7 7
2015-10-25 23:00:00 MLB Bos Tex 5 1 16 15 7 7
2015-10-26 00:00:00 MLB Tex Bos 9 6 15 16 8 9
2015-10-27 01:30:00 MLB Bos Tex 10 5 16 15 7 7
2015-10-28 01:00:00 MLB Tex Bos NaN NaN 15 16 8 9
2015-11-20 03:01:00 MLB Cle Bos NaN NaN 3 16 2 9
我想要的是每个游戏之前没有玩过的游戏.因此,由于没有人玩过,因此第一局/每行的所有数字都应读为0.应该看起来像这样:
What I want is for the no of games played prior to each game. So first game/row should read 0 for all numbers as nobodys played yet. Should look like this:
league home away hscore ascore hag aag hahg aaag
date
2015-01-03 03:02:00 MLB Cle Tex 9 6 0 0 0 0
2015-05-10 03:03:00 MLB Bos Cle 6 7 0 1 0 0
2015-10-15 03:00:00 MLB Tex Bos 5 2 1 1 0 0
2015-10-15 03:30:00 MLB Tex Bos 1 6 2 2 1 1
2015-10-16 00:00:00 MLB Tex Bos 4 4 3 3 2 2
2015-10-17 03:30:00 MLB Bos Tex 2 8 4 4 1 1
2015-10-18 00:00:00 MLB Tex Bos 9 10 5 5 3 3
2015-10-20 00:00:00 MLB Bos Tex 2 3 6 6 2 2
2015-10-21 00:00:00 MLB Tex Bos 5 1 7 7 4 4
2015-10-22 03:00:00 MLB Tex Bos 5 3 8 8 5 5
2015-10-23 00:00:00 MLB Bos Tex 3 4 9 9 3 3
2015-10-25 23:00:00 MLB Bos Tex 6 6 10 10 4 4
2015-10-25 23:00:00 MLB Bos Tex 5 1 11 11 5 5
2015-10-26 00:00:00 MLB Tex Bos 9 6 12 12 6 6
2015-10-27 01:30:00 MLB Bos Tex 10 5 13 13 6 6
2015-10-28 01:00:00 MLB Tex Bos NaN NaN 14 14 7 7
2015-11-20 03:01:00 MLB Cle Bos NaN NaN 2 14 1 7
我该如何计算之前"的当前位置?我想我应该使用.iloc或.ix,但我无法弄清楚.
How can I structure to count 'before' present position? I am thinking I should be using .iloc or .ix but I can't figure it out.
对帮助实现此代码或更好代码的任何帮助表示赞赏.提出这个问题的技巧也得到了更好的理解.
Any help on achieving this or better code appreciated. Tips on asking the question better appreciated too.
推荐答案
我的方法不使用map
,但是功能 groupby
, pivot_table
, merge
和 cumsum
:
My approach don't use map
, but functions stack
, groupby
, pivot_table
, merge
and cumsum
:
df.sort_values(by='date', axis=0, inplace = True)
#set helper column for counting cumsum
df['one'] = 1
print df
# date league home away hscore ascore one
#5 2015-01-03 03:02:00 MLB Cle Tex 9 6 1
#0 2015-05-10 03:03:00 MLB Bos Cle 6 7 1
#1 2015-10-15 03:00:00 MLB Tex Bos 5 2 1
#2 2015-10-15 03:30:00 MLB Tex Bos 1 6 1
#3 2015-10-16 00:00:00 MLB Tex Bos 4 4 1
#4 2015-10-17 03:30:00 MLB Bos Tex 2 8 1
#6 2015-10-18 00:00:00 MLB Tex Bos 9 10 1
#7 2015-10-20 00:00:00 MLB Bos Tex 2 3 1
#8 2015-10-21 00:00:00 MLB Tex Bos 5 1 1
#9 2015-10-22 03:00:00 MLB Tex Bos 5 3 1
#10 2015-10-23 00:00:00 MLB Bos Tex 3 4 1
#11 2015-10-25 23:00:00 MLB Bos Tex 6 6 1
#12 2015-10-25 23:00:00 MLB Bos Tex 5 1 1
#13 2015-10-26 00:00:00 MLB Tex Bos 9 6 1
#14 2015-10-27 01:30:00 MLB Bos Tex 10 5 1
#15 2015-10-28 01:00:00 MLB Tex Bos NaN NaN 1
#16 2015-11-20 03:01:00 MLB Cle Bos NaN NaN 1
#set columns home and away to one columns for cumsum
df2 = df[['date', 'home', 'away', 'one']].set_index(['date', 'one'])
df2 = df2.stack().reset_index(name="both")
df2['new'] = df2.groupby(['both'])['one'].cumsum() - 1
#print df2
#get back to original index
df1 = pd.pivot_table(df2, index=['date'], columns=['level_2'], values='new').reset_index()
#print df1
#merge with original df
df1 = pd.merge(df, df1, on=['date'], suffixes=('', '_new'))
#rename and casr float columns to integers columns
df1 = df1.rename(columns={'away_new':'aag', 'home_new':'hag',})
df1['aag'] = df1['aag'].astype(int)
df1['hag'] = df1['hag'].astype(int)
#count aaag and hahg
df1['aaag'] = df1.groupby(['away'])['one'].cumsum() - 1
df1['hahg'] = df1.groupby(['home'])['one'].cumsum() - 1
#drop helper column one and set index
df1 = df1.drop(['one'], axis=1 ).set_index('date')
#reorder columns
df1 = df1[['league', 'home', 'away', 'hscore', 'ascore', 'hag', 'aag', 'hahg', 'aaag']]
print df1
league home away hscore ascore hag aag hahg aaag
date
2015-01-03 03:02:00 MLB Cle Tex 9 6 0 0 0 0
2015-05-10 03:03:00 MLB Bos Cle 6 7 0 1 0 0
2015-10-15 03:00:00 MLB Tex Bos 5 2 1 1 0 0
2015-10-15 03:30:00 MLB Tex Bos 1 6 2 2 1 1
2015-10-16 00:00:00 MLB Tex Bos 4 4 3 3 2 2
2015-10-17 03:30:00 MLB Bos Tex 2 8 4 4 1 1
2015-10-18 00:00:00 MLB Tex Bos 9 10 5 5 3 3
2015-10-20 00:00:00 MLB Bos Tex 2 3 6 6 2 2
2015-10-21 00:00:00 MLB Tex Bos 5 1 7 7 4 4
2015-10-22 03:00:00 MLB Tex Bos 5 3 8 8 5 5
2015-10-23 00:00:00 MLB Bos Tex 3 4 9 9 3 3
2015-10-25 23:00:00 MLB Bos Tex 6 6 10 10 4 4
2015-10-25 23:00:00 MLB Bos Tex 5 1 10 10 5 5
2015-10-26 00:00:00 MLB Tex Bos 9 6 12 12 6 6
2015-10-27 01:30:00 MLB Bos Tex 10 5 13 13 6 6
2015-10-28 01:00:00 MLB Tex Bos NaN NaN 14 14 7 7
2015-11-20 03:01:00 MLB Cle Bos NaN NaN 2 15 1 8
这篇关于如何在Pandas Dataframe中迭代计数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!