如何在Pandas Dataframe中迭代计数 [英] How to iteratively count in Pandas Dataframe

查看:91
本文介绍了如何在Pandas Dataframe中迭代计数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想一直计数一个值,直到pandas/python中的当前位置为止.必须考虑一个条件(得分"列中必须存在数字,才能将游戏计为已玩游戏;如果我读取的excel文件中没有任何值,则显示为NaN).

I want to count all times a value occurred up to the present position in pandas/python. A condition must be accounted for (numbers must be present in 'score' column to count a game as played; shows up as NaN if no value in excel file I read from).

下面的代码是我所在的位置:

Code below is where i'm at:

import pandas as pd


df = pd.read_excel('G:\Project\SOQ1.xlsx')

df['date'] = pd.to_datetime(df['date'])

df = df.sort(columns='date')

df = df.set_index('date')

def calc_all_count(team_name):  
    home_count = df['home'].value_counts().get(team_name, 0)
    away_count = df['away'].value_counts().get(team_name, 0)
    all_count = home_count + away_count
    return all_count

def calc_home_count(team_name):    
    home_count = df['home'].value_counts().get(team_name, 0)
    return home_count

def calc_away_count(team_name):
    away_count = df['away'].value_counts().get(team_name, 0)
    return away_count

df['hag'] = df['home'].map(calc_all_count)
df['aag'] = df['away'].map(calc_all_count)
df['hahg'] = df['home'].map(calc_home_count)
df['aaag'] = df['away'].map(calc_away_count)

print df


                    league home away  hscore  ascore  hag  aag  hahg  aaag
date                                                                      
2015-01-03 03:02:00    MLB  Cle  Tex       9       6    3   15     2     7
2015-05-10 03:03:00    MLB  Bos  Cle       6       7   16    3     7     1
2015-10-15 03:00:00    MLB  Tex  Bos       5       2   15   16     8     9
2015-10-15 03:30:00    MLB  Tex  Bos       1       6   15   16     8     9
2015-10-16 00:00:00    MLB  Tex  Bos       4       4   15   16     8     9
2015-10-17 03:30:00    MLB  Bos  Tex       2       8   16   15     7     7
2015-10-18 00:00:00    MLB  Tex  Bos       9      10   15   16     8     9
2015-10-20 00:00:00    MLB  Bos  Tex       2       3   16   15     7     7
2015-10-21 00:00:00    MLB  Tex  Bos       5       1   15   16     8     9
2015-10-22 03:00:00    MLB  Tex  Bos       5       3   15   16     8     9
2015-10-23 00:00:00    MLB  Bos  Tex       3       4   16   15     7     7
2015-10-25 23:00:00    MLB  Bos  Tex       6       6   16   15     7     7
2015-10-25 23:00:00    MLB  Bos  Tex       5       1   16   15     7     7
2015-10-26 00:00:00    MLB  Tex  Bos       9       6   15   16     8     9
2015-10-27 01:30:00    MLB  Bos  Tex      10       5   16   15     7     7
2015-10-28 01:00:00    MLB  Tex  Bos     NaN     NaN   15   16     8     9
2015-11-20 03:01:00    MLB  Cle  Bos     NaN     NaN    3   16     2     9

我想要的是每个游戏之前没有玩过的游戏.因此,由于没有人玩过,因此第一局/每行的所有数字都应读为0.应该看起来像这样:

What I want is for the no of games played prior to each game. So first game/row should read 0 for all numbers as nobodys played yet. Should look like this:

                    league home away  hscore  ascore  hag  aag  hahg  aaag
date                                                                      
2015-01-03 03:02:00    MLB  Cle  Tex       9       6    0    0     0     0
2015-05-10 03:03:00    MLB  Bos  Cle       6       7    0    1     0     0
2015-10-15 03:00:00    MLB  Tex  Bos       5       2    1    1     0     0
2015-10-15 03:30:00    MLB  Tex  Bos       1       6    2    2     1     1
2015-10-16 00:00:00    MLB  Tex  Bos       4       4    3    3     2     2
2015-10-17 03:30:00    MLB  Bos  Tex       2       8    4    4     1     1
2015-10-18 00:00:00    MLB  Tex  Bos       9      10    5    5     3     3
2015-10-20 00:00:00    MLB  Bos  Tex       2       3    6    6     2     2
2015-10-21 00:00:00    MLB  Tex  Bos       5       1    7    7     4     4
2015-10-22 03:00:00    MLB  Tex  Bos       5       3    8    8     5     5
2015-10-23 00:00:00    MLB  Bos  Tex       3       4    9    9     3     3
2015-10-25 23:00:00    MLB  Bos  Tex       6       6   10   10     4     4
2015-10-25 23:00:00    MLB  Bos  Tex       5       1   11   11     5     5
2015-10-26 00:00:00    MLB  Tex  Bos       9       6   12   12     6     6
2015-10-27 01:30:00    MLB  Bos  Tex      10       5   13   13     6     6
2015-10-28 01:00:00    MLB  Tex  Bos     NaN     NaN   14   14     7     7
2015-11-20 03:01:00    MLB  Cle  Bos     NaN     NaN    2   14     1     7

我该如何计算之前"的当前位置?我想我应该使用.iloc或.ix,但我无法弄清楚.

How can I structure to count 'before' present position? I am thinking I should be using .iloc or .ix but I can't figure it out.

对帮助实现此代码或更好代码的任何帮助表示赞赏.提出这个问题的技巧也得到了更好的理解.

Any help on achieving this or better code appreciated. Tips on asking the question better appreciated too.

推荐答案

我的方法不使用map,但是功能 pivot_table merge cumsum :

My approach don't use map, but functions stack, groupby, pivot_table, merge and cumsum:

df.sort_values(by='date', axis=0, inplace = True)
#set helper column for counting cumsum
df['one'] = 1
print df
#                   date league home away  hscore  ascore  one
#5   2015-01-03 03:02:00    MLB  Cle  Tex       9       6    1
#0   2015-05-10 03:03:00    MLB  Bos  Cle       6       7    1
#1   2015-10-15 03:00:00    MLB  Tex  Bos       5       2    1
#2   2015-10-15 03:30:00    MLB  Tex  Bos       1       6    1
#3   2015-10-16 00:00:00    MLB  Tex  Bos       4       4    1
#4   2015-10-17 03:30:00    MLB  Bos  Tex       2       8    1
#6   2015-10-18 00:00:00    MLB  Tex  Bos       9      10    1
#7   2015-10-20 00:00:00    MLB  Bos  Tex       2       3    1
#8   2015-10-21 00:00:00    MLB  Tex  Bos       5       1    1
#9   2015-10-22 03:00:00    MLB  Tex  Bos       5       3    1
#10  2015-10-23 00:00:00    MLB  Bos  Tex       3       4    1
#11  2015-10-25 23:00:00    MLB  Bos  Tex       6       6    1
#12  2015-10-25 23:00:00    MLB  Bos  Tex       5       1    1
#13  2015-10-26 00:00:00    MLB  Tex  Bos       9       6    1
#14  2015-10-27 01:30:00    MLB  Bos  Tex      10       5    1
#15  2015-10-28 01:00:00    MLB  Tex  Bos     NaN     NaN    1
#16  2015-11-20 03:01:00    MLB  Cle  Bos     NaN     NaN    1

#set columns home and away to one columns for cumsum
df2 = df[['date', 'home', 'away', 'one']].set_index(['date', 'one'])
df2 = df2.stack().reset_index(name="both")
df2['new'] =  df2.groupby(['both'])['one'].cumsum() - 1
#print df2

#get back to original index
df1 = pd.pivot_table(df2, index=['date'], columns=['level_2'], values='new').reset_index()
#print df1

#merge with original df
df1 = pd.merge(df, df1, on=['date'], suffixes=('', '_new'))

#rename and casr float columns to integers columns
df1 = df1.rename(columns={'away_new':'aag', 'home_new':'hag',})
df1['aag'] = df1['aag'].astype(int)
df1['hag'] = df1['hag'].astype(int)

#count aaag and hahg
df1['aaag'] =  df1.groupby(['away'])['one'].cumsum() - 1
df1['hahg'] =  df1.groupby(['home'])['one'].cumsum() - 1

#drop helper column one and set index
df1 = df1.drop(['one'], axis=1 ).set_index('date')
#reorder columns
df1 = df1[['league', 'home', 'away', 'hscore', 'ascore', 'hag', 'aag', 'hahg', 'aaag']]

print df1
                    league home away  hscore  ascore  hag  aag  hahg  aaag
date                                                                      
2015-01-03 03:02:00    MLB  Cle  Tex       9       6    0    0     0     0
2015-05-10 03:03:00    MLB  Bos  Cle       6       7    0    1     0     0
2015-10-15 03:00:00    MLB  Tex  Bos       5       2    1    1     0     0
2015-10-15 03:30:00    MLB  Tex  Bos       1       6    2    2     1     1
2015-10-16 00:00:00    MLB  Tex  Bos       4       4    3    3     2     2
2015-10-17 03:30:00    MLB  Bos  Tex       2       8    4    4     1     1
2015-10-18 00:00:00    MLB  Tex  Bos       9      10    5    5     3     3
2015-10-20 00:00:00    MLB  Bos  Tex       2       3    6    6     2     2
2015-10-21 00:00:00    MLB  Tex  Bos       5       1    7    7     4     4
2015-10-22 03:00:00    MLB  Tex  Bos       5       3    8    8     5     5
2015-10-23 00:00:00    MLB  Bos  Tex       3       4    9    9     3     3
2015-10-25 23:00:00    MLB  Bos  Tex       6       6   10   10     4     4
2015-10-25 23:00:00    MLB  Bos  Tex       5       1   10   10     5     5
2015-10-26 00:00:00    MLB  Tex  Bos       9       6   12   12     6     6
2015-10-27 01:30:00    MLB  Bos  Tex      10       5   13   13     6     6
2015-10-28 01:00:00    MLB  Tex  Bos     NaN     NaN   14   14     7     7
2015-11-20 03:01:00    MLB  Cle  Bos     NaN     NaN    2   15     1     8

这篇关于如何在Pandas Dataframe中迭代计数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆