Aggregating data and getting sum and counts


Problem description


    I have an object in Python with a lot of rows:

    INPUT :

        Team1     Player1     idTrip13     133
        Team2     Player333   idTrip10     18373
        Team3     Player22    idTrip12     17338899
        Team2     Player293   idTrip02     17656
        Team3     Player20    idTrip11     1883
        Team1     Player1     idTrip19     19393
    

    and I need to aggregate this data (like a pivot table).

    OUTPUT I am working on:

    Team1   Player1 : 2 trips : sum(133+19393)
    Team2   Player333 : 1 trip : 18373; Player293 : 1 trip : 17656
    Team3   Player22 : 1 trip : 17338899; Player20 : 1 trip : 1883
    

    Could someone suggest the appropriate object in Python to use such that I could have the following output?

    print team, player, trips, time
    

    Solution

    Use the groupby() function on a pandas DataFrame.

    1. Put your data into a list of lists; each inner list becomes one row of the DataFrame.

      In[1]:
      
      import pandas as pd
      
      # Each inner list is one row: team, player, trip id, time
      mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
      ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293','idTrip02', 17656], 
      ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
      
      df = pd.DataFrame(mydata, columns = ['team', 'player', 'trips', 'time'])
      
      df
      Out[1]:
           team    player       trips      time
      0   Team1   Player1     idTrip13    133
      1   Team2   Player333   idTrip10    18373
      2   Team3   Player22    idTrip12    17338899
      3   Team2   Player293   idTrip02    17656
      4   Team3   Player20    idTrip11    1883
      5   Team1   Player1     idTrip19    19393
      

    2. Call groupby(), pass the column you wish to use as your grouper, and apply a function to the groups.
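
To make the grouping step concrete, here is a minimal sketch of what groupby() produces before any function is applied. It assumes the same sample rows as step 1; the name group_sizes is only illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample rows as in step 1.
mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs,
# one pair per distinct value of the grouping column.
group_sizes = {}
for team, group in df.groupby('team'):
    group_sizes[team] = len(group)
    print(team, list(group['trips']))
```

Each sub-DataFrame holds only the rows for one team; functions such as count() or sum() are then applied to each of these pieces and the results are combined.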


    Examples

    Ex. 1 Find the number of trips each team went on. team is the grouper, and we apply the function count() on column ['trips'].

    In[2]:
    trip_count = df.groupby(by = ['team'])['trips'].count() 
    
    trip_count              
    Out[2]:          
    
     team
    Team1    2
    Team2    2
    Team3    2
    Name: trips, dtype: int64
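
One detail worth knowing about Ex. 1: count() counts only non-null values in the selected column, while size() counts rows. The sketch below uses hypothetical rows (df_missing, with one missing trip id) that are not part of the original data, purely to show the difference.

```python
import pandas as pd

# Hypothetical rows: the second Team1 trip id is missing (None).
df_missing = pd.DataFrame(
    [['Team1', 'idTrip13'], ['Team1', None], ['Team2', 'idTrip10']],
    columns=['team', 'trips'],
)

# count() skips the missing trip id; size() counts every row in the group.
non_null = df_missing.groupby('team')['trips'].count()
all_rows = df_missing.groupby('team').size()

print(non_null)
print(all_rows)
```

On the question's data the two give the same result, since no trip id is missing.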
    

    Ex. 2 (multiple columns): Find the total time each player on a team spent traveling. We use 2 columns ['team', 'player'] as the grouper, and apply the function sum() on column ['time'].

    In[3]:              
    trip_time = df.groupby(by = ['team', 'player'])['time'].sum() 
    
    trip_time        
    Out[3]:
    
     team   player   
    Team1  Player1         19526
    Team2  Player293       17656
           Player333       18373
    Team3  Player20         1883
           Player22     17338899
    Name: time, dtype: int64
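
The result of Ex. 2 is a Series indexed by (team, player). If a flat table is easier to work with, reset_index() turns the index levels back into ordinary columns. A minimal sketch, assuming the same sample rows as step 1 (the name flat is illustrative):

```python
import pandas as pd

# Same sample rows as in step 1.
mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Group by two columns, sum the time, then flatten the MultiIndex
# into regular 'team' and 'player' columns.
trip_time = df.groupby(['team', 'player'])['time'].sum()
flat = trip_time.reset_index()

print(flat)
```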
    

    Ex. 3 (multiple functions): For each player on a team, find the total number of trips and total time spent traveling.

    In[4]:
    player_total = df.groupby(by = ['team', 'player']).agg({'trips' : 'count', 'time' : 'sum'})
    
    player_total
    Out[4]:
                     trips  time
    team    player      
    Team1   Player1     2   19526
    Team2   Player293   1   17656
            Player333   1   18373
    Team3   Player20    1   1883
            Player22    1   17338899
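
To get from the aggregated table in Ex. 3 to the "team, player, trips, time" lines the question asks for, one possible sketch is shown below. It uses named aggregation, which assumes pandas 0.25 or later, and Python 3 print syntax; the lines list is only an illustrative helper.

```python
import pandas as pd

# Same sample rows as in step 1.
mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Named aggregation: output column = (input column, function).
player_total = (
    df.groupby(['team', 'player'])
      .agg(trips=('trips', 'count'), time=('time', 'sum'))
      .reset_index()
)

# Build one "team player trips time" line per (team, player) pair.
lines = []
for row in player_total.itertuples(index=False):
    lines.append(f"{row.team} {row.player} {row.trips} {row.time}")

print("\n".join(lines))
```

Named aggregation also fixes the output column order explicitly, which avoids relying on dictionary ordering.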
    
