如何在python pandas中从具有某些条件的一列中计算总和? [英] How to calculate sum from one column with some coditions in python pandas?

查看:114
本文介绍了如何在python pandas中从具有某些条件的一列中计算总和?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个熊猫数据框,如下所示:

I have one pandas Dataframe which looks like below:

df = pd.DataFrame({'sport_name': ['football','football','football','football','football','cricket','cricket','cricket','cricket'],
       'person_name': ['ramesh','ramesh','ramesh','ramesh','mohit','mahesh','mahesh','mahesh','mahesh'],
           'city': ['mumbai', 'mumbai','delhi','delhi','pune','surat','surat','panji','panji'],
    'person_symbol': ['ram','mum','mum','ram','moh','mah','sur','sur','mah'],
    'person_count': ['10','14','25','20','11','34','23','43','34']})
df = df[['sport_name','person_name','city','person_symbol','person_count']]
print df

  sport_name person_name    city person_symbol person_count
0   football      ramesh  mumbai           ram           10
1   football      ramesh  mumbai           mum           14
2   football      ramesh   delhi           mum           25
3   football      ramesh   delhi           ram           20
4   football       mohit    pune           moh           11
5    cricket      mahesh   surat           mah           34
6    cricket      mahesh   surat           sur           23
7    cricket      mahesh   panji           sur           43
8    cricket      mahesh   panji           mah           34

从这个数据框中,我想创建两列数据框,分别命名为" derived_symbol "和" person_count ".为了创建它,我需要关注以下一些条件:

From this Dataframe, I want to create two column dataframe named as "derived_symbol" and "person_count". For creating it I need to focus on some condition like below:

  • derived_symbol需要为每个唯一的城市和person_symbol形成.
  • person_count是基于derived_symbol是什么来计算的.

示例:

考虑第一组数据框,即sport_name = football和person_name = rakesh,这是四行:

Consider First set of Dataframe i.e. sport_name = football and person_name = rakesh Which are four rows:

  sport_name person_name    city person_symbol person_count
0   football      ramesh  mumbai           ram           10
1   football      ramesh  mumbai           mum           14
2   football      ramesh   delhi           mum           25
3   football      ramesh   delhi           ram           20

在上面的四行中,如果我们查看city和person_symbol列值,我们可以看到四个不同的唯一值可用,分别是孟买,德里,拉姆和妈妈.因此,对于以上四行,我们的输出如下所示:

In this above four rows, If we look at the city and person_symbol column values we can see four different unique values are available which are mumbai, del ram and mum. so for this above four rows our output looks like below:

derived_symbol                     person_count
football.ramesh.TOTAL.mumbai_count  24
football.ramesh.TOTAL.delhi_count   45
football.ramesh.TOTAL.ram_count     30
football.ramesh.TOTAL.mum_count     39

最终预期输出:

derived_symbol                      person_count
football.ramesh.TOTAL.mumbai_count  24
football.ramesh.TOTAL.delhi_count   45
football.ramesh.TOTAL.ram_count     30
football.ramesh.TOTAL.mum_count     39
football.mohit.TOTAL.pune_count     11
football.mohit.TOTAL.moh_count      11
cricket.mahesh.TOTAL.surat_count    57
cricket.mahesh.TOTAL.panji_count    77
cricket.mahesh.TOTAL.sur_count      66
cricket.mahesh.TOTAL.mah_count      68

数据框的日期也如下所示:

Edit : Dataframe has dates too like below:

df = pd.DataFrame({'sport_name': ['football','football','football','football','football','football','football','football'],
       'person_name': ['ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh'],
           'city': ['mumbai', 'mumbai','delhi','delhi','mumbai', 'mumbai','delhi','delhi'],
    'person_symbol': ['ram','mum','mum','ram','ram','mum','mum','ram'],
    'person_count': ['10','14','25','20','34','23','43','34'],
    'month': ['2017-01-23','2017-01-23','2017-01-23','2017-01-23','2017-02-26','2017-02-26','2017-02-26','2017-02-26']})
df = df[['sport_name','person_name','city','person_symbol','person_count','month']]
print df

  sport_name person_name    city person_symbol person_count       month
0   football      ramesh  mumbai           ram           10  2017-01-23
1   football      ramesh  mumbai           mum           14  2017-01-23
2   football      ramesh   delhi           mum           25  2017-01-23
3   football      ramesh   delhi           ram           20  2017-01-23
4   football      ramesh  mumbai           ram           34  2017-02-26
5   football      ramesh  mumbai           mum           23  2017-02-26
6   football      ramesh   delhi           mum           43  2017-02-26
7   football      ramesh   delhi           ram           34  2017-02-26

预期输出:

derived_symbol              person_count    month
football.ramesh.TOTAL.mumbai_count  24      2017-01-23
football.ramesh.TOTAL.delhi_count   45      2017-01-23
football.ramesh.TOTAL.ram_count     30      2017-01-23
football.ramesh.TOTAL.mum_count     39      2017-01-23
football.ramesh.TOTAL.mumbai_count  57      2017-02-26
football.ramesh.TOTAL.delhi_count   77      2017-02-26
football.ramesh.TOTAL.ram_count     68      2017-02-26
football.ramesh.TOTAL.mum_count     66      2017-02-26

我按照以下方式计算了一个月的person_count:

I did following to calculate person_count month wise:

df = pd.DataFrame({'sport_name': ['football','football','football','football','football','football','football','football'],
           'person_name': ['ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh'],
               'city': ['mumbai', 'mumbai','delhi','delhi','mumbai', 'mumbai','delhi','delhi'],
        'person_symbol': ['ram','mum','mum','ram','ram','mum','mum','ram'],
        'person_count': ['10','14','25','20','34','23','43','34'],
        'month': ['2017-01-23','2017-01-23','2017-01-23','2017-01-23','2017-02-26','2017-02-26','2017-02-26','2017-02-26']})
df = df[['sport_name','person_name','city','person_symbol','person_count','month']]

df['person_count'] = df['person_count'].astype(int)

df1=df.set_index(['sport_name','person_name','person_count','month']).stack().reset_index(name='val')

df1['derived_symbol'] = df1['sport_name'] + '.' + df1['person_name'] + '.TOTAL.' + df1['val'] + '_count'

df2 = df1.groupby(['derived_symbol','month'])['person_count'].sum().reset_index(name='person_count')
print (df2)

推荐答案

使用:

#convert column to int
df['person_count'] = df['person_count'].astype(int)

#reshape for one column from city and person_symbol columns 
df1=df.set_index(['sport_name','person_name','person_count']).stack().reset_index(name='val')
print (df1)
   sport_name person_name  person_count        level_3     val
0    football      ramesh            10           city  mumbai
1    football      ramesh            10  person_symbol     ram
2    football      ramesh            14           city  mumbai
3    football      ramesh            14  person_symbol     mum
4    football      ramesh            25           city   delhi
5    football      ramesh            25  person_symbol     mum
6    football      ramesh            20           city   delhi
7    football      ramesh            20  person_symbol     ram
8    football       mohit            11           city    pune
9    football       mohit            11  person_symbol     moh
10    cricket      mahesh            34           city   surat
11    cricket      mahesh            34  person_symbol     mah
12    cricket      mahesh            23           city   surat
13    cricket      mahesh            23  person_symbol     sur
14    cricket      mahesh            43           city   panji
15    cricket      mahesh            43  person_symbol     sur
16    cricket      mahesh            34           city   panji
17    cricket      mahesh            34  person_symbol     mah


#concatenate columns
a = df1['sport_name'] + '.' + df1['person_name'] + '.TOTAL.' + df1['val'] + '_count'

#groupby by Series a and aggregate sum
df2 = df1['person_count'].groupby(a.rename('derived_symbol'), sort=False)
                         .sum()
                         .reset_index(name='person_count')
print (df2)
                       derived_symbol  person_count
0  football.ramesh.TOTAL.mumbai_count            24
1     football.ramesh.TOTAL.ram_count            30
2     football.ramesh.TOTAL.mum_count            39
3   football.ramesh.TOTAL.delhi_count            45
4     football.mohit.TOTAL.pune_count            11
5      football.mohit.TOTAL.moh_count            11
6    cricket.mahesh.TOTAL.surat_count            57
7      cricket.mahesh.TOTAL.mah_count            68
8      cricket.mahesh.TOTAL.sur_count            66
9    cricket.mahesh.TOTAL.panji_count            77

这篇关于如何在python pandas中从具有某些条件的一列中计算总和?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆