如何在python pandas中从具有某些条件的一列中计算总和? [英] How to calculate sum from one column with some coditions in python pandas?
问题描述
我有一个熊猫数据框,如下所示:
I have one pandas Dataframe which looks like below:
df = pd.DataFrame({'sport_name': ['football','football','football','football','football','cricket','cricket','cricket','cricket'],
'person_name': ['ramesh','ramesh','ramesh','ramesh','mohit','mahesh','mahesh','mahesh','mahesh'],
'city': ['mumbai', 'mumbai','delhi','delhi','pune','surat','surat','panji','panji'],
'person_symbol': ['ram','mum','mum','ram','moh','mah','sur','sur','mah'],
'person_count': ['10','14','25','20','11','34','23','43','34']})
df = df[['sport_name','person_name','city','person_symbol','person_count']]
print df
sport_name person_name city person_symbol person_count
0 football ramesh mumbai ram 10
1 football ramesh mumbai mum 14
2 football ramesh delhi mum 25
3 football ramesh delhi ram 20
4 football mohit pune moh 11
5 cricket mahesh surat mah 34
6 cricket mahesh surat sur 23
7 cricket mahesh panji sur 43
8 cricket mahesh panji mah 34
从这个数据框中,我想创建两列数据框,分别命名为" derived_symbol "和" person_count ".为了创建它,我需要关注以下一些条件:
From this Dataframe, I want to create two column dataframe named as "derived_symbol" and "person_count". For creating it I need to focus on some condition like below:
- derived_symbol需要为每个唯一的城市和person_symbol形成.
- person_count是基于derived_symbol是什么来计算的.
示例:
考虑第一组数据框,即sport_name = football和person_name = rakesh,这是四行:
Consider First set of Dataframe i.e. sport_name = football and person_name = rakesh Which are four rows:
sport_name person_name city person_symbol person_count
0 football ramesh mumbai ram 10
1 football ramesh mumbai mum 14
2 football ramesh delhi mum 25
3 football ramesh delhi ram 20
在上面的四行中,如果我们查看city和person_symbol列值,我们可以看到四个不同的唯一值可用,分别是孟买,德里,拉姆和妈妈.因此,对于以上四行,我们的输出如下所示:
In this above four rows, If we look at the city and person_symbol column values we can see four different unique values are available which are mumbai, del ram and mum. so for this above four rows our output looks like below:
derived_symbol person_count
football.ramesh.TOTAL.mumbai_count 24
football.ramesh.TOTAL.delhi_count 45
football.ramesh.TOTAL.ram_count 30
football.ramesh.TOTAL.mum_count 39
最终预期输出:
derived_symbol person_count
football.ramesh.TOTAL.mumbai_count 24
football.ramesh.TOTAL.delhi_count 45
football.ramesh.TOTAL.ram_count 30
football.ramesh.TOTAL.mum_count 39
football.mohit.TOTAL.pune_count 11
football.mohit.TOTAL.moh_count 11
cricket.mahesh.TOTAL.surat_count 57
cricket.mahesh.TOTAL.panji_count 77
cricket.mahesh.TOTAL.sur_count 66
cricket.mahesh.TOTAL.mah_count 68
数据框的日期也如下所示:
Edit : Dataframe has dates too like below:
df = pd.DataFrame({'sport_name': ['football','football','football','football','football','football','football','football'],
'person_name': ['ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh'],
'city': ['mumbai', 'mumbai','delhi','delhi','mumbai', 'mumbai','delhi','delhi'],
'person_symbol': ['ram','mum','mum','ram','ram','mum','mum','ram'],
'person_count': ['10','14','25','20','34','23','43','34'],
'month': ['2017-01-23','2017-01-23','2017-01-23','2017-01-23','2017-02-26','2017-02-26','2017-02-26','2017-02-26']})
df = df[['sport_name','person_name','city','person_symbol','person_count','month']]
print df
sport_name person_name city person_symbol person_count month
0 football ramesh mumbai ram 10 2017-01-23
1 football ramesh mumbai mum 14 2017-01-23
2 football ramesh delhi mum 25 2017-01-23
3 football ramesh delhi ram 20 2017-01-23
4 football ramesh mumbai ram 34 2017-02-26
5 football ramesh mumbai mum 23 2017-02-26
6 football ramesh delhi mum 43 2017-02-26
7 football ramesh delhi ram 34 2017-02-26
预期输出:
derived_symbol person_count month
football.ramesh.TOTAL.mumbai_count 24 2017-01-23
football.ramesh.TOTAL.delhi_count 45 2017-01-23
football.ramesh.TOTAL.ram_count 30 2017-01-23
football.ramesh.TOTAL.mum_count 39 2017-01-23
football.ramesh.TOTAL.mumbai_count 57 2017-02-26
football.ramesh.TOTAL.delhi_count 77 2017-02-26
football.ramesh.TOTAL.ram_count 68 2017-02-26
football.ramesh.TOTAL.mum_count 66 2017-02-26
我按照以下方式计算了一个月的person_count:
I did following to calculate person_count month wise:
df = pd.DataFrame({'sport_name': ['football','football','football','football','football','football','football','football'],
'person_name': ['ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh'],
'city': ['mumbai', 'mumbai','delhi','delhi','mumbai', 'mumbai','delhi','delhi'],
'person_symbol': ['ram','mum','mum','ram','ram','mum','mum','ram'],
'person_count': ['10','14','25','20','34','23','43','34'],
'month': ['2017-01-23','2017-01-23','2017-01-23','2017-01-23','2017-02-26','2017-02-26','2017-02-26','2017-02-26']})
df = df[['sport_name','person_name','city','person_symbol','person_count','month']]
df['person_count'] = df['person_count'].astype(int)
df1=df.set_index(['sport_name','person_name','person_count','month']).stack().reset_index(name='val')
df1['derived_symbol'] = df1['sport_name'] + '.' + df1['person_name'] + '.TOTAL.' + df1['val'] + '_count'
df2 = df1.groupby(['derived_symbol','month'])['person_count'].sum().reset_index(name='person_count')
print (df2)
推荐答案
使用:
#convert column to int
df['person_count'] = df['person_count'].astype(int)
#reshape for one column from city and person_symbol columns
df1=df.set_index(['sport_name','person_name','person_count']).stack().reset_index(name='val')
print (df1)
sport_name person_name person_count level_3 val
0 football ramesh 10 city mumbai
1 football ramesh 10 person_symbol ram
2 football ramesh 14 city mumbai
3 football ramesh 14 person_symbol mum
4 football ramesh 25 city delhi
5 football ramesh 25 person_symbol mum
6 football ramesh 20 city delhi
7 football ramesh 20 person_symbol ram
8 football mohit 11 city pune
9 football mohit 11 person_symbol moh
10 cricket mahesh 34 city surat
11 cricket mahesh 34 person_symbol mah
12 cricket mahesh 23 city surat
13 cricket mahesh 23 person_symbol sur
14 cricket mahesh 43 city panji
15 cricket mahesh 43 person_symbol sur
16 cricket mahesh 34 city panji
17 cricket mahesh 34 person_symbol mah
#concatenate columns
a = df1['sport_name'] + '.' + df1['person_name'] + '.TOTAL.' + df1['val'] + '_count'
#groupby by Series a and aggregate sum
df2 = df1['person_count'].groupby(a.rename('derived_symbol'), sort=False)
.sum()
.reset_index(name='person_count')
print (df2)
derived_symbol person_count
0 football.ramesh.TOTAL.mumbai_count 24
1 football.ramesh.TOTAL.ram_count 30
2 football.ramesh.TOTAL.mum_count 39
3 football.ramesh.TOTAL.delhi_count 45
4 football.mohit.TOTAL.pune_count 11
5 football.mohit.TOTAL.moh_count 11
6 cricket.mahesh.TOTAL.surat_count 57
7 cricket.mahesh.TOTAL.mah_count 68
8 cricket.mahesh.TOTAL.sur_count 66
9 cricket.mahesh.TOTAL.panji_count 77
这篇关于如何在python pandas中从具有某些条件的一列中计算总和?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!