Frequency tables in pandas (like plyr in R)

Views: 139  Posted: 2018/5/30 13:43:52  Tags: python group-by pandas


Question

My problem is how to calculate frequencies on multiple variables in pandas.
I want to go from this dataframe:
import pandas as pd

d1 = pd.DataFrame({'StudentID': ["x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"],
                   'StudentGender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
                   'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
                   'Exam': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
                   'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
                   'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes']},
                  columns=['StudentID', 'StudentGender', 'ExamenYear', 'Exam', 'Participated', 'Passed'])
to the following result:
            Participated  OfWhichpassed
ExamenYear
2007                   3              2
2008                   4              3
2009                   3              2
(1) One possibility I tried is to compute two dataframes and bind them together:
# index= and columns= replace the rows= and cols= keywords of older pandas releases
t1 = d1.pivot_table(values='StudentID', index=['ExamenYear'], columns=['Participated'], aggfunc=len)
t2 = d1.pivot_table(values='StudentID', index=['ExamenYear'], columns=['Passed'], aggfunc=len)
tx = pd.concat([t1, t2], axis=1)

# keep only the two 'yes' columns (one from each pivot table)
Res1 = tx['yes']
(2) The second possibility is to use an aggregation function:
import collections

dg = d1.groupby('ExamenYear')
# len counts all rows per year; the Counter counts the rows where Passed == 'yes'
Res2 = dg.agg({'Participated': len,
               'Passed': lambda x: collections.Counter(x == 'yes')[True]})

Res2.columns = ['Participated', 'OfWhichpassed']
Both ways are awkward, to say the least.
How is this done properly in pandas?

P.S.: I also tried value_counts instead of collections.Counter but could not get it to work.

For reference: a few months ago I asked a similar question for R here, and plyr could help.

---- UPDATE ------

User DSM is right: there was a mistake in the desired table result.

(1) The code for option one is:
# index= and columns= replace the rows= and cols= keywords of older pandas releases
# current pandas returns a DataFrame from pivot_table, so the count column is selected explicitly
t1 = d1.pivot_table(values='StudentID', index=['ExamenYear'], aggfunc=len)['StudentID']
t2 = d1.pivot_table(values='StudentID', index=['ExamenYear'], columns=['Participated'], aggfunc=len)
t3 = d1.pivot_table(values='StudentID', index=['ExamenYear'], columns=['Passed'], aggfunc=len)

Res1 = pd.DataFrame({'All': t1,
                     'OfWhichParticipated': t2['yes'],
                     'OfWhichPassed': t3['yes']})
It will produce the result 
             All  OfWhichParticipated  OfWhichPassed
ExamenYear                                         
2007          3                    2              2
2008          4                    3              3
2009          3                    3              2
(2) For option 2, thanks to user herrfz, I figured out how to use value_counts, and the code will be:
Res2 = d1.groupby('ExamenYear').agg({'StudentID': len,
                                     'Participated': lambda x: x.value_counts()['yes'],
                                     'Passed': lambda x: x.value_counts()['yes']})

Res2.columns = ['All', 'OfWhichParticipated', 'OfWhichPassed']
which will produce the same result as Res1.

My question remains, though:

Using option 2, is it possible to use the same variable twice (for another operation)? And can one pass a custom name for the resulting variable?
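One way both parts of this question might be handled is pandas' named aggregation (keyword arguments to agg, available in reasonably recent pandas releases). The sketch below is not from the original thread: the helper functions count_yes and count_no and the output column OfWhichFailed are hypothetical names introduced only for illustration, and the dataframe is a trimmed copy of d1.

```python
import pandas as pd

# Trimmed copy of the question's dataframe (only the columns used here).
d1 = pd.DataFrame({
    'StudentID': ['x1', 'x10', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9'],
    'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
    'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes'],
})

# Hypothetical helpers: count the 'yes' / 'no' entries in a column.
def count_yes(s):
    return int((s == 'yes').sum())

def count_no(s):
    return int((s == 'no').sum())

# Each keyword is the custom output name; the tuple is (input column, function).
# 'Passed' is used twice, once per operation.
res = d1.groupby('ExamenYear').agg(
    All=('StudentID', 'count'),
    OfWhichParticipated=('Participated', count_yes),
    OfWhichPassed=('Passed', count_yes),
    OfWhichFailed=('Passed', count_no),
)
print(res)
```

With this form there is no need to rename the columns afterwards, since the output names are given directly in the agg call.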

---- A NEW UPDATE ---- 

I have finally decided to use apply, which I understand is more flexible.
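The apply-based code was not posted, so the following is only a guess at what such a solution could look like. The function name summarize is hypothetical, and the dataframe is a trimmed copy of d1. The function receives each year's sub-dataframe and returns one row of the summary as a Series.

```python
import pandas as pd

# Trimmed copy of the question's dataframe (only the columns used here).
d1 = pd.DataFrame({
    'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
    'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes'],
})

def summarize(group):
    # group holds the rows of a single exam year
    return pd.Series({
        'All': len(group),
        'OfWhichParticipated': (group['Participated'] == 'yes').sum(),
        'OfWhichPassed': (group['Passed'] == 'yes').sum(),
    })

# Selecting the columns first keeps the grouping column out of each group.
res = d1.groupby('ExamenYear')[['Participated', 'Passed']].apply(summarize)
print(res)
```

Because the function sees the whole group, it can combine several columns in one expression, which a per-column agg dictionary cannot do.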
Solution
This:
d1.groupby('ExamenYear').agg({'Participated': len, 
                              'Passed': lambda x: sum(x == 'yes')})
doesn't look way more awkward than the R solution, IMHO.
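A variant of the same idea, offered as a sketch rather than as part of the original answer: turn the yes/no columns into booleans first, so that a plain grouped sum counts the 'yes' entries. The names flags and res are illustrative, and the dataframe is a trimmed copy of d1.

```python
import pandas as pd

# Trimmed copy of the question's dataframe (only the columns used here).
d1 = pd.DataFrame({
    'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
    'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes'],
})

# True where the answer is 'yes'; summing booleans counts the True values.
flags = d1[['Participated', 'Passed']].eq('yes')
res = flags.groupby(d1['ExamenYear']).sum()

# Add the total number of students per year as the first column.
res.insert(0, 'All', d1.groupby('ExamenYear').size())
print(res)
```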
