Pandas Dataframe GroupBy - Displaying Group Statistics

Views: 186 Published: 2017/3/26 3:46:45 Tags: python pandas group-by dataframe


Problem description

For the Pandas dataframe:
import random
import numpy as np
import pandas as pd

codes = ["one", "two", "three"]
colours = ["black", "white"]
textures = ["soft", "hard"]
N = 100  # length of the dataframe
df = pd.DataFrame({'id': range(1, N + 1),
                   'code': [random.choice(codes) for i in range(N)],
                   'colour': [random.choice(colours) for i in range(N)],
                   'texture': [random.choice(textures) for i in range(N)],
                   'size': [random.randint(1, 100) for i in range(N)]
                   }, columns=['id', 'code', 'colour', 'texture', 'size'])
I run the line below to get the aggregated sizes grouped by code and colour pairs:
grouped = df.groupby(['code', 'colour']).agg({'size' : np.sum}).reset_index()
>> grouped
>>     code colour  size
>> 0    one  black   987
>> 1    one  white   972
>> 2  three  black   972
>> 3  three  white   488
>> 4    two  black  1162
>> 5    two  white  1158
>> [6 rows x 3 columns]
In addition to the aggregated (np.sum) sizes, I want to get separate columns for:

i. the average value (np.average) per group

ii. the id of the row with the max size for a given group

iii. how many times the group occurred (e.g. code=one, colour=black, 12 times)

Question: What is the fastest way to do this? Would I have to use apply() and a custom function?
Solution
You can pass a list of functions to be applied to the group, e.g.:
grouped = df.groupby(['code', 'colour'])['size'].agg(['sum', 'mean', 'size', 'idxmax']).reset_index()
Since idxmax returns the index label of the row with the maximum size in each group, you need to look those labels up in the original dataframe to get the ids. df.ix was removed in pandas 1.0, so use .loc for the lookup, and convert the result to an array so the ids are assigned by position rather than aligned on the index:
grouped['max_row_id'] = df.loc[grouped['idxmax'], 'id'].to_numpy()
NOTE: I selected the 'size' column because all the functions apply to that column. If you want a different set of functions for different columns, you can use agg with a dictionary that maps each column to a list of functions, e.g. agg({'size': ['sum', 'mean', 'idxmax']}). This results in MultiIndex columns, which means that when getting the ids for the maximum size in each group you need to do:
grouped['max_row_id'] = df.loc[grouped['size']['idxmax'], 'id'].to_numpy()
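For reference, here is a minimal end-to-end sketch of the same idea on current pandas. It assumes pandas 0.25 or later (for keyword-style named aggregation) and uses a small hand-made frame instead of the random data from the question, so the result is easy to check by eye. The output column names `total`, `mean`, `count` and `max_row` are my own choices for illustration, not anything from the original post.

```python
import pandas as pd

# Small illustrative frame (made-up data, not the question's random data).
df = pd.DataFrame({
    'id':     [1, 2, 3, 4, 5, 6],
    'code':   ['one', 'one', 'one', 'two', 'two', 'two'],
    'colour': ['black', 'black', 'white', 'black', 'white', 'white'],
    'size':   [10, 30, 20, 5, 40, 15],
})

# One pass over the groups: sum, mean, number of rows, and the index
# label of the row holding the largest size in each group.
grouped = (
    df.groupby(['code', 'colour'])['size']
      .agg(total='sum', mean='mean', count='size', max_row='idxmax')
      .reset_index()
)

# idxmax yields index labels of df, so map them to ids with .loc.
# to_numpy() assigns by position instead of aligning on the index.
grouped['max_row_id'] = df.loc[grouped['max_row'], 'id'].to_numpy()
grouped = grouped.drop(columns='max_row')

print(grouped)
```

No apply() with a custom function is needed here: every statistic is a built-in groupby reduction, and the id lookup is a single vectorised .loc call.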


                        