Pandas `DataFrameGroupBy` and `SeriesGroupBy`


Problem description


I admit that I am not a Python guru, but I still find dealing with pandas DataFrameGroupBy and SeriesGroupBy objects exceptionally counter-intuitive. (I have an R background.)

I have the dataframe below:

import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
                   'code' : ['one', 'one', 'two', 'three',
                             'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white','white','white',
                           'black', 'black', 'white', 'white'],
                   'irrelevant1': ['foo', 'foo', 'foo','bar','bar',
                                     'foo','bar','bar'],
                   'irrelevant2': ['foo', 'foo', 'foo','bar','bar',
                                     'foo','bar','bar'],
                   'irrelevant3': ['foo', 'foo', 'foo','bar','bar',
                                     'foo','bar','bar'],
                   'amount' : np.random.randn(8)},  columns= ['id','code','colour', 'irrelevant1', 'irrelevant2', 'irrelevant3', 'amount'])

I want to get the ids grouped by code and colour. The code below does the grouping but keeps all columns.

gb = df.groupby(['code','colour'])
gb.head(5)
                id   code colour irrelevant1 irrelevant2 irrelevant3    amount
code  colour                                                                  
one   black  0   1    one  black         foo         foo         foo -0.644170
      white  1   2    one  white         foo         foo         foo  0.912372
             6   7    one  white         bar         bar         bar  0.530575
three black  5   6  three  black         foo         foo         foo -0.123806
      white  3   4  three  white         bar         bar         bar -0.387080
two   black  4   5    two  black         bar         bar         bar -0.578107
      white  2   3    two  white         foo         foo         foo  0.768637
             7   8    two  white         bar         bar         bar -0.282577

Questions:

1) In gb, how do I store only the id column (and not even any index) and get rid of the rest?

2) Once I have the desired DataFrameGroupBy gb, how do I access the ids of cases where {code = one and colour = white}? I tried gb.get_group('one','white') and gb.get_group(['one','white']) but neither works.

3) How do I access entries where {colour = white}, i.e. without specifying the code index?

4) Finally, the manual is not very helpful. Do you know of any sources with examples of how to create and access these grouped objects?

Solution

For your problem, you don't even need to perform a groupby (but you should read more about it in the prose docs).
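For reference, if a grouped object is still wanted, the groupby route might look roughly like the sketch below. It assumes a reduced copy of the question's frame with made-up `amount` values, and the names `gb_id`, `ids_one_white`, and `ids_by_group` are illustrative only.

```python
import pandas as pd

# Reduced stand-in for the question's frame (amounts are made up, not the original random data).
df = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
    'amount': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
})

# Selecting a column on the grouped object yields a SeriesGroupBy over 'id' only.
gb_id = df.groupby(['code', 'colour'])['id']

# With several grouping keys, get_group takes a single tuple,
# not two positional arguments and not a list.
ids_one_white = gb_id.get_group(('one', 'white')).tolist()

# Iterating the grouped object gives (key, Series) pairs; collect ids per group.
ids_by_group = {key: grp.tolist() for key, grp in gb_id}
```

The tuple key is the detail that the two `get_group` attempts in the question were missing.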

A better solution would be a MultiIndex:

In [36]: df = df.set_index(['code', 'colour']).sort_index()

In [37]: df
Out[37]: 
              id irrelevant1 irrelevant2 irrelevant3    amount
code  colour                                                  
one   black    1         foo         foo         foo  0.103045
      white    2         foo         foo         foo  0.751824
      white    7         bar         bar         bar -1.275114
three black    6         foo         foo         foo  0.311305
      white    4         bar         bar         bar -0.416722
two   black    5         bar         bar         bar  1.534859
      white    3         foo         foo         foo -1.068399
      white    8         bar         bar         bar -0.243893

[8 rows x 5 columns]

That takes care of 1.
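If only the id column should survive, one possible sketch is to select it after setting the index. The frame below is a reduced stand-in for the question's data, and `indexed`, `ids`, and `plain_ids` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Reduced stand-in for the question's frame, with one 'irrelevant' column kept for contrast.
df = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
    'irrelevant1': ['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'bar', 'bar'],
})

# Move the grouping columns into a sorted MultiIndex, as in the answer.
indexed = df.set_index(['code', 'colour']).sort_index()

# Keep just the id column: a Series indexed by (code, colour).
ids = indexed['id']

# Drop the index entirely if a bare array of ids is wanted.
plain_ids = ids.to_numpy()
```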

2: Use the familiar slicing syntax:

In [38]: df.loc['one', 'white']
Out[38]: 
             id irrelevant1 irrelevant2 irrelevant3    amount
code colour                                                  
one  white    2         foo         foo         foo  0.751824
     white    7         bar         bar         bar -1.275114

[2 rows x 5 columns]
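To pull out just the ids for that combination, the row key can be written as an explicit tuple and followed by a column label. This is a minimal sketch on a reduced stand-in frame, and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Reduced stand-in for the question's frame.
df = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
})
indexed = df.set_index(['code', 'colour']).sort_index()

# The tuple addresses the row key (code, colour); the second argument picks the column.
# Writing the tuple explicitly avoids ambiguity between row and column labels.
ids_one_white = indexed.loc[('one', 'white'), 'id'].tolist()
```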

3: This is a cross-section, use .xs:

In [39]: df.xs('white', level='colour')
Out[39]: 
       id irrelevant1 irrelevant2 irrelevant3    amount
code                                                   
one     2         foo         foo         foo  0.751824
one     7         bar         bar         bar -1.275114
three   4         bar         bar         bar -0.416722
two     3         foo         foo         foo -1.068399
two     8         bar         bar         bar -0.243893

[5 rows x 5 columns]
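As a rough illustration of the cross-section on a reduced stand-in frame, the sketch below also shows the `drop_level=False` option of `.xs`, which keeps the `colour` level in the result. The variable names are hypothetical.

```python
import pandas as pd

# Reduced stand-in for the question's frame.
df = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
})
indexed = df.set_index(['code', 'colour']).sort_index()

# Cross-section on the inner level: all rows with colour == 'white'.
# By default the selected level is dropped, leaving 'code' as the index.
white = indexed.xs('white', level='colour')
white_ids = white['id'].tolist()

# drop_level=False keeps both index levels in the result.
white_full = indexed.xs('white', level='colour', drop_level=False)
```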

4: Examples are all over the place. Check the questions tagged pandas / groupby, the section of the docs that is being worked on right now, and the prose docs mentioned above.
