Pandas `DataFrameGroupBy` and `SeriesGroupBy`
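Problem description
I admit that I am not a Python guru, but I still find dealing with Pandas `DataFrameGroupBy` and `SeriesGroupBy` objects exceptionally counter-intuitive. (I have an R background.)
I have the dataframe below: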
import pandas as pd
import numpy as np

df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'irrelevant1': ['foo', 'foo', 'foo', 'bar', 'bar',
                                   'foo', 'bar', 'bar'],
                   'irrelevant2': ['foo', 'foo', 'foo', 'bar', 'bar',
                                   'foo', 'bar', 'bar'],
                   'irrelevant3': ['foo', 'foo', 'foo', 'bar', 'bar',
                                   'foo', 'bar', 'bar'],
                   'amount': np.random.randn(8)},
                  columns=['id', 'code', 'colour', 'irrelevant1',
                           'irrelevant2', 'irrelevant3', 'amount'])
I want to be able to get the `id`s grouped by `code` and `colour`. The code below does the grouping but keeps all columns.
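gb = df.groupby(['code', 'colour'])
gb.head(5)

                 id   code colour irrelevant1 irrelevant2 irrelevant3    amount
code  colour
one   black  0    1    one  black         foo         foo         foo -0.644170
      white  1    2    one  white         foo         foo         foo  0.912372
             6    7    one  white         bar         bar         bar  0.530575
three black  5    6  three  black         foo         foo         foo -0.123806
      white  3    4  three  white         bar         bar         bar -0.387080
two   black  4    5    two  black         bar         bar         bar -0.578107
      white  2    3    two  white         foo         foo         foo  0.768637
             7    8    two  white         bar         bar         bar -0.282577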
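Questions:
1) In `gb`, how do I store only the `id` column (and not even any index) and get rid of the rest?
2) Once I have the desired `DataFrameGroupBy` `gb`, how do I access the `id`s of cases where {code = one and colour = white}? I tried `gb.get_group('one','white')` and `gb.get_group(['one','white'])` but they do not work.
3) How do I access entries where {colour = white}, i.e. lacking the `code` index?
4) Finally, the manual is not very helpful. Do you know of any sources with examples of how to create and access these grouped objects?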
Recommended answer
For your problem, you don't even need to perform a `groupby` (but you should read more about it in the prose docs). A better solution would be a `MultiIndex`:
In [36]: df = df.set_index(['code', 'colour']).sort_index()

In [37]: df
Out[37]:
              id irrelevant1 irrelevant2 irrelevant3    amount
code  colour
one   black    1         foo         foo         foo  0.103045
      white    2         foo         foo         foo  0.751824
      white    7         bar         bar         bar -1.275114
three black    6         foo         foo         foo  0.311305
      white    4         bar         bar         bar -0.416722
two   black    5         bar         bar         bar  1.534859
      white    3         foo         foo         foo -1.068399
      white    8         bar         bar         bar -0.243893

[8 rows x 5 columns]

That takes care of 1.
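If only the `id` values are needed, the following is a minimal sketch on a cut-down version of the question's frame. The `amount` values and the names `ids`, `gb_id` and `ids_per_group` are illustrative assumptions, not part of the original post. Selecting the `id` column after `set_index` leaves a Series keyed by `code` and `colour`, while selecting it on the groupby gives a `SeriesGroupBy`.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's frame (amount values are random).
df = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
    'amount': np.random.randn(8),
})

# Option 1: MultiIndex route, keeping only the id column (a Series).
ids = df.set_index(['code', 'colour']).sort_index()['id']

# Option 2: groupby route; selecting one column yields a SeriesGroupBy.
gb_id = df.groupby(['code', 'colour'])['id']

# Collect the ids of each (code, colour) group into a list.
ids_per_group = gb_id.apply(list)
print(ids_per_group)
```

If no index is wanted at all, `ids.tolist()` or `ids.reset_index(drop=True)` should discard it.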
2: Use the familiar slicing syntax:

In [38]: df.loc['one', 'white']
Out[38]:
             id irrelevant1 irrelevant2 irrelevant3    amount
code colour
one  white    2         foo         foo         foo  0.751824
     white    7         bar         bar         bar -1.275114

[2 rows x 5 columns]
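If you would rather stay with the groupby object, `get_group` should also work, provided the key is passed as a single tuple. The two calls in the question most likely fail because one passes `'white'` as a second positional argument and the other passes an unhashable list. The sketch below assumes a cut-down version of the question's frame; `one_white`, `indexed` and `same_ids` are illustrative names.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's frame (amount values are random).
df = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
    'amount': np.random.randn(8),
})

gb = df.groupby(['code', 'colour'])

# With several grouping keys the group name is a tuple, passed as ONE argument.
one_white = gb.get_group(('one', 'white'))
one_white_ids = one_white['id'].tolist()

# The same lookup on the MultiIndex frame, written with an explicit tuple
# for the row key so it cannot be mistaken for a (row, column) pair.
indexed = df.set_index(['code', 'colour']).sort_index()
same_ids = indexed.loc[('one', 'white'), 'id'].tolist()

print(one_white_ids, same_ids)
```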
3: This is a cross-section, use `.xs`:

In [39]: df.xs('white', level='colour')
Out[39]:
       id irrelevant1 irrelevant2 irrelevant3    amount
code
one     2         foo         foo         foo  0.751824
one     7         bar         bar         bar -1.275114
three   4         bar         bar         bar -0.416722
two     3         foo         foo         foo -1.068399
two     8         bar         bar         bar -0.243893

[5 rows x 5 columns]
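For comparison, here is a sketch of three ways to pick out the white rows, again on an assumed cut-down frame with illustrative variable names. `.xs` drops the `colour` level, `pd.IndexSlice` keeps both levels, and a plain boolean mask works on the original flat frame without any index.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's frame (amount values are random).
df = pd.DataFrame({
    'id': range(1, 9),
    'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
    'amount': np.random.randn(8),
})

indexed = df.set_index(['code', 'colour']).sort_index()

# Cross-section: the colour level is dropped from the result's index.
white_xs = indexed.xs('white', level='colour')

# IndexSlice: any code, colour == 'white'; both index levels are kept.
# Slicing this way relies on the index being sorted, as it is here.
idx = pd.IndexSlice
white_slice = indexed.loc[idx[:, 'white'], :]

# Boolean mask on the flat frame, no MultiIndex needed.
white_mask = df[df['colour'] == 'white']

print(white_xs['id'].tolist())
```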
4: Examples are all over the place. Check the pandas / groupby tags here; that section of the docs is being worked on right now, and there are the prose docs linked above.