Make Pandas groupby act similarly to itertools groupby
Question
Suppose I have a Python dict of lists like so:
di = {'Grp': ['2', '6', '6', '5', '5', '6', '6', '7', '7', '6'],
      'Nums': ['6.20', '6.30', '6.80', '6.45', '6.55', '6.35', '6.37', '6.36', '6.78', '6.33']}
I can easily group the numbers and group key using itertools.groupby:
from itertools import groupby

# itertools.groupby starts a new group every time the key changes
for k, l in groupby(zip(di['Grp'], di['Nums']), key=lambda t: t[0]):
    print(k, [t[1] for t in l])
Prints:
2 ['6.20']
6 ['6.30', '6.80'] # one field, key=6
5 ['6.45', '6.55']
6 ['6.35', '6.37'] # second
7 ['6.36', '6.78']
6 ['6.33'] # third
Note that the key 6 is split into three separate groups, or fields.
Now suppose I have the equivalent Pandas DataFrame to my dict (same data, same list order and same keys):
Grp Nums
0 2 6.20
1 6 6.30
2 6 6.80
3 5 6.45
4 5 6.55
5 6 6.35
6 6 6.37
7 7 6.36
8 7 6.78
9 6 6.33
If I use Pandas' groupby, I don't see how to get group-by-group iteration. Instead, Pandas groups by key value:
for e in df.groupby('Grp'):
    print(e)
Prints:
('2', Grp Nums
0 2 6.20)
('5', Grp Nums
3 5 6.45
4 5 6.55)
('6', Grp Nums
1 6 6.30
2 6 6.80 # df['Grp'][1:2] first field
5 6 6.35 # df['Grp'][5:6] second field
6 6 6.37
9 6 6.33) # df['Grp'][9] third field
('7', Grp Nums
7 7 6.36
8 7 6.78)
Note that the rows with key 6 are bunched together rather than kept as separate groups.
My question: Is there an equivalent way to use Pandas' groupby so that 6, for example, would be in three groups in the same fashion as Python's groupby?
I tried this:
>>> df.reset_index().groupby('Grp')['index'].apply(lambda x: np.array(x))
Grp
2 [0]
5 [3, 4]
6 [1, 2, 5, 6, 9] # I *could* do a second groupby on this...
7 [7, 8]
Name: index, dtype: object
But that is still grouped by the overall Grp key, and I would need to do a second groupby on each NumPy array to split out the sub-groups of each key.
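The second pass mentioned above can be sketched with NumPy: take each key's index array and split it wherever two neighbouring positions are not adjacent. This is only an illustration. It assumes the DataFrame has the default 0..n-1 integer index, and the helper name `split_runs` is hypothetical.

```python
import numpy as np
import pandas as pd

di = {'Grp': ['2', '6', '6', '5', '5', '6', '6', '7', '7', '6'],
      'Nums': ['6.20', '6.30', '6.80', '6.45', '6.55', '6.35', '6.37', '6.36', '6.78', '6.33']}
df = pd.DataFrame(di)


def split_runs(idx):
    """Split a sorted integer array into runs of consecutive values (hypothetical helper)."""
    idx = np.asarray(idx)
    # A new run starts wherever the gap to the previous position is not exactly 1.
    breaks = np.flatnonzero(np.diff(idx) != 1) + 1
    return [run.tolist() for run in np.split(idx, breaks)]


# Assumption: the default RangeIndex, so index labels are row positions.
runs_by_key = {key: split_runs(rows.index) for key, rows in df.groupby('Grp')}

for key, runs in runs_by_key.items():
    print(key, runs)
```

The result is still organised per key, so the original interleaving of the groups (2, 6, 5, 6, 7, 6) is not preserved.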
Well, not to be cheeky, but why not just use Python's groupby on the DataFrame by way of iterrows? That is what it is there for:
>>> df
Grp Nums
0 2 6.20
1 6 6.30
2 6 6.80
3 5 6.45
4 5 6.55
5 6 6.35
6 6 6.37
7 7 6.36
8 7 6.78
9 6 6.33
>>> from itertools import groupby
>>> for k, l in groupby(df.iterrows(), key=lambda row: row[1]['Grp']):
...     print(k, [t[1]['Nums'] for t in l])
Prints:
2 ['6.20']
6 ['6.30', '6.80']
5 ['6.45', '6.55']
6 ['6.35', '6.37']
7 ['6.36', '6.78']
6 ['6.33']
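For anyone who wants to run this outside an interactive session, here is a self-contained sketch of the same approach. It assumes the DataFrame is built directly from the dict in the question; the variable name `consecutive` is only for illustration.

```python
from itertools import groupby

import pandas as pd

di = {'Grp': ['2', '6', '6', '5', '5', '6', '6', '7', '7', '6'],
      'Nums': ['6.20', '6.30', '6.80', '6.45', '6.55', '6.35', '6.37', '6.36', '6.78', '6.33']}
df = pd.DataFrame(di)

# df.iterrows() yields (index, row) pairs, so pair[1] is the row itself.
# itertools.groupby starts a new group each time the key changes,
# which keeps the three separate runs of '6' apart.
consecutive = [
    (key, [row['Nums'] for _, row in rows])
    for key, rows in groupby(df.iterrows(), key=lambda pair: pair[1]['Grp'])
]

for key, nums in consecutive:
    print(key, nums)
```

Each inner list is built before the outer iterator advances, which matters because itertools.groupby invalidates a group once the next one is requested.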
Trying to make Pandas' groupby act the way you want would probably take so many stacked methods that you won't be able to follow the code when you reread it later.
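For comparison, one commonly used way to stay within pandas is to label each consecutive run and group by that label instead of by the key. This is not part of the answer above, just a sketch of a different technique (shift, compare, cumulative sum); the names `starts`, `run_id` and `runs` are illustrative. Whether it reads more clearly than the itertools loop is a judgment call.

```python
import pandas as pd

di = {'Grp': ['2', '6', '6', '5', '5', '6', '6', '7', '7', '6'],
      'Nums': ['6.20', '6.30', '6.80', '6.45', '6.55', '6.35', '6.37', '6.36', '6.78', '6.33']}
df = pd.DataFrame(di)

# True wherever the key differs from the row before it (the first row has no predecessor).
starts = df['Grp'] != df['Grp'].shift()
# A running count of those starts gives every consecutive run its own integer label.
run_id = starts.cumsum()

# Group by the run label rather than by 'Grp', so repeated keys stay in separate groups.
runs = [
    (rows['Grp'].iloc[0], rows['Nums'].tolist())
    for _, rows in df.groupby(run_id)
]

for key, nums in runs:
    print(key, nums)
```

Because the run labels increase from top to bottom, the groups come back in the original row order.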