Make Pandas groupby act similarly to itertools groupby

Problem description

Suppose I have a Python dict of lists like so:

di = {'Grp':  ['2'   , '6'   , '6'   , '5'   , '5'   , '6'   , '6'   , '7'   , '7'   , '6'],
      'Nums': ['6.20', '6.30', '6.80', '6.45', '6.55', '6.35', '6.37', '6.36', '6.78', '6.33']}

I can easily group the numbers and group key using itertools.groupby:

from itertools import groupby
# Pair each key with its number, then group consecutive pairs that share a key
for k, l in groupby(zip(di['Grp'], di['Nums']), key=lambda t: t[0]):
    print(k, [t[1] for t in l])

Prints:

2 ['6.20']
6 ['6.30', '6.80']      # one field, key=6
5 ['6.45', '6.55']
6 ['6.35', '6.37']      # second
7 ['6.36', '6.78']
6 ['6.33']              # third

Note that the 6 key is split into three separate groups (or fields), one for each consecutive run.

Now suppose I have the equivalent Pandas DataFrame to my dict (same data, same list order and same keys):

  Grp  Nums
0   2  6.20
1   6  6.30
2   6  6.80
3   5  6.45
4   5  6.55
5   6  6.35
6   6  6.37
7   7  6.36
8   7  6.78
9   6  6.33

If I use Pandas' groupby, I do not see how to iterate over consecutive runs in this way. Instead, Pandas groups by key value:

for e in df.groupby('Grp'):
    print(e)

Prints:

('2',   Grp  Nums
0   2  6.20)
('5',   Grp  Nums
3   5  6.45
4   5  6.55)
('6',   Grp  Nums
1   6  6.30            
2   6  6.80                # df['Grp'][1:2] first field
5   6  6.35                # df['Grp'][5:6] second field
6   6  6.37                 
9   6  6.33)               # df['Grp'][9] third field
('7',   Grp  Nums
7   7  6.36
8   7  6.78)

Note that the rows with group key 6 are bunched together into one group rather than kept as separate groups.

My question: Is there an equivalent way to use Pandas' groupby so that 6, for example, would be in three groups in the same fashion as Python's groupby?

I tried this:

>>> df.reset_index().groupby('Grp')['index'].apply(lambda x: np.array(x))
Grp
2                [0]
5             [3, 4]
6    [1, 2, 5, 6, 9]         # I *could* do a second groupby on this...
7             [7, 8]
Name: index, dtype: object

But it is still grouped by the overall Grp key, and I would need to do a second groupby on each ndarray to split out the subgroups of each key.

Solution

Well, not to be cheeky, but why not just use Python's groupby on the DataFrame by using iterrows? That is what it is there for:

>>> df
  Grp  Nums
0   2  6.20
1   6  6.30
2   6  6.80
3   5  6.45
4   5  6.55
5   6  6.35
6   6  6.37
7   7  6.36
8   7  6.78
9   6  6.33

>>> from itertools import groupby
>>> for k, l in groupby(df.iterrows(), key=lambda row: row[1]['Grp']):
...     print(k, [t[1]['Nums'] for t in l])

Prints:

2 ['6.20']
6 ['6.30', '6.80']
5 ['6.45', '6.55']
6 ['6.35', '6.37']
7 ['6.36', '6.78']
6 ['6.33']

Trying to make Pandas' groupby act the way you want would probably take so many stacked method calls that you would not be able to follow the code when you reread it in the future.
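
For comparison, here is a minimal sketch of one pandas-only idiom that is often used for run-wise grouping. It is not part of the answer above. It assumes the same data as in the question and labels each consecutive run by comparing Grp with its shifted copy and taking a cumulative sum. The names run_id and runs are made up for this illustration.

```python
import pandas as pd

# Same data as in the question.
di = {'Grp':  ['2', '6', '6', '5', '5', '6', '6', '7', '7', '6'],
      'Nums': ['6.20', '6.30', '6.80', '6.45', '6.55',
               '6.35', '6.37', '6.36', '6.78', '6.33']}
df = pd.DataFrame(di)

# run_id (hypothetical name) goes up by one every time 'Grp' differs
# from the previous row, so each consecutive run gets its own label.
run_id = (df['Grp'] != df['Grp'].shift()).cumsum()

# Group by the run label instead of by the key itself.
runs = [(g['Grp'].iloc[0], g['Nums'].tolist()) for _, g in df.groupby(run_id)]

for key, nums in runs:
    print(key, nums)
```

Whether this reads more clearly than the itertools version is a judgment call. It keeps everything in pandas, but the shift/cumsum step needs a comment to stay understandable.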
