Split dataframe into grouped chunks


Problem description

I would like to split a dataframe into chunks. I have written a function that splits a dataframe into equal-size chunks, but I can't figure out how to split by groups.

Each split must contain all rows for every value of the grouping variable that appears in it, and I'd like flexibility in how many groups each split includes (the groups are relatively small).

Example dataframe:

A  1
A  2
B  3
C  1
D  9
D  10

Target splits (each including at least two groups):

Split 1:

A  1
A  2
B  3

Split 2:

C  1
D  9
D  10

If it helps, my current function looks like this:

def split_frame(sequence, size=10000):
    return (sequence[position:position + size] for position in range(0, len(sequence), size))

Any help is appreciated!

Recommended answer

This works in Python 2 and 3:

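import pandas as pd

df = pd.DataFrame(data=['a', 'a', 'b', 'c', 'a', 'a', 'b', 'v', 'v', 'f'], columns=['A'])

def iter_by_group(df, column, num_groups):
    # Collect whole groups and yield them num_groups at a time.
    groups = []
    # groupby iterates over groups in sorted key order by default.
    for _, group in df.groupby(column):
        groups.append(group)
        if len(groups) == num_groups:
            yield pd.concat(groups)
            groups = []
    # Yield any leftover groups as a final, smaller chunk.
    if groups:
        yield pd.concat(groups)

for group in iter_by_group(df, 'A', 2):
    print(group)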

   A
0  a
1  a
4  a
5  a
2  b
6  b

   A
3  c
9  f

   A
7  v
8  v
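
As a possible alternative to the generator above, the same grouping can be sketched without an explicit loop by numbering the groups and integer-dividing that number by the chunk size. This is a different technique from the answer's, not a replacement for it. The function name `split_by_group_count` and the column names `key` and `value` are made up for illustration; the question does not name its columns. The sketch assumes a pandas version that provides `GroupBy.ngroup`.

```python
import pandas as pd


def split_by_group_count(df, column, num_groups):
    """Return a list of DataFrames, each holding all rows of up to num_groups groups."""
    # ngroup() labels every row with the 0-based number of its group;
    # sort=False numbers the groups in order of first appearance.
    group_number = df.groupby(column, sort=False).ngroup()
    # Integer division maps num_groups consecutive group numbers to one chunk id.
    chunk_id = group_number // num_groups
    # Grouping by the chunk id keeps every row of a group in the same chunk.
    return [chunk for _, chunk in df.groupby(chunk_id)]


# Sample data from the question (column names are hypothetical).
sample = pd.DataFrame({
    "key": ["A", "A", "B", "C", "D", "D"],
    "value": [1, 2, 3, 1, 9, 10],
})

chunks = split_by_group_count(sample, "key", 2)
for number, chunk in enumerate(chunks, start=1):
    print("Split", number)
    print(chunk)
```

On the question's sample data this produces two chunks: one with the rows for A and B, and one with the rows for C and D. Unlike the generator, it builds all chunks at once and keeps the rows in their original order rather than in sorted group order.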
