Splitting groupby() in pandas into smaller groups and combining them

Question

                    city  temperature  windspeed   event
    day
    2017-01-01  new york           32          6    Rain
    2017-01-02  new york           36          7   Sunny
    2017-01-03  new york           28         12    Snow
    2017-01-04  new york           33          7   Sunny
    2017-01-05  new york           31          7    Rain
    2017-01-06  new york           33          5   Sunny
    2017-01-07  new york           27         12    Rain
    2017-01-08  new york           23          7    Rain
    2017-01-01    mumbai           90          5   Sunny
    2017-01-02    mumbai           85         12     Fog
    2017-01-03    mumbai           87         15     Fog
    2017-01-04    mumbai           92          5    Rain
    2017-01-05    mumbai           89          7   Sunny
    2017-01-06    mumbai           80         10     Fog
    2017-01-07    mumbai           85          9   Sunny
    2017-01-08    mumbai           89          8    Rain
    2017-01-01     paris           45         20   Sunny
    2017-01-02     paris           50         13  Cloudy
    2017-01-03     paris           54          8  Cloudy
    2017-01-04     paris           42         10  Cloudy
    2017-01-05     paris           43         20   Sunny
    2017-01-06     paris           48          4  Cloudy
    2017-01-07     paris           40         14    Rain
    2017-01-08     paris           42         15  Cloudy
    2017-01-09     paris           53          8   Sunny

The above shows the original data.

Below is the result of np.array_split(data, 4):

    day               city  temperature  windspeed   event
    2017-01-01  new york           32          6    Rain
    2017-01-02  new york           36          7   Sunny
    2017-01-03  new york           28         12    Snow
    2017-01-04  new york           33          7   Sunny
    2017-01-05  new york           31          7    Rain
    2017-01-06  new york           33          5   Sunny
    2017-01-07  new york           27         12    Rain

    day               city  temperature  windspeed   event
    2017-01-08  new york           23          7    Rain
    2017-01-01    mumbai           90          5   Sunny
    2017-01-02    mumbai           85         12     Fog
    2017-01-03    mumbai           87         15     Fog
    2017-01-04    mumbai           92          5    Rain
    2017-01-05    mumbai           89          7   Sunny

    day               city  temperature  windspeed   event
    2017-01-06    mumbai           80         10     Fog
    2017-01-07    mumbai           85          9   Sunny
    2017-01-08    mumbai           89          8    Rain
    2017-01-01     paris           45         20   Sunny
    2017-01-02     paris           50         13  Cloudy
    2017-01-03     paris           54          8  Cloudy

    day               city  temperature  windspeed   event
    2017-01-04     paris           42         10  Cloudy
    2017-01-05     paris           43         20   Sunny
    2017-01-06     paris           48          4  Cloudy
    2017-01-07     paris           40         14    Rain
    2017-01-08     paris           42         15  Cloudy
    2017-01-09     paris           53          8   Sunny

As you can see, I'm trying to create 4 groups from the original data while making sure that each group contains all the cities. However, np.array_split() splits the data into 4 groups by row position, so not every group contains all the cities. I want each group to have Mumbai, Paris and New York. How can I do that?
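To see why the chunks come out this way, here is a minimal sketch on a reduced copy of the question's data (only `city` and `temperature` are kept; `temps` and `chunks` are names introduced here for illustration). np.array_split looks only at row positions, so with the rows ordered city by city, each chunk covers just one or two cities. The sketch splits an array of row positions and selects with `.iloc`, which yields the same row ranges as passing the DataFrame itself and sidesteps the deprecation warning that some recent pandas versions emit for the latter.

```python
import numpy as np
import pandas as pd

# Reduced copy of the question's data (only `city` and `temperature` kept),
# in the same row order: all new york rows, then mumbai, then paris.
temps = {
    "new york": [32, 36, 28, 33, 31, 33, 27, 23],
    "mumbai": [90, 85, 87, 92, 89, 80, 85, 89],
    "paris": [45, 50, 54, 42, 43, 48, 40, 42, 53],
}
df = pd.concat(
    [
        pd.DataFrame(
            {"city": city, "temperature": t},
            index=pd.date_range("2017-01-01", periods=len(t), name="day"),
        )
        for city, t in temps.items()
    ]
)

# np.array_split only looks at positions: 25 rows -> chunk sizes 7, 6, 6, 6.
chunks = [df.iloc[pos] for pos in np.array_split(np.arange(len(df)), 4)]

for i, chunk in enumerate(chunks):
    print(i, len(chunk), sorted(chunk["city"].unique()))
# 0 7 ['new york']
# 1 6 ['mumbai', 'new york']
# 2 6 ['mumbai', 'paris']
# 3 6 ['paris']
```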

In other words, what I'm trying to achieve is something like this:

Group 1:

    day               city  temperature  windspeed   event
    2017-01-01  new york           32          6    Rain
    2017-01-02     paris           50         13  Cloudy
    2017-01-02    mumbai           85         12     Fog
    2017-01-05  new york           31          7    Rain
    2017-01-06  new york           33          5   Sunny
    2017-01-05    mumbai           89          7   Sunny
    2017-01-05     paris           43         20   Sunny

Group 2:

    day               city  temperature  windspeed   event
    2017-01-04  new york           33          7   Sunny
    2017-01-01    mumbai           90          5   Sunny
    2017-01-03     paris           54          8  Cloudy
    2017-01-07  new york           27         12    Rain
    2017-01-06    mumbai           80         10     Fog
    2017-01-09     paris           53          8   Sunny

Group 3:

    day               city  temperature  windspeed   event
    2017-01-02  new york           36          7   Sunny
    2017-01-03    mumbai           87         15     Fog
    2017-01-01     paris           45         20   Sunny
    2017-01-08    mumbai           89          8    Rain
    2017-01-06     paris           48          4  Cloudy
    2017-01-07     paris           40         14    Rain

Group 4:

    day               city  temperature  windspeed   event
    2017-01-03  new york           28         12    Snow
    2017-01-04    mumbai           92          5    Rain
    2017-01-07    mumbai           85          9   Sunny
    2017-01-04     paris           42         10  Cloudy
    2017-01-08     paris           42         15  Cloudy
    2017-01-08  new york           23          7    Rain

As you can see from the expected result, the key requirement is that every group contains every city.

What I have in mind is to group the data by city, split each city's DataFrame into 4 parts, and then combine the corresponding parts across cities to get the 4 final groups.
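The plan above can be sketched as follows, again on a reduced copy of the question's data (`temps`, `N_GROUPS`, `parts_by_city` and `groups` are names introduced here for illustration, not from the question). Each city's rows are split into 4 nearly equal, contiguous parts with np.array_split on row positions, and the i-th final group concatenates the i-th part of every city.

```python
import numpy as np
import pandas as pd

# Reduced copy of the question's data (only `city` and `temperature` kept)
temps = {
    "new york": [32, 36, 28, 33, 31, 33, 27, 23],
    "mumbai": [90, 85, 87, 92, 89, 80, 85, 89],
    "paris": [45, 50, 54, 42, 43, 48, 40, 42, 53],
}
df = pd.concat(
    [
        pd.DataFrame(
            {"city": city, "temperature": t},
            index=pd.date_range("2017-01-01", periods=len(t), name="day"),
        )
        for city, t in temps.items()
    ]
)

N_GROUPS = 4  # hypothetical name: number of final groups wanted

# 1. Split each city's rows into N_GROUPS contiguous parts whose sizes
#    differ by at most one row.
parts_by_city = [
    [city_df.iloc[pos] for pos in np.array_split(np.arange(len(city_df)), N_GROUPS)]
    for _, city_df in df.groupby("city", sort=False)
]

# 2. The i-th final group is the concatenation of every city's i-th part.
groups = [pd.concat([parts[i] for parts in parts_by_city]) for i in range(N_GROUPS)]

for i, g in enumerate(groups):
    print(f"Group {i + 1}: {len(g)} rows, cities = {sorted(g['city'].unique())}")
```

With this data the groups have 7, 6, 6 and 6 rows, and each contains all three cities. The parts are contiguous in time (group 1 gets each city's first days), unlike the hand-picked expected output; if a city had fewer than 4 rows, some of its parts would be empty and the corresponding groups would miss that city.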

Recommended answer

You can create a helper column with GroupBy + cumcount that numbers the occurrences of each city (0, 1, 2, ...).

Then use dict + tuple with another GroupBy to create a dictionary of DataFrames, each containing at most one occurrence of each city.

# add index column giving count of city occurrence
df['index'] = df.groupby('city').cumcount()

# create dictionary of dataframes
d = dict(tuple(df.groupby('index')))

Result:

print(d)

{0:                city  temperature  windspeed  event  index
 day                                                      
 2017-01-01  newyork           32          6   Rain      0
 2017-01-01   mumbai           90          5  Sunny      0
 2017-01-01    paris           45         20  Sunny      0,
 1:                city  temperature  windspeed   event  index
 day                                                       
 2017-01-02  newyork           36          7   Sunny      1
 2017-01-02   mumbai           85         12     Fog      1
 2017-01-02    paris           50         13  Cloudy      1,
 2:                city  temperature  windspeed   event  index
 day                                                       
 2017-01-03  newyork           28         12    Snow      2
 2017-01-03   mumbai           87         15     Fog      2
 2017-01-03    paris           54          8  Cloudy      2,
 3:                city  temperature  windspeed   event  index
 day                                                       
 2017-01-04  newyork           33          7   Sunny      3
 2017-01-04   mumbai           92          5    Rain      3
 2017-01-04    paris           42         10  Cloudy      3}
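To check the answer's code end to end, here is a self-contained run on a reduced copy of the question's data (only `city` and `temperature` kept; `temps` is a name introduced here). With the full data from the question, paris has 9 rows, so the helper column runs from 0 to 8 and `d` holds 9 DataFrames rather than only the 4 printed above; the last one contains only paris.

```python
import pandas as pd

# Reduced copy of the question's data (only `city` and `temperature` kept)
temps = {
    "new york": [32, 36, 28, 33, 31, 33, 27, 23],
    "mumbai": [90, 85, 87, 92, 89, 80, 85, 89],
    "paris": [45, 50, 54, 42, 43, 48, 40, 42, 53],
}
df = pd.concat(
    [
        pd.DataFrame(
            {"city": city, "temperature": t},
            index=pd.date_range("2017-01-01", periods=len(t), name="day"),
        )
        for city, t in temps.items()
    ]
)

# add index column giving count of city occurrence (0, 1, 2, ... per city)
df["index"] = df.groupby("city").cumcount()

# Iterating a GroupBy yields (key, sub-DataFrame) pairs, which dict() accepts
d = dict(tuple(df.groupby("index")))

print(sorted(int(k) for k in d))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(d[8]["city"].tolist())      # ['paris']
```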

You can then extract individual "groups" via d[0], d[1], d[2], d[3]. In this particular case each group corresponds to a single day, so you may wish to key the dictionary by date instead, i.e.

d = {df_.index[0]: df_ for _, df_ in df.groupby('index')}
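Because `cumcount` alone produces one group per occurrence number, the dictionary above holds more than 4 groups whenever a city has more than 4 rows. A minimal variation, not part of the original answer, is to take the helper column modulo 4, which assigns each city's rows to the 4 groups in round-robin order. The sketch assumes a reduced copy of the question's data; `temps`, `N_GROUPS` and the `group` column are hypothetical names introduced here.

```python
import pandas as pd

# Reduced copy of the question's data (only `city` and `temperature` kept)
temps = {
    "new york": [32, 36, 28, 33, 31, 33, 27, 23],
    "mumbai": [90, 85, 87, 92, 89, 80, 85, 89],
    "paris": [45, 50, 54, 42, 43, 48, 40, 42, 53],
}
df = pd.concat(
    [
        pd.DataFrame(
            {"city": city, "temperature": t},
            index=pd.date_range("2017-01-01", periods=len(t), name="day"),
        )
        for city, t in temps.items()
    ]
)

N_GROUPS = 4  # hypothetical name: number of groups the question asks for

# Round-robin: each city's occurrence numbers 0, 1, 2, 3, 4, ... map to
# groups 0, 1, 2, 3, 0, ... so every group gets every city, provided each
# city has at least N_GROUPS rows.
df["group"] = df.groupby("city").cumcount() % N_GROUPS

groups = {int(k): g for k, g in df.groupby("group")}
for k, g in groups.items():
    print(k, len(g), sorted(g["city"].unique()))
# 0 7 ['mumbai', 'new york', 'paris']
# 1 6 ['mumbai', 'new york', 'paris']
# 2 6 ['mumbai', 'new york', 'paris']
# 3 6 ['mumbai', 'new york', 'paris']
```

This interleaves days rather than taking contiguous blocks: group 0 receives each city's 1st and 5th days, plus paris's 9th.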
