Splitting groupby() in pandas into smaller groups and combining them
Question
city temperature windspeed event
day
2017-01-01 new york 32 6 Rain
2017-01-02 new york 36 7 Sunny
2017-01-03 new york 28 12 Snow
2017-01-04 new york 33 7 Sunny
2017-01-05 new york 31 7 Rain
2017-01-06 new york 33 5 Sunny
2017-01-07 new york 27 12 Rain
2017-01-08 new york 23 7 Rain
2017-01-01 mumbai 90 5 Sunny
2017-01-02 mumbai 85 12 Fog
2017-01-03 mumbai 87 15 Fog
2017-01-04 mumbai 92 5 Rain
2017-01-05 mumbai 89 7 Sunny
2017-01-06 mumbai 80 10 Fog
2017-01-07 mumbai 85 9 Sunny
2017-01-08 mumbai 89 8 Rain
2017-01-01 paris 45 20 Sunny
2017-01-02 paris 50 13 Cloudy
2017-01-03 paris 54 8 Cloudy
2017-01-04 paris 42 10 Cloudy
2017-01-05 paris 43 20 Sunny
2017-01-06 paris 48 4 Cloudy
2017-01-07 paris 40 14 Rain
2017-01-08 paris 42 15 Cloudy
2017-01-09 paris 53 8 Sunny
The data above is the original.
Below is the result of np.array_split(data, 4).
day city temperature windspeed event
2017-01-01 new york 32 6 Rain
2017-01-02 new york 36 7 Sunny
2017-01-03 new york 28 12 Snow
2017-01-04 new york 33 7 Sunny
2017-01-05 new york 31 7 Rain
2017-01-06 new york 33 5 Sunny
2017-01-07 new york 27 12 Rain
day city temperature windspeed event
2017-01-08 new york 23 7 Rain
2017-01-01 mumbai 90 5 Sunny
2017-01-02 mumbai 85 12 Fog
2017-01-03 mumbai 87 15 Fog
2017-01-04 mumbai 92 5 Rain
2017-01-05 mumbai 89 7 Sunny
day city temperature windspeed event
2017-01-06 mumbai 80 10 Fog
2017-01-07 mumbai 85 9 Sunny
2017-01-08 mumbai 89 8 Rain
2017-01-01 paris 45 20 Sunny
2017-01-02 paris 50 13 Cloudy
2017-01-03 paris 54 8 Cloudy
day city temperature windspeed event
2017-01-04 paris 42 10 Cloudy
2017-01-05 paris 43 20 Sunny
2017-01-06 paris 48 4 Cloudy
2017-01-07 paris 40 14 Rain
2017-01-08 paris 42 15 Cloudy
2017-01-09 paris 53 8 Sunny
As you can see, I'm trying to create 4 groups from the original data so that each group contains all the cities. np.array_split() does split the data into 4 groups, but the groups do not each contain all the cities. I want each group to have Mumbai, Paris and New York. How can I do that?
In other words, what I'm trying to achieve is something like this:
Group 1:
day city temperature windspeed event
2017-01-01 new york 32 6 Rain
2017-01-02 paris 50 13 Cloudy
2017-01-02 mumbai 85 12 Fog
2017-01-05 new york 31 7 Rain
2017-01-06 new york 33 5 Sunny
2017-01-05 mumbai 89 7 Sunny
2017-01-05 paris 43 20 Sunny
Group 2:
day city temperature windspeed event
2017-01-04 new york 33 7 Sunny
2017-01-01 mumbai 90 5 Sunny
2017-01-03 paris 54 8 Cloudy
2017-01-07 new york 27 12 Rain
2017-01-06 mumbai 80 10 Fog
2017-01-09 paris 53 8 Sunny
Group 3:
day city temperature windspeed event
2017-01-02 new york 36 7 Sunny
2017-01-03 mumbai 87 15 Fog
2017-01-01 paris 45 20 Sunny
2017-01-08 mumbai 89 8 Rain
2017-01-06 paris 48 4 Cloudy
2017-01-07 paris 40 14 Rain
Group 4:
day city temperature windspeed event
2017-01-03 new york 28 12 Snow
2017-01-04 mumbai 92 5 Rain
2017-01-07 mumbai 85 9 Sunny
2017-01-04 paris 42 10 Cloudy
2017-01-08 paris 42 15 Cloudy
2017-01-08 new york 23 7 Rain
As you can see from the expected result, the main point is that every group contains every city.
What I have in mind is to group the data by city, split each city's dataframe into 4 parts, and then combine the matching parts across cities to get the 4 final groups.
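The plan described above could be sketched roughly as follows. This is only an illustration, not code from the original post: the helper name split_balanced and the toy frame demo are hypothetical, and it assumes every city has at least as many rows as there are groups (otherwise some groups will miss that city).

```python
import numpy as np
import pandas as pd

def split_balanced(df, key, n_groups):
    """Split df into n_groups frames, spreading each key's rows over all of them.

    Hypothetical helper written for illustration only.
    """
    per_key_chunks = []
    for _, sub in df.groupby(key, sort=False):
        # Split the row positions (not the frame itself) into n_groups parts
        positions = np.array_split(np.arange(len(sub)), n_groups)
        per_key_chunks.append([sub.iloc[pos] for pos in positions])
    # zip(*...) pairs the i-th chunk of every key; concat merges each set
    return [pd.concat(chunks) for chunks in zip(*per_key_chunks)]

# Toy data with the same row counts as the question: 8, 8 and 9 rows per city
demo = pd.DataFrame({
    "city": ["new york"] * 8 + ["mumbai"] * 8 + ["paris"] * 9,
    "temperature": range(25),
})

groups = split_balanced(demo, "city", 4)
for i, g in enumerate(groups, start=1):
    print(i, g["city"].value_counts().to_dict())
```

Rows stay in their original order within each city, so this version produces contiguous blocks per city rather than the shuffled rows shown in the expected output.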
Recommended answer
You can create a helper column via GroupBy + cumcount to count the occurrences of each city.
Then use dict + tuple with another GroupBy to create a dictionary of dataframes, each one containing exactly one occurrence of each city.
# add index column giving count of city occurrence
df['index'] = df.groupby('city').cumcount()
# create dictionary of dataframes
d = dict(tuple(df.groupby('index')))
Result:
print(d)
{0: city temperature windspeed event index
day
2017-01-01 newyork 32 6 Rain 0
2017-01-01 mumbai 90 5 Sunny 0
2017-01-01 paris 45 20 Sunny 0,
1: city temperature windspeed event index
day
2017-01-02 newyork 36 7 Sunny 1
2017-01-02 mumbai 85 12 Fog 1
2017-01-02 paris 50 13 Cloudy 1,
2: city temperature windspeed event index
day
2017-01-03 newyork 28 12 Snow 2
2017-01-03 mumbai 87 15 Fog 2
2017-01-03 paris 54 8 Cloudy 2,
3: city temperature windspeed event index
day
2017-01-04 newyork 33 7 Sunny 3
2017-01-04 mumbai 92 5 Rain 3
2017-01-04 paris 42 10 Cloudy 3}
You can then extract individual "groups" via d[0], d[1], d[2], d[3]. In this particular case, you may wish to key the groups by date instead, i.e.
d = {df_.index[0]: df_ for _, df_ in df.groupby('index')}
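Note that on the full data in the question, cumcount runs from 0 up to 8 for paris, so grouping on it gives one dataframe per occurrence rather than exactly four. One possible variation, which is not part of the original answer, is to take the count modulo the number of groups. The sketch below assumes a toy frame named demo with the same row counts as the question; the names demo, n_groups, bucket and four are illustrative.

```python
import pandas as pd

# Toy data with the same row counts as the question: 8, 8 and 9 rows per city
demo = pd.DataFrame({
    "city": ["new york"] * 8 + ["mumbai"] * 8 + ["paris"] * 9,
    "temperature": range(25),
})

n_groups = 4
# cumcount numbers each city's rows 0, 1, 2, ...;
# the modulo folds those counts into n_groups buckets (0 to n_groups - 1)
bucket = demo.groupby("city").cumcount() % n_groups

# Group on the bucket labels to get one dataframe per bucket
four = {k: g for k, g in demo.groupby(bucket)}

for k, g in four.items():
    print(k, g["city"].value_counts().to_dict())
```

Each bucket receives every city as long as each city has at least n_groups rows. Unlike splitting into contiguous chunks, this deals rows out round-robin, so consecutive days of one city land in different groups.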