Pandas, groupby and summing over specific months


Question

I have a DataFrame:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 982 entries, 2009-10-30 00:00:00 to 2012-12-16 00:00:00
Data columns (total 4 columns):
rain        981  non-null values
temp_max    982  non-null values
temp_min    982  non-null values
temp        982  non-null values
dtypes: float64(4)

To sum per year and month I use:

mdata = data.groupby([lambda x: x.year, lambda x: x.month]).agg([sum])

But I need a seasonal analysis (summer, winter, etc.), so how can I compute the sum over specific months, such as [1, 2, 3], for each year?

Ty

Recommended answer

Yes, one solution that seems neat to me is to use a seasons dictionary and then group the data using a function. Any function passed as a group key is called once per index value, and its return values are used as the group names.

import datetime

import numpy as np
import pandas as pd

# Create a year's worth of random daily data, indexed by date
base = datetime.date.today() - datetime.timedelta(days=365)
Datelist = [base + datetime.timedelta(days=x) for x in range(365)]
DF = pd.DataFrame(np.random.rand(365), index=Datelist)

# Create a seasonal dictionary that maps month numbers to seasons
SeasonDict = {
    11: 'Winter', 12: 'Winter', 1: 'Winter',
    2: 'Spring', 3: 'Spring', 4: 'Spring',
    5: 'Summer', 6: 'Summer', 7: 'Summer',
    8: 'Autumn', 9: 'Autumn', 10: 'Autumn',
}

# Write a function that will be used to group the data;
# x is a single index value (a date)
def GroupFunc(x):
    return SeasonDict[x.month]

# Pass the function to groupby; it is called once per index value
Grouped = DF.groupby(GroupFunc)
Grouped.sum()

The function takes each index value, looks up its month in the seasons dictionary, and returns the value stored under that month key. This value then becomes the group name.

Alternatively, you can use a lambda as in your example (which is more concise, but I thought the above would be easier to understand):

DF.groupby(lambda x: SeasonDict[x.month]).sum()
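Since the question asked for sums for each year, the season key can be combined with a year key. The following is a minimal sketch, not part of the original answer: the `rain` column, the date range, and the constant values are made up so the totals are easy to check, and the month-to-season mapping is the same as in SeasonDict above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two full non-leap years of daily values, all 1.0,
# so each sum is simply the number of days in the group.
idx = pd.date_range("2010-01-01", "2011-12-31", freq="D")
data = pd.DataFrame({"rain": np.ones(len(idx))}, index=idx)

# Same month-to-season mapping as SeasonDict in the answer.
season_of_month = {
    11: "Winter", 12: "Winter", 1: "Winter",
    2: "Spring", 3: "Spring", 4: "Spring",
    5: "Summer", 6: "Summer", 7: "Summer",
    8: "Autumn", 9: "Autumn", 10: "Autumn",
}

# Build one key per row for the year and one for the season,
# then group by both at once.
years = data.index.year
seasons = data.index.month.map(season_of_month)
per_year_season = data.groupby([years, seasons]).sum()

print(per_year_season)
```

Note that with calendar-year keys, the "Winter" group of a given year combines January with November and December of that same year, which is not one contiguous winter. If that matters for your analysis, the year key would need to be shifted for the late months.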

Additional code, as per the comments:

It seems to me that you would be better off slicing the data. You could do the following:

# Label every row with its season (avoids chained assignment in a loop)
DF['Season'] = [SeasonDict[d.month] for d in DF.index]
# Keep only the winter rows with a Boolean mask
DFWinter = DF[DF.Season == 'Winter']

Now you have a new data frame containing only the winter data, to work with as you wish. The difference is that groupby applies the same operation to every group, whereas it sounds like you want to investigate different parts of your data set in different ways. For that, it's better to slice, in this case using Boolean slicing.
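The same Boolean slicing idea can be applied directly to a list of months such as [1, 2, 3] and then summed per year. This is only a sketch under assumptions: it requires a DatetimeIndex (as in the question's DataFrame), and the `rain` column and constant values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two full non-leap years of daily values, all 1.0.
idx = pd.date_range("2010-01-01", "2011-12-31", freq="D")
data = pd.DataFrame({"rain": np.ones(len(idx))}, index=idx)

# Months of interest, as in the question.
months = [1, 2, 3]

# Boolean mask: True for rows whose month is in the list.
subset = data[data.index.month.isin(months)]

# Sum the selected months separately for each year.
sums = subset.groupby(subset.index.year).sum()

print(sums)
```

Each year's total here covers only January through March, so changing `months` is enough to look at a different part of the year.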
