Extract seasons from datetime pandas

Question

I am trying to extract the seasons from a large dataframe with a datetime column. This is the code I have used:

def season_of_date(date_UTC):
    year = str(date_UTC.year)
    seasons = {'spring': pd.date_range(start= year +'-03-21 00:00:00', end=year + '-06-20 00:00:00'),
               'summer': pd.date_range(start= year + '-06-21 00:00:00', end= year + '-09-22 00:00:00'),
               'autumn': pd.date_range(start= year + '-09-23 00:00:00', end= year + '-12-20 00:00:00')}
    if date_UTC in seasons['spring']:
        return 'spring'
    if date_UTC in seasons['summer']:
        return 'summer'
    if date_UTC in seasons['autumn']:
        return 'autumn'
    else:
        return 'winter'

df['season'] = df.date_UTC.map(season_of_date)

The issue is that I don't know how to handle the hours, minutes and seconds in my datetime column, so I end up with a result that is mostly winter, apart from entries whose time is 00:00:00:

date_UTC    season
616602  2019-11-24 17:00:00 winter
792460  2019-06-18 13:00:00 winter
230088  2019-11-30 07:00:00 winter
560826  2019-05-20 08:00:00 winter
718547  2019-03-23 04:00:00 winter
241890  2020-01-11 03:00:00 winter
513845  2018-12-23 22:00:00 winter
665954  2019-03-18 00:00:00 winter
474988  2019-05-20 08:00:00 winter
120281  2019-04-22 12:00:00 winter
697519  2018-10-12 05:00:00 winter
669144  2019-09-10 11:00:00 winter
310637  2019-11-03 04:00:00 winter
127973  2018-12-01 10:00:00 winter
325177  2019-03-16 11:00:00 winter
785162  2019-05-07 21:00:00 winter
840131  2018-11-24 00:00:00 autumn
580472  2020-01-10 19:00:00 winter
635219  2019-12-16 23:00:00 winter
799642  2019-11-11 18:00:00 winter

Can I have some advice on how to modify my code so that the seasons map correctly?

Update:

I modified the code to build a string from the time component of the timestamp and thought this would fix the issue, but it didn't. After making the modification shown here, I end up with this error:

def season_of_date(date_UTC):
    year = str(date_UTC.year)
    time = str(date_UTC.time)
    seasons = {'spring': pd.date_range(start= year +'-03-21' + time, end=year + '-06-20' + time),
               'summer': pd.date_range(start= year + '-06-21' + time, end= year + '-09-22' + time),
               'autumn': pd.date_range(start= year + '-09-23' + time, end= year + '-12-20' + time)}
    if date_UTC in seasons['spring']:
        return 'spring'
    if date_UTC in seasons['summer']:
        return 'summer'
    if date_UTC in seasons['autumn']:
        return 'autumn'
    else:
        return 'winter'

df['season'] = df.date_UTC.map(season_of_date)

ValueError: could not convert string to Timestamp
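Editor's note: a small sketch of why both attempts behave this way, using two timestamps from the sample output above. The variable names are illustrative and not part of the original post. A daily `pd.date_range` holds only midnight timestamps, and `in` on the resulting index looks for an exact match, so a value at 17:00 is never found. In the second attempt, `date_UTC.time` without parentheses is the method itself rather than the time, so the concatenated string cannot be parsed.

```python
import pandas as pd

ts_midnight = pd.Timestamp("2018-11-24 00:00:00")
ts_evening = pd.Timestamp("2019-11-24 17:00:00")

# A daily date_range holds one timestamp per day, each at 00:00:00
autumn_2018 = pd.date_range(start="2018-09-23", end="2018-12-20")
autumn_2019 = pd.date_range(start="2019-09-23", end="2019-12-20")

# `in` on a DatetimeIndex is an exact-match lookup
found_midnight = ts_midnight in autumn_2018   # True: midnight is in the index
found_evening = ts_evening in autumn_2019     # False: 17:00 is not in the index

# Dropping the time of day first makes the membership test succeed
found_normalized = ts_evening.normalize() in autumn_2019

# ts.time is a method; calling it returns the time of day
time_text = str(ts_evening.time())            # '17:00:00'
method_text = str(ts_evening.time)            # repr of the method, not a time

print(found_midnight, found_evening, found_normalized, time_text)
```

Under these assumptions, replacing `date_UTC in seasons[...]` with `date_UTC.normalize() in seasons[...]` would make the original function return the expected season, although building three date ranges per row remains slow on a large dataframe.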

Second update:

What I ended up doing was the following. It is fast, but I don't like the solution, since it wrongly groups whole months into seasons, when in fact a season may start midway through a month in a given year.

df['season'] = (df['date_UTC'].dt.month%12 + 3)//3

seasons = {
             1: 'Winter',
             2: 'Spring',
             3: 'Summer',
             4: 'Autumn'
}

df['season_name'] = df['season'].map(seasons)
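Editor's note: a minimal sketch of the limitation described above, using three dates from the sample output. It assumes the season boundaries from the first code block (spring from 21 March, winter from 21 December); the variable names are illustrative.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2019-03-18 00:00:00",
                                  "2019-03-23 04:00:00",
                                  "2019-12-16 23:00:00"]))

# month % 12 maps December to 0, so (month % 12 + 3) // 3 gives
# Dec-Feb -> 1, Mar-May -> 2, Jun-Aug -> 3, Sep-Nov -> 4
season_number = (dates.dt.month % 12 + 3) // 3
names = season_number.map({1: "Winter", 2: "Spring", 3: "Summer", 4: "Autumn"})

print(pd.DataFrame({"date_UTC": dates, "season": season_number, "season_name": names}))
```

Both March dates come out as Spring and 16 December comes out as Winter, whereas with the day-level boundaries assumed here, 18 March would still be winter and 16 December would still be autumn.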

Recommended answer

First, make sure date_UTC is in datetime format; then you can use pd.cut:

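# Encode each date as month*100 + day, e.g. 21 March -> 321; the time of day is ignored
date = df.date_UTC.dt.month*100 + df.date_UTC.dt.day
# Bins are right-closed: (0, 320] is winter, (320, 620] is spring (21 Mar to 20 Jun), and so on
df['season'] = (pd.cut(date,[0,320,620,922,1220,1300],
                       labels=['winter','spring','summer','autumn','winter '])
                  .str.strip()
               )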
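For a self-contained illustration, here is a minimal sketch that applies this idea to a few timestamps from the question, including the conversion to datetime. The sample frame is made up for the example, and the bin edges assume the season dates given in the question (spring starting on 21 March), which is why the first edge is 320.

```python
import pandas as pd

# Sample timestamps taken from the question, stored as strings on purpose
df = pd.DataFrame({"date_UTC": ["2019-11-24 17:00:00", "2019-06-18 13:00:00",
                                "2019-03-18 00:00:00", "2020-01-11 03:00:00",
                                "2018-12-23 22:00:00", "2019-09-10 11:00:00"]})

# Convert to datetime64 so that the .dt accessor is available
df["date_UTC"] = pd.to_datetime(df["date_UTC"])

# month*100 + day ignores the time of day entirely
code = df["date_UTC"].dt.month * 100 + df["date_UTC"].dt.day

# Right-closed bins; 'winter ' (with a trailing space) keeps the labels unique
season = pd.cut(code, [0, 320, 620, 922, 1220, 1300],
                labels=["winter", "spring", "summer", "autumn", "winter "])

# Merge the two winter labels back into one
df["season"] = season.astype(str).str.strip()
print(df)
```

Because only the month and day enter the comparison, entries at 17:00 or 13:00 are classified the same way as entries at midnight.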

With a little numeric trick, you can get rid of the slow str.strip():

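# Shift the month*100 + day code so that 21 March maps to 0; winter wraps around to 900..1299
df['date_offset'] = (df.date_UTC.dt.month*100 + df.date_UTC.dt.day - 321)%1300

# right=False makes the bins left-closed, so offset 0 (21 March) falls in spring
df['season'] = pd.cut(df['date_offset'], [0, 300, 602, 900, 1300], right=False,
                      labels=['spring', 'summer', 'autumn', 'winter'])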
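The boundaries are easy to get wrong by one day, so a quick check on the days either side of each season change can help. The sketch below wraps the offset idea in a hypothetical helper, `season_from_dates` (not part of the original answer). It assumes a shift of 321 with left-closed bins so that every calendar day, including 20 March, lands in a bin.

```python
import pandas as pd

def season_from_dates(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: map datetimes to season names via a shifted month-day code."""
    code = dates.dt.month * 100 + dates.dt.day
    # Shift so that 21 March becomes 0; 1 January to 20 March wrap around to 1080..1299
    offset = (code - 321) % 1300
    # Left-closed bins: [0, 300) spring, [300, 602) summer, [602, 900) autumn, [900, 1300) winter
    return pd.cut(offset, [0, 300, 602, 900, 1300], right=False,
                  labels=["spring", "summer", "autumn", "winter"])

# Days on either side of each boundary, with arbitrary times of day
dates = pd.Series(pd.to_datetime([
    "2019-03-20 23:59:59", "2019-03-21 00:00:00",
    "2019-06-20 12:00:00", "2019-06-21 08:30:00",
    "2019-09-22 17:00:00", "2019-09-23 01:00:00",
    "2019-12-20 22:00:00", "2019-12-21 06:00:00",
    "2020-01-01 00:00:00",
]))
result = season_from_dates(dates)
print(pd.DataFrame({"date_UTC": dates, "season": result}))
```

The result is a categorical column, which also keeps memory use low on a large dataframe. The fixed dates are the ones from the question, not astronomical season starts, which vary slightly from year to year.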
