Convert monthly data table to seasonal time series using pandas


Problem description

    I have some data in table format, where the rows are years, and the columns are months. I would like to convert this to a time series format in pandas, and then group the data by seasonal mean (I would specifically like to define the winter season as November-March, and discard everything else). Here's the code I have that reads the data into the table format, so you can see what's going on:

    import pandas as pd

    headers = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    # raw string avoids an invalid-escape warning for the whitespace regex
    df = pd.read_csv('https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data',
                     delimiter=r'\s+', header=0, names=headers, skiprows=1, index_col=0)
    # drop the last 7 rows of the file
    df = df.drop(df.tail(7).index)
    df
    

             Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec
    1871    -0.25   -0.58   -0.43   -0.50   -0.70   -0.53   -0.60   -0.33   -0.24   -0.33   -0.31   -0.58
    1872    -0.72   -0.62   -0.50   -0.77   -0.62   -0.52   -0.32   -0.85   -1.02   -0.94   -0.79   -0.88
    1873    -0.78   -1.01   -1.31   -0.67   -0.53   -0.48   -0.58   -0.39   -0.34   -0.78   -0.77   -0.70
    1874    -0.93   -1.06   -1.40   -0.94   -0.86   -0.72   -1.00   -1.05   -1.13   -1.25   -1.33   -1.14
    1875    -0.71   -0.37   -0.59   -0.87   -1.09   -0.76   -0.85   -0.81   -0.91   -0.83   -0.64   -0.75
    1876    -0.95   -1.20   -1.13   -1.18   -1.08   -0.43   -0.34   -0.16   -0.02   0.11    0.15    0.23
    1877    0.35    0.46    0.52    0.50    0.76    0.98    1.42    1.54    1.75    1.95    2.08    2.49
    1878    2.41    2.43    1.31    0.92    0.82    0.92    0.25    -0.11   -0.32   -0.53   -0.70   -0.75
    1879    -0.55   -0.18   -0.24   -0.37   -0.83   -0.67   -0.77   -0.69   -0.83   -0.93   -1.14   -1.02
    1880    -1.00   -0.73   -0.62   -0.57   -0.71   -0.61   -0.53   -0.24   -0.03   0.17    0.24    0.18
    1881    0.29    0.23    0.32    0.41    0.16    0.23    -0.26   -0.17   -0.33   -0.43   -0.59   -0.37
    1882    -0.45   -0.55   -0.53   0.10    0.01    -0.40   -0.57   -0.33   -0.51   -0.65   -0.92   -0.75
    

    The result I'm going for is a time series of the mean of the November-March values of each winter season (for example, mean of Nov 2018, Dec 2018, Jan 2019, Feb 2019 & March 2019). I'm guessing the process involves first sorting this data into a timeseries (which I am unsure about how to do) and then grouping by season (Nov-March), and dropping the other months.

    Solution

    Given the data as shown, with the year as the index:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # reset index
    df.reset_index(inplace=True)
    
    # rename the column to year
    df.rename(columns={'index': 'year'}, inplace=True)
    
    # melt df into a long format
    df2 = df.melt(id_vars='year', var_name='month')
    
    # create a datetime column and set it as index
    df2['date'] = pd.to_datetime(df2['year'].astype(str) + '-' + df2['month'], format='%Y-%b')
    df2.set_index('date', inplace=True)
    
    # define a winter dataframe
    winter = df2[df2['month'].isin(['Nov', 'Dec', 'Jan', 'Feb', 'Mar'])].copy()
    
    # sort the index
    winter.sort_index(inplace=True)
    

    winter dataframe

                year month  value
    date                         
    1871-01-01  1871   Jan  -0.25
    1871-02-01  1871   Feb  -0.58
    1871-03-01  1871   Mar  -0.43
    1871-11-01  1871   Nov  -0.31
    1871-12-01  1871   Dec  -0.58
    1872-01-01  1872   Jan  -0.72
    1872-02-01  1872   Feb  -0.62
    1872-03-01  1872   Mar  -0.50
    1872-11-01  1872   Nov  -0.79
    1872-12-01  1872   Dec  -0.88
    1873-01-01  1873   Jan  -0.78
    1873-02-01  1873   Feb  -1.01
    1873-03-01  1873   Mar  -1.31
    1873-11-01  1873   Nov  -0.77
    1873-12-01  1873   Dec  -0.70
    1874-01-01  1874   Jan  -0.93
    1874-02-01  1874   Feb  -1.06
    1874-03-01  1874   Mar  -1.40
    1874-11-01  1874   Nov  -1.33
    1874-12-01  1874   Dec  -1.14
    1875-01-01  1875   Jan  -0.71
    1875-02-01  1875   Feb  -0.37
    1875-03-01  1875   Mar  -0.59
    1875-11-01  1875   Nov  -0.64
    1875-12-01  1875   Dec  -0.75
    1876-01-01  1876   Jan  -0.95
    1876-02-01  1876   Feb  -1.20
    1876-03-01  1876   Mar  -1.13
    1876-11-01  1876   Nov   0.15
    1876-12-01  1876   Dec   0.23
    1877-01-01  1877   Jan   0.35
    1877-02-01  1877   Feb   0.46
    1877-03-01  1877   Mar   0.52
    1877-11-01  1877   Nov   2.08
    1877-12-01  1877   Dec   2.49
    1878-01-01  1878   Jan   2.41
    1878-02-01  1878   Feb   2.43
    1878-03-01  1878   Mar   1.31
    1878-11-01  1878   Nov  -0.70
    1878-12-01  1878   Dec  -0.75
    1879-01-01  1879   Jan  -0.55
    1879-02-01  1879   Feb  -0.18
    1879-03-01  1879   Mar  -0.24
    1879-11-01  1879   Nov  -1.14
    1879-12-01  1879   Dec  -1.02
    1880-01-01  1880   Jan  -1.00
    1880-02-01  1880   Feb  -0.73
    1880-03-01  1880   Mar  -0.62
    1880-11-01  1880   Nov   0.24
    1880-12-01  1880   Dec   0.18
    1881-01-01  1881   Jan   0.29
    1881-02-01  1881   Feb   0.23
    1881-03-01  1881   Mar   0.32
    1881-11-01  1881   Nov  -0.59
    1881-12-01  1881   Dec  -0.37
    1882-01-01  1882   Jan  -0.45
    1882-02-01  1882   Feb  -0.55
    1882-03-01  1882   Mar  -0.53
    1882-11-01  1882   Nov  -0.92
    1882-12-01  1882   Dec  -0.75
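
For readers who cannot reach the NOAA URL, the reshaping step can be tried on a small hand-built frame. The sketch below is an illustration, not part of the original answer: it copies the 1871 and 1872 rows from the table above, and the names `wide`, `long_df`, `month_number`, and `series` are made up for this example. It also builds the dates from numeric year and month columns instead of parsing month abbreviations, which is one way to avoid depending on the locale.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Two rows copied from the table above; the year is the index
wide = pd.DataFrame(
    [[-0.25, -0.58, -0.43, -0.50, -0.70, -0.53, -0.60, -0.33, -0.24, -0.33, -0.31, -0.58],
     [-0.72, -0.62, -0.50, -0.77, -0.62, -0.52, -0.32, -0.85, -1.02, -0.94, -0.79, -0.88]],
    index=pd.Index([1871, 1872], name='year'),
    columns=months,
)

# Wide to long: one row per (year, month) pair
long_df = wide.reset_index().melt(id_vars='year', var_name='month')

# Map month abbreviations to month numbers (assumed helper, not in the original)
month_number = {name: number for number, name in enumerate(months, start=1)}
long_df['month_num'] = long_df['month'].map(month_number)

# Assemble dates from year/month/day components
long_df['date'] = pd.to_datetime(
    pd.DataFrame({'year': long_df['year'], 'month': long_df['month_num'], 'day': 1})
)

# Monthly time series indexed by date, in chronological order
series = long_df.set_index('date')['value'].sort_index()
print(series.head())
```

The result is an ordinary date-indexed Series, so the winter filter can then be written against `series.index.month` rather than the month names.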
    

    • In this example, the winters of 1870 and 1882 are incomplete, with only 3 and 2 months of data, respectively (Jan-Mar 1871 and Nov-Dec 1882).
      • Those 5 dates will be dropped
    • Compute the average for each season

    # drop incomplete season at the beginning and end of the winter dataframe
    winter.drop([pd.Timestamp('1871-01-01'),
                 pd.Timestamp('1871-02-01'),
                 pd.Timestamp('1871-03-01'),
                 pd.Timestamp('1882-11-01'),
                 pd.Timestamp('1882-12-01')], inplace=True)
    
    # compute mean of season, every 5 rows
    yearly_avg = winter.groupby(winter.reset_index().index // 5)['value'].mean()
    
    # create labels for the season start dates
    winter_year = pd.date_range(start='1871-11-01', periods=11, freq='12MS')
    
    # create winter_avg dataframe
    winter_avg = pd.DataFrame({'winter_start': winter_year, 'season_avg': yearly_avg})
    winter_avg.set_index('winter_start', inplace=True)
    
                  season_avg
    winter_start            
    1871-11-01        -0.546
    1872-11-01        -0.954
    1873-11-01        -0.972
    1874-11-01        -0.828
    1875-11-01        -0.934
    1876-11-01         0.342
    1877-11-01         2.144
    1878-11-01        -0.484
    1879-11-01        -0.902
    1880-11-01         0.252
    1881-11-01        -0.498
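
The code above hard-codes the five timestamps to drop and the number of seasons (`periods=11`), so it would need editing for a different date range. One possible alternative, sketched here under assumptions, is to label every winter month with the year its season started and let `groupby` find the incomplete seasons. The series `monthly` is synthetic (values 0 to 35 over three calendar years) and the names `season_start` and `grouped` are hypothetical; this is not the original answer's method.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: Jan 2000 .. Dec 2002, values 0..35 (illustration only)
idx = pd.date_range('2000-01-01', periods=36, freq='MS')
monthly = pd.Series(range(36), index=idx, dtype=float)

# Keep only the winter months (Nov-Mar)
winter = monthly[monthly.index.month.isin([11, 12, 1, 2, 3])]

# Jan-Mar belong to the winter that started in the previous calendar year
season_start = np.where(winter.index.month <= 3,
                        winter.index.year - 1,
                        winter.index.year)

grouped = winter.groupby(season_start)

# Keep only seasons that contain all five months
winter_avg = grouped.mean()[grouped.count() == 5]
winter_avg.index.name = 'winter_start_year'
print(winter_avg)
```

With this synthetic input, the partial seasons at both ends (Jan-Mar 2000 and Nov-Dec 2002) are filtered out by the count check, leaving the winters that started in 2000 and 2001.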
    

    Plot the data:

    # prevent future warning
    from pandas.plotting import register_matplotlib_converters
    register_matplotlib_converters()
    
    # pass x and y as keywords; recent seaborn versions no longer accept them positionally
    sns.lineplot(x=winter.index, y=winter['value'], label='all values')
    sns.scatterplot(x=winter_avg.index, y=winter_avg['season_avg'], label='season avg', color='orange')
    plt.ylabel('value')
    plt.xlabel('year')
    plt.show()
    
