Convert monthly data table to seasonal time series using pandas

Views: 93 Published: 2021/6/13 20:50:32 python pandas

Problem description

I have some data in table format, where the rows are years, and the columns are months. I would like to convert this to a time series format in pandas, and then group the data by seasonal mean (I would specifically like to define the winter season as November-March, and discard everything else). Here's the code I have that reads the data into the table format, so you can see what's going on:

import pandas as pd
headers = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# raw string for the regex separator avoids an invalid-escape warning
df = pd.read_csv('https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data',
                   delimiter=r'\s+', header=0, names=headers, skiprows=1, index_col=0)
df = df.drop(df.tail(7).index)
df

         Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec
1871    -0.25   -0.58   -0.43   -0.50   -0.70   -0.53   -0.60   -0.33   -0.24   -0.33   -0.31   -0.58
1872    -0.72   -0.62   -0.50   -0.77   -0.62   -0.52   -0.32   -0.85   -1.02   -0.94   -0.79   -0.88
1873    -0.78   -1.01   -1.31   -0.67   -0.53   -0.48   -0.58   -0.39   -0.34   -0.78   -0.77   -0.70
1874    -0.93   -1.06   -1.40   -0.94   -0.86   -0.72   -1.00   -1.05   -1.13   -1.25   -1.33   -1.14
1875    -0.71   -0.37   -0.59   -0.87   -1.09   -0.76   -0.85   -0.81   -0.91   -0.83   -0.64   -0.75
1876    -0.95   -1.20   -1.13   -1.18   -1.08   -0.43   -0.34   -0.16   -0.02   0.11    0.15    0.23
1877    0.35    0.46    0.52    0.50    0.76    0.98    1.42    1.54    1.75    1.95    2.08    2.49
1878    2.41    2.43    1.31    0.92    0.82    0.92    0.25    -0.11   -0.32   -0.53   -0.70   -0.75
1879    -0.55   -0.18   -0.24   -0.37   -0.83   -0.67   -0.77   -0.69   -0.83   -0.93   -1.14   -1.02
1880    -1.00   -0.73   -0.62   -0.57   -0.71   -0.61   -0.53   -0.24   -0.03   0.17    0.24    0.18
1881    0.29    0.23    0.32    0.41    0.16    0.23    -0.26   -0.17   -0.33   -0.43   -0.59   -0.37
1882    -0.45   -0.55   -0.53   0.10    0.01    -0.40   -0.57   -0.33   -0.51   -0.65   -0.92   -0.75

The result I'm going for is a time series of the mean of the November-March values of each winter season (for example, mean of Nov 2018, Dec 2018, Jan 2019, Feb 2019 & March 2019). I'm guessing the process involves first sorting this data into a timeseries (which I am unsure about how to do) and then grouping by season (Nov-March), and dropping the other months.

Solution

Given the data as shown:
- year is the index
- review pandas.DataFrame.rolling to specify the window

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# reset index
df.reset_index(inplace=True)

# rename the column to year
df.rename(columns={'index': 'year'}, inplace=True)

# melt df into a long format
df2 = df.melt(id_vars='year', var_name='month')

# create a datetime column and set it as index ('%Y-%b' matches strings like '1871-Jan')
df2['date'] = pd.to_datetime(df2['year'].astype(str) + '-' + df2['month'], format='%Y-%b')
df2.set_index('date', inplace=True)

# define a winter dataframe
winter = df2[df2['month'].isin(['Nov', 'Dec', 'Jan', 'Feb', 'Mar'])].copy()

# sort the index
winter.sort_index(inplace=True)

winter dataframe

            year month  value
date                         
1871-01-01  1871   Jan  -0.25
1871-02-01  1871   Feb  -0.58
1871-03-01  1871   Mar  -0.43
1871-11-01  1871   Nov  -0.31
1871-12-01  1871   Dec  -0.58
1872-01-01  1872   Jan  -0.72
1872-02-01  1872   Feb  -0.62
1872-03-01  1872   Mar  -0.50
1872-11-01  1872   Nov  -0.79
1872-12-01  1872   Dec  -0.88
1873-01-01  1873   Jan  -0.78
1873-02-01  1873   Feb  -1.01
1873-03-01  1873   Mar  -1.31
1873-11-01  1873   Nov  -0.77
1873-12-01  1873   Dec  -0.70
1874-01-01  1874   Jan  -0.93
1874-02-01  1874   Feb  -1.06
1874-03-01  1874   Mar  -1.40
1874-11-01  1874   Nov  -1.33
1874-12-01  1874   Dec  -1.14
1875-01-01  1875   Jan  -0.71
1875-02-01  1875   Feb  -0.37
1875-03-01  1875   Mar  -0.59
1875-11-01  1875   Nov  -0.64
1875-12-01  1875   Dec  -0.75
1876-01-01  1876   Jan  -0.95
1876-02-01  1876   Feb  -1.20
1876-03-01  1876   Mar  -1.13
1876-11-01  1876   Nov   0.15
1876-12-01  1876   Dec   0.23
1877-01-01  1877   Jan   0.35
1877-02-01  1877   Feb   0.46
1877-03-01  1877   Mar   0.52
1877-11-01  1877   Nov   2.08
1877-12-01  1877   Dec   2.49
1878-01-01  1878   Jan   2.41
1878-02-01  1878   Feb   2.43
1878-03-01  1878   Mar   1.31
1878-11-01  1878   Nov  -0.70
1878-12-01  1878   Dec  -0.75
1879-01-01  1879   Jan  -0.55
1879-02-01  1879   Feb  -0.18
1879-03-01  1879   Mar  -0.24
1879-11-01  1879   Nov  -1.14
1879-12-01  1879   Dec  -1.02
1880-01-01  1880   Jan  -1.00
1880-02-01  1880   Feb  -0.73
1880-03-01  1880   Mar  -0.62
1880-11-01  1880   Nov   0.24
1880-12-01  1880   Dec   0.18
1881-01-01  1881   Jan   0.29
1881-02-01  1881   Feb   0.23
1881-03-01  1881   Mar   0.32
1881-11-01  1881   Nov  -0.59
1881-12-01  1881   Dec  -0.37
1882-01-01  1882   Jan  -0.45
1882-02-01  1882   Feb  -0.55
1882-03-01  1882   Mar  -0.53
1882-11-01  1882   Nov  -0.92
1882-12-01  1882   Dec  -0.75
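
The reshaping steps above can be tried without downloading the NOAA file. The following is a minimal sketch, assuming a two-row stand-in table whose values are copied from the 1871 and 1872 rows in the question; the names `toy`, `long_df` and `winter_toy` are illustrative and not part of the original answer.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Stand-in for the downloaded table: rows 1871 and 1872 from the question
toy = pd.DataFrame(
    [[-0.25, -0.58, -0.43, -0.50, -0.70, -0.53, -0.60, -0.33, -0.24, -0.33, -0.31, -0.58],
     [-0.72, -0.62, -0.50, -0.77, -0.62, -0.52, -0.32, -0.85, -1.02, -0.94, -0.79, -0.88]],
    index=[1871, 1872], columns=months)

# Move the year index into a column, then melt the 12 month columns into rows
long_df = toy.rename_axis('year').reset_index().melt(id_vars='year', var_name='month')

# Build a timestamp from year and month abbreviation, e.g. '1871-Jan'
long_df['date'] = pd.to_datetime(long_df['year'].astype(str) + '-' + long_df['month'],
                                 format='%Y-%b')
long_df = long_df.set_index('date').sort_index()

# Keep only the winter months
winter_toy = long_df[long_df['month'].isin(['Nov', 'Dec', 'Jan', 'Feb', 'Mar'])]
print(winter_toy)
```

Each year contributes five winter rows, so the two-year stand-in yields ten rows in chronological order.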

In this example, the winters of 1870 and 1882 are incomplete, comprising 3 and 2 months, respectively.
- Those 5 dates will be dropped.
Compute the average for each season:

# drop incomplete season at the beginning and end of the winter dataframe
winter.drop([pd.Timestamp('1871-01-01'),
             pd.Timestamp('1871-02-01'),
             pd.Timestamp('1871-03-01'),
             pd.Timestamp('1882-11-01'),
             pd.Timestamp('1882-12-01')], inplace=True)

# compute mean of season, every 5 rows
yearly_avg = winter.groupby(winter.reset_index().index // 5)['value'].mean()

# create labels for the season start dates
winter_year = pd.date_range(start='1871-11-01', periods=11, freq='12MS')

# create winter_avg dataframe
winter_avg = pd.DataFrame({'winter_start': winter_year, 'season_avg': yearly_avg})
winter_avg.set_index('winter_start', inplace=True)

              season_avg
winter_start            
1871-11-01        -0.546
1872-11-01        -0.954
1873-11-01        -0.972
1874-11-01        -0.828
1875-11-01        -0.934
1876-11-01         0.342
1877-11-01         2.144
1878-11-01        -0.484
1879-11-01        -0.902
1880-11-01         0.252
1881-11-01        -0.498
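
Dropping hard-coded timestamps and grouping every 5 rows works for this table, but it depends on the exact date range and on no month being missing. One possible alternative, shown here only as a sketch, is to label each row with the year its winter started in and keep the seasons that have all five months. It assumes a synthetic monthly series (each value equals its month number) rather than the NOAA data, and the names `monthly`, `start_year` and `season_avg` are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: value = month number, so every complete winter
# (Nov, Dec, Jan, Feb, Mar) averages (11 + 12 + 1 + 2 + 3) / 5 = 5.8
dates = pd.date_range('1871-01-01', '1873-12-01', freq='MS')
monthly = pd.Series(dates.month.astype(float), index=dates, name='value')

# Keep only the winter months
winter = monthly[monthly.index.month.isin([11, 12, 1, 2, 3])]

# Nov and Dec belong to the winter starting that year;
# Jan, Feb and Mar belong to the winter that started the year before
start_year = np.where(winter.index.month >= 11,
                      winter.index.year,
                      winter.index.year - 1)

# Mean and number of months per winter
stats = winter.groupby(start_year).agg(['mean', 'count'])

# Discard incomplete winters (fewer than 5 months present)
season_avg = stats.loc[stats['count'] == 5, 'mean']

# Label each winter by its starting date, 1 November
season_avg.index = pd.to_datetime(season_avg.index.astype(str) + '-11-01')
print(season_avg)
```

With this labelling, the partial seasons at both ends are discarded by the count filter instead of by listing their dates.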

Plot the data:

# prevent future warning
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# pass x and y as keywords; recent seaborn versions no longer accept them positionally
sns.lineplot(x=winter.index, y=winter['value'], label='all values')
sns.scatterplot(x=winter_avg.index, y=winter_avg['season_avg'], label='season avg', color='orange')
plt.ylabel('value')
plt.xlabel('year')
plt.show()
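
The answer points to pandas.DataFrame.rolling but does not use it in the code. A rough sketch of that route follows, assuming a gap-free monthly series (synthetic here, with each value equal to its month number); the names `monthly`, `rolling_mean` and `season_avg` are illustrative. A trailing 5-month window evaluated at each March covers exactly November through March.

```python
import pandas as pd

# Synthetic stand-in: value = month number, one row per month with no gaps
dates = pd.date_range('1871-01-01', '1873-12-01', freq='MS')
monthly = pd.Series(dates.month.astype(float), index=dates, name='value')

# Trailing 5-month mean; the first 4 rows are NaN because the window is not full
rolling_mean = monthly.rolling(window=5).mean()

# The window ending in March spans Nov, Dec, Jan, Feb, Mar.
# March 1871 has only 3 months behind it, so it is NaN and gets dropped.
season_avg = rolling_mean[rolling_mean.index.month == 3].dropna()
print(season_avg)
```

Here each result is labelled by the March that ends the season rather than the November that starts it, and the approach only holds if no month is missing from the series.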
