How to resample non-standard CFTimeIndex calendars (360-day, no-leap) with xarray for use in pandas

Problem description

#60198708 prompted me to open this question, as I have not yet found a good solution.

I have downloaded several climate models from the EURO-CORDEX ensemble for daily precipitation flux. Some models use the standard calendar, which is compatible with pandas datetime. Others, particularly MOHC-HadGEM2-ES, use a 360-day CFTimeIndex.

The principal question is how to resample data on these calendars to monthly values efficiently. The goal is to harmonize the models and later produce ensemble statistics.

The precipitation flux data (a 2011-2015 excerpt, downloadable from the original post) looks like this:

<xarray.Dataset>
Dimensions:       (bnds: 2, rlat: 412, rlon: 424, time: 1800)
Coordinates:
    lat           (rlat, rlon) float64 ...
    lon           (rlat, rlon) float64 ...
  * rlat          (rlat) float64 -23.38 -23.26 -23.16 ... 21.61 21.73 21.83
  * rlon          (rlon) float64 -28.38 -28.26 -28.16 ... 17.93 18.05 18.16
  * time          (time) object 2011-01-01 12:00:00 ... 2015-12-30 12:00:00
Dimensions without coordinates: bnds
Data variables:
    pr            (time, rlat, rlon) float32 ...
    rotated_pole  |S1 ...
    time_bnds     (time, bnds) object ...
Attributes:
    CDI:                            Climate Data Interface version 1.3.2
    Conventions:                    CF-1.6
    NCO:                            4.4.2
    CDO:                            Climate Data Operators version 1.3.2 (htt...
    contact:                        Fredrik Boberg, Danish Meteorological Ins...
    creation_date:                  2019-11-16 14:39:25
    experiment:                     Scenario experiment using HadGEM as drivi...
    experiment_id:                  rcp45
    driving_experiment:             MOHC-HadGEM2-ES,rcp45,r1i1p1
    driving_model_id:               MOHC-HadGEM2-ES
    driving_model_ensemble_member:  r1i1p1
    driving_experiment_name:        rcp45
    frequency:                      day
    institution:                    Danish Meteorological Institute
    institute_id:                   DMI
    model_id:                       DMI-HIRHAM5
    rcm_version_id:                 v2
    project_id:                     CORDEX
    CORDEX_domain:                  EUR-11
    product:                        output
    tracking_id:                    hdl:21.14103/158e462e-499c-4d6e-8462-ac3e...
    c3s_disclaimer:                 This data has been produced in the contex...

As you can see, the dataset's time coordinate consists of cftime.Datetime360Day objects. Every month has 30 days. This is sometimes useful for climate projections, but pandas cannot handle it.

<xarray.DataArray 'time' (time: 1800)>
array([cftime.Datetime360Day(2011-01-01 12:00:00),
       cftime.Datetime360Day(2011-01-02 12:00:00),
       cftime.Datetime360Day(2011-01-03 12:00:00), ...,
       cftime.Datetime360Day(2015-12-28 12:00:00),
       cftime.Datetime360Day(2015-12-29 12:00:00),
       cftime.Datetime360Day(2015-12-30 12:00:00)], dtype=object)
Coordinates:
  * time     (time) object 2011-01-01 12:00:00 ... 2015-12-30 12:00:00
Attributes:
    standard_name:  time
    long_name:      time
    bounds:         time_bnds

What I have tried so far

My workaround converts the CFTimeIndex to strings, puts them into a pandas.DataFrame, and parses them with pd.to_datetime(..., errors='coerce'):

import pandas as pd
import xarray

ds = xarray.open_dataset('data/mohc_hadgem2_es.nc')

def cft_to_string(cftime_obj):
    # Zero-pad month and day so every string has the form YYYY-MM-DD
    return f'{cftime_obj.year}-{cftime_obj.month:02d}-{cftime_obj.day:02d}'

# Apply the function to the cftime objects stored in the CFTimeIndex
# (iterating over ds['time'] directly yields 0-d DataArrays, not cftime objects)
ds_time_strings = list(map(cft_to_string, ds.indexes['time']))

# Get precipitation values only (to use in a pandas DataFrame).
# The data cover many pixels (the whole of Europe),
# hence the spatial mean over the rlat and rlon axes
precipitation = ds['pr'].values.mean(axis=(1, 2))

# To DataFrame
df = pd.DataFrame(index=ds_time_strings, data={'precipitation': precipitation})

# Coerce invalid dates: strings such as 2011-02-30 become NaT
df.index = pd.to_datetime(df.index, errors='coerce')

This gives a DataFrame in which dates that do not exist in the standard calendar (e.g. 2011-02-29 and 2011-02-30) become NaT. Some dates are also missing (the 31st of each month). I don't mind, since my projections span 90 years.

            precipitation
2011-01-01  0.000049
2011-01-02  0.000042
2011-01-03  0.000031
2011-01-04  0.000030
2011-01-05  0.000038
... ...
2011-02-28  0.000041
NaT         0.000055
NaT         0.000046
2011-03-01  0.000031
... ...
2015-12-26  0.000028
2015-12-27  0.000034
2015-12-28  0.000028
2015-12-29  0.000025
2015-12-30  0.000024
1800 rows × 1 columns

Now I can resample to monthly data using pandas easily.
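
The resampling step might look like the following minimal sketch. The date strings and values are hypothetical, and the NaT rows are dropped explicitly before resampling rather than relying on how pandas treats NaT in a resample. The "MS" alias labels each bin with the first day of the month.

```python
import pandas as pd

# Hypothetical strings as produced by the workaround; 2011-02-29/30 are invalid
strings = ["2011-01-15", "2011-01-16", "2011-02-28",
           "2011-02-29", "2011-02-30", "2011-03-01"]
df = pd.DataFrame({"precipitation": [1.0, 3.0, 2.0, 5.0, 7.0, 4.0]},
                  index=pd.to_datetime(strings, errors="coerce"))

# Drop the rows whose date could not be parsed (NaT) before resampling
df = df[df.index.notna()]

# Monthly means, labelled by month start
monthly = df.resample("MS").mean()
print(monthly)
```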

While this seems to work, is there a cleaner way using only xarray/pandas, ideally one that does not rely on strings?

  • ds.indexes['time'].to_datetimeindex() fails on a non-standard calendar, because dates such as February 30th have no equivalent in the standard calendar
  • ds.resample(time='M') does the resampling, but it yields non-standard month ends (e.g. February 30th). I could not find a way to floor these to correct month ends, since ds['time'].dt.floor('M') fails with ValueError: <MonthEnd: n=1> is a non-fixed frequency
  • ds.groupby('time.month') can handle non-standard calendars. However, it groups by calendar month across all years (creating a new month dimension) instead of producing a monthly time series, which is not what I want

I must be missing something, as this is a complex issue. Any help is appreciated.

Recommended answer

Thanks for the detailed example! If a time series of monthly means is acceptable for your analysis, I think the cleanest approach would be to resample to "month-start" frequency and then harmonize the date types, e.g. for the datasets indexed by a CFTimeIndex, something like:

resampled = ds.resample(time="MS").mean()
resampled["time"] = resampled.indexes["time"].to_datetimeindex()

This is basically your second bullet point, but with a minor change. Resampling to month-start frequency gets around the issue that a 360-day calendar contains month ends that do not exist in a standard calendar, e.g. February 30th.

这篇关于使用xarray对大 pandas 使用的非标准CFTimeIndex日历(360天,无le年)进行重新采样的方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆