Sum a daily time series into a monthly time series with a NaN value threshold

Question

I have a 3D time series data matrix from January 1st, 1979 to December 31st, 2005. The matrix is currently 9862x360x720 (daily rainfall x 0.5° latitude x 0.5° longitude). I want to sum the daily rainfall into monthly rainfall (a total of 324 months) while also setting a threshold for summing NaN values.

In other words, if there are more than 10 daily NaN values for a particular lat/lon grid cell in a month, I want to mark the monthly summed cell as NaN. If there are fewer than 10 daily NaN values for the grid cell, I want to sum the remaining non-NaN daily values and use that as the monthly value.

I had success using the xarray library's "resample" function, but I couldn't figure out a way to set a threshold for NaN values. Everything I've read says to use sum or nansum functions, but I can't find a way to set a NaN threshold through either of those functions. I'm open to any method at this point (xarray or otherwise).

import netCDF4
import numpy as np
import xarray as xr
import pandas as pd

f = netCDF4.Dataset("daily_data", 'r')

daily_dataset = xr.Dataset({'precipitation': (['time', 'lat', 'lon'],  f['precipitation'][:, :, :])},
             coords={'lat': (f['lat'][:]), 'lon': (f['lon'][:]), 'time': pd.date_range('1979-01-01', periods=9862)})

monthly_dataset = daily_dataset['precipitation'].resample('M', dim='time', how='sum', skipna=False)

I was able to sum the daily data to monthly with the above code, but I was not able to set a NaN threshold. The daily data is currently stored in a NetCDF file.

Recommended answer

I believe this does what you want:

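import math

NaN = float("nan")  # Constant for NaN

def sum_nan_threshold(iterable, *, nan_threshold=10):
    values = list(iterable)  # Materialize so the data can be scanned twice
    # NaN never compares equal to itself, so test with math.isnan rather than ==
    if sum(math.isnan(x) for x in values) > nan_threshold:  # More NaNs than the threshold?
        return NaN
    else:
        return sum(x for x in values if not math.isnan(x))  # Otherwise sum the non-NaN values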
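
The function above works on one sequence of values at a time. For a full (time, lat, lon) grid, the same count-then-mask idea can be applied to every grid cell at once with NumPy. The following is a minimal sketch rather than a definitive implementation: the helper name `monthly_sum_with_nan_threshold` and its signature are hypothetical, the `dates` array is assumed to hold one `datetime64[D]` entry per time step, and "more than 10" is assumed to mean strictly greater than the threshold.

```python
import numpy as np

def monthly_sum_with_nan_threshold(daily, dates, nan_threshold=10):
    """Sum daily values per calendar month, masking cells with too many NaNs.

    daily: float array shaped (time, lat, lon)
    dates: datetime64[D] array with one entry per time step
    (Hypothetical helper; the name and signature are illustrative.)
    """
    months = dates.astype("datetime64[M]")      # truncate each date to its month
    unique_months = np.unique(months)
    out = np.empty((unique_months.size,) + daily.shape[1:], dtype=float)
    for i, month in enumerate(unique_months):
        block = daily[months == month]           # every day in this month
        nan_count = np.isnan(block).sum(axis=0)  # NaN days per grid cell
        total = np.nansum(block, axis=0)         # monthly sum, ignoring NaNs
        # Assumption: "more than 10" means strictly greater than the threshold.
        out[i] = np.where(nan_count > nan_threshold, np.nan, total)
    return unique_months, out

# Tiny 2x2 grid covering January and February 1979, 1 mm of rain every day.
dates = np.arange("1979-01-01", "1979-03-01", dtype="datetime64[D]")
daily = np.ones((dates.size, 2, 2))
daily[:11, 0, 0] = np.nan   # 11 missing January days: over the threshold
daily[:10, 0, 1] = np.nan   # 10 missing January days: still summed
months, monthly = monthly_sum_with_nan_threshold(daily, dates)
print(months)
print(monthly)
```

For the data set in the question, `dates` could presumably be built as `np.arange("1979-01-01", "2006-01-01", dtype="datetime64[D]")`, which yields 9862 days spanning 324 months. The loop runs once per month, so only one month of daily data is sliced at a time.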
