Calculating percentile for each gridpoint in xarray


Problem description

I am currently using xarray to make probability maps. I want to use a statistical assessment like a "counting" exercise: for all data points in NEU, count how many times both variables jointly exceed their thresholds, meaning the 1st percentile of the precipitation data and the 99th percentile of the temperature data. The probability (P) of joint occurrence is then simply the number of joint exceedances divided by the number of data points in the dataset.

<xarray.Dataset>
Dimensions:    (latitude: 88, longitude: 200, time: 6348)
Coordinates:
  * latitude   (latitude) float64 49.62 49.88 50.12 50.38 ... 70.88 71.12 71.38
  * longitude  (longitude) float64 -9.875 -9.625 -9.375 ... 39.38 39.62 39.88
  * time       (time) datetime64[ns] 1950-06-01 1950-06-02 ... 2018-08-31
Data variables:
    rr         (time, latitude, longitude) float32 dask.array<chunksize=(6348, 88, 200), meta=np.ndarray>
    tx         (time, latitude, longitude) float32 dask.array<chunksize=(6348, 88, 200), meta=np.ndarray>
    Ellipsis   float64 0.0

I want to calculate the percentile of both precipitation and temperature for each gridpoint, which basically means that I want to repeat the function below for every gridpoint.

Neu_Precentile=np.nanpercentile(NEU.rr[:,0,0],1)

Can anyone help me with this problem? I also tried to use xr.apply_ufunc, but unfortunately it did not work out well.

Recommended answer

I'm not sure how you want to process the quantiles, but here is a version that you may be able to adapt.

Also, I chose to keep the Dataset structure when computing the quantiles, because it shows how to retrieve the values of the outliers if that is ever relevant (and it is one step away from retrieving the values of valid data points, which is likely relevant).

1. Create dummy data

import numpy as np
import xarray as xr

coords = ("time", "latitude", "longitude")
sizes = (500, 80, 120)

# Random stand-in data on an integer-indexed (time, latitude, longitude) grid
ds = xr.Dataset(
    coords={c: np.arange(s) for c, s in zip(coords, sizes)},
    data_vars=dict(
        precipitation=(coords, np.random.randn(*sizes)),
        temperature=(coords, np.random.randn(*sizes)),
    ),
)

A look at the data:

<xarray.Dataset>
Dimensions:        (latitude: 80, longitude: 120, time: 500)
Coordinates:
  * time           (time) int64 0 1 2 3 ... 496 497 498 499
  * latitude       (latitude) int64 0 1 2 3 ... 76 77 78 79
  * longitude      (longitude) int64 0 1 2 3 ... 117 118 119
Data variables:
    precipitation  (time, latitude, longitude) float64 -1.673 ... -0.3323
    temperature    (time, latitude, longitude) float64 -0.331 ... -0.03728

2. Compute quantiles

qt_dims = ("latitude", "longitude")
qt_values = (0.1, 0.9)

ds_qt = ds.quantile(qt_values, dim=qt_dims)

The result is a Dataset in which the dimensions of analysis ("latitude", "longitude") are gone and a new "quantile" dimension has been added:

<xarray.Dataset>
Dimensions:        (quantile: 2, time: 500)
Coordinates:
  * time           (time) int64 0 1 2 3 ... 496 497 498 499
  * quantile       (quantile) float64 0.1 0.9
Data variables:
    precipitation  (quantile, time) float64 -1.305 ... 1.264
    temperature    (quantile, time) float64 -1.267 ... 1.254
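
If the goal is one threshold per gridpoint, as in the question, one possible adaptation is to reduce over "time" instead of over the spatial dimensions. The sketch below is only an illustration under assumptions: ds_demo is a small synthetic dataset made up for the example, and the 0.01 / 0.99 quantiles are taken from the question's 1st and 99th percentiles. It also cross-checks one gridpoint against the np.nanpercentile call from the question.

```python
import numpy as np
import xarray as xr

# Hypothetical small dataset with the same layout as the example above.
rng = np.random.default_rng(0)
dims = ("time", "latitude", "longitude")
sizes = (50, 4, 6)
ds_demo = xr.Dataset(
    coords={d: np.arange(s) for d, s in zip(dims, sizes)},
    data_vars=dict(
        precipitation=(dims, rng.standard_normal(sizes)),
        temperature=(dims, rng.standard_normal(sizes)),
    ),
)

# Reducing over "time" keeps latitude and longitude:
# the result holds one threshold per gridpoint and per quantile.
ds_qt_grid = ds_demo.quantile([0.01, 0.99], dim="time")

# Cross-check a single gridpoint against the NumPy call from the question.
expected = np.nanpercentile(ds_demo.precipitation.values[:, 0, 0], 1)
actual = float(
    ds_qt_grid.precipitation.sel(quantile=0.01).isel(latitude=0, longitude=0)
)
print(np.isclose(actual, expected))
```

With dask-backed data such as the NEU dataset in the question, I would expect the "time" dimension to need to sit in a single chunk for this reduction, which is already the case with the chunk sizes shown there.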

3. Compute outliers co-occurrence

For the locations of outliers (edit: use of np.logical_and, more readable than the & operator):

da_outliers_loc = np.logical_and(
    ds.precipitation > ds_qt.precipitation.sel(quantile=qt_values[0]),
    ds.temperature > ds_qt.temperature.sel(quantile=qt_values[1]),
)

The output is a boolean DataArray:

<xarray.DataArray (time: 500, latitude: 80, longitude: 120)>
array([[[False, ...]]])
Coordinates:
  * time       (time) int64 0 1 2 3 4 ... 496 497 498 499
  * latitude   (latitude) int64 0 1 2 3 4 ... 75 76 77 78 79
  * longitude  (longitude) int64 0 1 2 3 ... 116 117 118 119

If the values are ever relevant:

ds_outliers = ds.where(
    (ds.precipitation > ds_qt.precipitation.sel(quantile=qt_values[0]))
    & (ds.temperature > ds_qt.temperature.sel(quantile=qt_values[1]))
)
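
To make the two variants above concrete, here is a minimal sketch on a hypothetical small dataset (ds_demo and all variable names are made up for illustration). It checks that np.logical_and and the & operator produce the same mask, and that where() keeps exactly the flagged values and fills everything else with NaN. Passing drop=True to .sel is my own addition; it simply removes the scalar "quantile" coordinate left behind by the selection.

```python
import numpy as np
import xarray as xr

# Hypothetical small dataset with the same layout as the example above.
rng = np.random.default_rng(0)
dims = ("time", "latitude", "longitude")
sizes = (10, 4, 5)
ds_demo = xr.Dataset(
    coords={d: np.arange(s) for d, s in zip(dims, sizes)},
    data_vars=dict(
        precipitation=(dims, rng.standard_normal(sizes)),
        temperature=(dims, rng.standard_normal(sizes)),
    ),
)
qt = ds_demo.quantile((0.1, 0.9), dim=("latitude", "longitude"))

# drop=True removes the scalar "quantile" coordinate left behind by .sel
cond_pr = ds_demo.precipitation > qt.precipitation.sel(quantile=0.1, drop=True)
cond_tx = ds_demo.temperature > qt.temperature.sel(quantile=0.9, drop=True)

# Both spellings build the same boolean mask.
mask_ufunc = np.logical_and(cond_pr, cond_tx)
mask_op = cond_pr & cond_tx
same_mask = bool((mask_ufunc == mask_op).all())

# where() keeps values where the mask is True and fills NaN elsewhere.
ds_kept = ds_demo.where(mask_op)
n_flagged = int(mask_op.sum())
n_kept = int(ds_kept.precipitation.notnull().sum())
print(same_mask, n_flagged == n_kept)
```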

4. Count outliers per timestep

outliers_count = da_outliers_loc.sum(dim=qt_dims)

Finally, here is the DataArray with only a time dimension, whose values are the number of outliers at each timestamp.

<xarray.DataArray (time: 500)>
array([857, ...])
Coordinates:
  * time     (time) int64 0 1 2 3 4 ... 495 496 497 498 499
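
The same counting step can be turned into the probability map described in the question by taking thresholds and counts along "time" and dividing by the number of timesteps. This is a rough sketch under assumptions, not a definitive recipe: ds_demo is a hypothetical synthetic dataset, the 0.1 / 0.9 quantiles are reused from the steps above, and both comparisons keep the "greater than" direction used in this answer (flip the precipitation comparison if the event of interest is low precipitation).

```python
import numpy as np
import xarray as xr

# Hypothetical small dataset with the same layout as the example above.
rng = np.random.default_rng(1)
dims = ("time", "latitude", "longitude")
sizes = (200, 3, 5)
ds_demo = xr.Dataset(
    coords={d: np.arange(s) for d, s in zip(dims, sizes)},
    data_vars=dict(
        precipitation=(dims, rng.standard_normal(sizes)),
        temperature=(dims, rng.standard_normal(sizes)),
    ),
)

# Per-gridpoint thresholds: reduce over "time" only.
thr = ds_demo.quantile([0.1, 0.9], dim="time")

# Joint exceedance mask with dimensions (time, latitude, longitude).
joint = np.logical_and(
    ds_demo.precipitation > thr.precipitation.sel(quantile=0.1, drop=True),
    ds_demo.temperature > thr.temperature.sel(quantile=0.9, drop=True),
)

# Number of joint exceedances per gridpoint, then the fraction of timesteps.
count = joint.sum(dim="time")
prob = count / ds_demo.sizes["time"]
print(prob)
```

If the real data contain NaNs, dividing by the number of valid timesteps per gridpoint rather than by the full length of "time" may be more appropriate.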
