Python: mask netCDF data using a shapefile


Problem description

I am using the following packages:

import pandas as pd
import numpy as np
import xarray as xr
import geopandas as gpd

I have the following objects storing data:

print(precip_da)

Out[]:
    <xarray.DataArray 'precip' (time: 13665, latitude: 200, longitude: 220)>
    [601260000 values with dtype=float32]
    Coordinates:
      * longitude  (longitude) float32 35.024994 35.074997 35.125 35.175003 ...
      * latitude   (latitude) float32 5.0249977 5.074997 5.125 5.174999 ...
      * time       (time) datetime64[ns] 1981-01-01 1981-01-02 1981-01-03 ...
    Attributes:
        standard_name:       convective precipitation rate
        long_name:           Climate Hazards group InfraRed Precipitation with St...
        units:               mm/day
        time_step:           day
        geostatial_lat_min:  -50.0
        geostatial_lat_max:  50.0
        geostatial_lon_min:  -180.0
        geostatial_lon_max:  180.0

This looks as follows:

precip_da.mean(dim="time").plot()

I have my shapefile as a geopandas.GeoDataFrame which represents a polygon.

awash = gpd.read_file(shp_dir)

awash
Out[]:
  OID_         Name      FolderPath  SymbolID  AltMode Base  Clamped Extruded  Snippet PopupInfo Shape_Leng  Shape_Area  geometry
0     0 Awash_Basin Awash_Basin.kml         0        0  0.0       -1        0     None      None  30.180944    9.411263  POLYGON Z ((41.78939511000004 11.5539922500000...

Which looks as follows:

awash.plot()

Plotted one on top of the other they look like this:

ax = awash.plot(alpha=0.2, color='black')
precip_da.mean(dim="time").plot(ax=ax,zorder=-1)

My question is, how do I mask the xarray.DataArray by checking if the lat-lon points lie INSIDE the shapefile stored as a geopandas.GeoDataFrame?

So I want ONLY the precipitation values (mm/day) which fall INSIDE that shapefile.

I want to do something like the following:

masked_precip = precip_da.within(awash)

OR

masked_precip = precip_da.loc[precip_da.isin(awash)]
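To make the intent of the two wished-for calls concrete, here is a minimal pure-Python sketch of the underlying idea: test every grid cell centre against a polygon and collect the results in a boolean mask. The names `point_in_polygon` and `grid_mask`, the square polygon, and the 3 x 3 grid are all hypothetical, and the ray-casting test is only an illustration of the concept, not how rasterio or geopandas do it internally.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the ring.

    `ring` is a list of (lon, lat) vertices; the last vertex is
    implicitly joined back to the first.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def grid_mask(lats, lons, ring):
    """Return mask[i][j], True where (lons[j], lats[i]) is inside the ring."""
    return [[point_in_polygon(lon, lat, ring) for lon in lons] for lat in lats]


# Hypothetical square polygon and a tiny 3 x 3 grid of cell centres
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
lats = [0.5, 2.0, 3.5]
lons = [0.5, 2.0, 3.5]
mask = grid_mask(lats, lons, square)
```

Only the centre cell (lon 2.0, lat 2.0) falls inside the square. For a 200 x 220 grid and a detailed basin outline, a loop like this would be slow, which is one reason to prefer a rasterizing approach.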

EDIT 1

I have thought about using the rasterio.mask module, but I don't know what format the input data needs to be in. It sounds as if it does exactly the right thing:

"Creates a masked or filled array using input shapes. Pixels are masked or set to nodata outside the input shapes"

Reposted from GIS Stack Exchange.

Solution

This is my current working solution, taken from a gist. It is Stephan Hoyer's answer to a GitHub issue for the xarray project.

In addition to the packages above, both affine and rasterio are required:

from rasterio import features
from affine import Affine

def transform_from_latlon(lat, lon):
    """ input 1D array of lat / lon and output an Affine transformation
    """
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    trans = Affine.translation(lon[0], lat[0])
    scale = Affine.scale(lon[1] - lon[0], lat[1] - lat[0])
    return trans * scale

def rasterize(shapes, coords, latitude='latitude', longitude='longitude',
              fill=np.nan, **kwargs):
    """Rasterize a list of (geometry, fill_value) tuples onto the given
    xarray coordinates. This only works for 1D latitude and longitude
    arrays.

    usage:
    -----
    1. read shapefile to geopandas.GeoDataFrame
          `states = gpd.read_file(shp_dir+shp_file)`
    2. encode the different shapefiles that capture those lat-lons as different
        numbers i.e. 0.0, 1.0 ... and otherwise np.nan
          `shapes = (zip(states.geometry, range(len(states))))`
    3. Assign this to a new coord in your original xarray.DataArray
          `ds['states'] = rasterize(shapes, ds.coords, longitude='X', latitude='Y')`

    arguments:
    ---------
    : **kwargs (dict): passed to `rasterio.features.rasterize`

    attrs:
    -----
    :transform (affine.Affine): how to translate from latlon to ...?
    :raster (numpy.ndarray): use rasterio.features.rasterize fill the values
      outside the .shp file with np.nan
    :spatial_coords (dict): dictionary of {"X":xr.DataArray, "Y":xr.DataArray()}
      with "X", "Y" as keys, and xr.DataArray as values

    returns:
    -------
    :(xr.DataArray): DataArray with `values` of nan for points outside shapefile
      and coords `Y` = latitude, 'X' = longitude.


    """
    transform = transform_from_latlon(coords[latitude], coords[longitude])
    out_shape = (len(coords[latitude]), len(coords[longitude]))
    raster = features.rasterize(shapes, out_shape=out_shape,
                                fill=fill, transform=transform,
                                dtype=float, **kwargs)
    spatial_coords = {latitude: coords[latitude], longitude: coords[longitude]}
    return xr.DataArray(raster, coords=spatial_coords, dims=(latitude, longitude))

def add_shape_coord_from_data_array(xr_da, shp_path, coord_name):
    """ Create a new coord for the xr_da indicating whether or not it 
         is inside the shapefile

        Creates a new coord, `coord_name`, holding the index of the polygon
        that contains each cell (NaN outside every polygon); it can be used
        to subset xr_da for plotting / analysis.

        Usage:
        -----
        precip_da = add_shape_coord_from_data_array(precip_da, "awash.shp", "awash")
        awash_da = precip_da.where(precip_da.awash==0, other=np.nan) 
    """
    # 1. read in shapefile
    shp_gpd = gpd.read_file(shp_path)

    # 2. create a list of tuples (shapely.geometry, id)
    #    this allows for many different polygons within a .shp file (e.g. States of US)
    shapes = [(shape, n) for n, shape in enumerate(shp_gpd.geometry)]

    # 3. create a new coord in the xr_da which will be set to the id in `shapes`
    xr_da[coord_name] = rasterize(shapes, xr_da.coords, 
                               longitude='longitude', latitude='latitude')

    return xr_da
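As a side note on `transform_from_latlon` above: composing a translation with a scale means that a pixel index (col, row) is mapped to (lon[0] + col * dlon, lat[0] + row * dlat). The pure-Python sketch below reproduces that arithmetic without the affine package so the mapping can be checked by hand. The helper name `pixel_to_lonlat`, the `centre_coords` flag, and the sample coordinates are hypothetical. As far as I understand, rasterio reads the transform as giving each pixel's upper-left corner, so if the DataArray coordinates are cell centres (an assumption about the dataset), shifting the origin by half a cell may line the mask up more precisely.

```python
def pixel_to_lonlat(col, row, lon, lat, centre_coords=False):
    """Map a (col, row) pixel index to (lon, lat).

    Mirrors Affine.translation(lon[0], lat[0]) * Affine.scale(dlon, dlat).
    With centre_coords=True the origin is moved back by half a cell, so
    that lon[0], lat[0] are treated as the centre of pixel (0, 0).
    """
    dlon = lon[1] - lon[0]
    dlat = lat[1] - lat[0]
    lon0, lat0 = lon[0], lat[0]
    if centre_coords:
        # Hypothetical half-cell correction, not part of the original answer
        lon0 -= dlon / 2
        lat0 -= dlat / 2
    # Scale the index first, then translate by the origin
    return lon0 + col * dlon, lat0 + row * dlat


# Hypothetical regular grid with 0.5 degree spacing
lon = [35.0, 35.5, 36.0]
lat = [5.0, 5.5, 6.0]

corner = pixel_to_lonlat(1, 2, lon, lat)
shifted = pixel_to_lonlat(1, 2, lon, lat, centre_coords=True)
```

Here `corner` is (35.5, 6.0) and `shifted` is (35.25, 5.75). For a coarse polygon the half-cell difference rarely matters, but it can change which cells along the boundary are burned in.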

The functions above can then be applied as follows:

precip_da = add_shape_coord_from_data_array(precip_da, shp_dir, "awash")
awash_da = precip_da.where(precip_da.awash==0, other=np.nan)
awash_da.mean(dim="time").plot()
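Note that the condition `precip_da.awash == 0` keeps only the cells belonging to the first geometry in the shapefile (index 0). That is what is wanted here, because the Awash shapefile contains a single polygon. For a shapefile with several polygons, a test for "not NaN" keeps every cell that falls inside any of them. The NumPy sketch below illustrates the difference on a small hypothetical id raster. The arrays are made up for illustration, and `np.where` stands in for `DataArray.where`.

```python
import numpy as np

# Hypothetical 2 x 3 id raster as rasterize() would return it:
# polygon index inside each polygon, NaN outside all of them
ids = np.array([[0.0, 1.0, np.nan],
                [np.nan, 0.0, 1.0]])

# Hypothetical data with a leading time axis of length 2
data = np.arange(12, dtype=float).reshape(2, 2, 3)

# Keep only polygon 0; the 2D mask broadcasts over the time axis
only_first = np.where(ids == 0, data, np.nan)

# Keep every cell that lies inside any polygon
any_polygon = np.where(~np.isnan(ids), data, np.nan)

# Area mean per time step, ignoring the masked (NaN) cells
mean_first = np.nanmean(only_first, axis=(1, 2))
```

In xarray terms, the second variant would correspond to something like `precip_da.where(precip_da.awash.notnull())`.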
