Calculating spatial averages for each country after spatial join

Question

Hello, I use the code at the bottom to extract countries from coordinates. A more detailed explanation of the code is available here: Extracting countries from NetCDF data using geopandas

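My main variable is the monthly mean PDSI value from https://psl.noaa.gov/data/gridded/data.pdsi.html. The image showed part of the visualisation produced by the code below. The shaded squares represent the spatial area of each PDSI value that overlaps the world shapefile.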

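As the image for Belgium shows, the four squares touching Belgium's land area also touch other countries. If I attribute their raw values to Belgium, I think this overestimates the mean PDSI value. In particular, the bottom two squares barely touch Belgium, so their values should carry much less weight when computing the mean. Is there a way to compute a weighted average in which the area of each square that lies inside a country is used as the weight for that square's PDSI value? I would also like to apply this process consistently to all countries, not just Belgium.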
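The weighting described here is an area-weighted mean: each square's PDSI value is multiplied by the fraction of that square lying inside the country, and the total is divided by the sum of those fractions. A minimal pure-Python sketch follows; `weighted_mean` is a hypothetical helper, and the four PDSI values and overlap fractions are made-up numbers, not taken from the dataset.

```python
def weighted_mean(values, weights):
    """Return sum(w * v) / sum(w), e.g. PDSI weighted by area overlap."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero; nothing to average")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical example: four grid squares touching one country.
# The last two squares barely touch it, so they get small weights.
pdsi_values = [-2.0, -1.0, 3.0, 4.0]
overlap_fractions = [0.6, 0.4, 0.05, 0.05]

unweighted = sum(pdsi_values) / len(pdsi_values)
weighted = weighted_mean(pdsi_values, overlap_fractions)
print(f"unweighted={unweighted:.3f} weighted={weighted:.3f}")
```

With these illustrative numbers the two barely-touching squares pull the plain mean up to 1.0, while the weighted mean (about -1.14) is dominated by the squares that lie mostly inside the country.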

Any help would be greatly appreciated!

import geopandas as gpd
import numpy as np
import plotly.express as px
import requests
from pathlib import Path
from zipfile import ZipFile
import urllib.parse  # `import urllib` alone does not guarantee urllib.parse is loaded
import shapely.geometry
import xarray as xr

# download NetCDF data...
# fmt: off
url = "https://psl.noaa.gov/repository/entry/get/pdsi.mon.mean.selfcalibrated.nc?entryid=synth%3Ae570c8f9-ec09-4e89-93b4-babd5651e7a9%3AL2RhaV9wZHNpL3Bkc2kubW9uLm1lYW4uc2VsZmNhbGlicmF0ZWQubmM%3D"
f = Path.cwd().joinpath(Path(urllib.parse.urlparse(url).path).name)
# fmt: on

if not f.exists():
    r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
    r.raise_for_status()  # fail early instead of saving an HTML error page as .nc
    with open(f, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)
ds = xr.open_dataset(f)
pdsi = ds.to_dataframe()
pdsi = pdsi.reset_index().dropna()  # don't care about places in oceans...

# use subset for testing... last 5 times...
pdsim = pdsi.loc[pdsi["time"].isin(pdsi.groupby("time").size().index[-5:])]

# create geopandas dataframe
gdf = gpd.GeoDataFrame(
    pdsim, geometry=pdsim.loc[:, ["lon", "lat"]].apply(shapely.geometry.Point, axis=1)
)

# make sure that data supports using a buffer...
assert (
    gdf["lat"].diff().loc[lambda s: s.ne(0)].mode()
    == gdf["lon"].diff().loc[lambda s: s.ne(0)].mode()
).all()
# how big should the square buffer be around the point??
buffer = gdf["lat"].diff().loc[lambda s: s.ne(0)].mode().values[0] / 2
gdf["geometry"] = gdf["geometry"].buffer(buffer, cap_style=3)

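# Load the Natural Earth low-res country shapes bundled with geopandas.
# NOTE: gpd.datasets was removed in geopandas 1.0; on newer versions, download
# the Natural Earth admin-0 countries file and load it with gpd.read_file.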
path_to_data = gpd.datasets.get_path("naturalearth_lowres")
world_shp = gpd.read_file(path_to_data)

# the solution... spatial join buffered polygons to countries
# comma separate associated countries
gdf = gdf.join(
    world_shp.sjoin(gdf.set_crs("EPSG:4326"))
    .groupby("index_right")["name"]
    .agg(",".join)
)
gdf["time_a"] = gdf["time"].dt.strftime("%Y-%b-%d")

# simplest way to test is visualise...
px.choropleth_mapbox(
    gdf,
    geojson=gdf.geometry,
    locations=gdf.index,
    color="pdsi",
    hover_data=["name"],
    animation_frame="time_a",
    opacity=.3
).update_layout(
    mapbox={"style": "carto-positron", "zoom": 1},
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
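The code above turns each PDSI grid point into a square cell: it takes the most common non-zero step between consecutive latitudes as the grid spacing, halves it, and buffers each point with `cap_style=3` (a square cap). The pure-Python sketch below reproduces that geometry without geopandas and returns `(minx, miny, maxx, maxy)` bounds. `grid_cells` is a hypothetical helper and the 2 x 2 grid is made up; unlike the original, it uses the absolute step so that a descending latitude axis would also work.

```python
from statistics import mode


def grid_cells(lons, lats):
    """Square bounds (minx, miny, maxx, maxy) centred on each grid point.

    The grid spacing is the most common non-zero step between
    consecutive latitudes, mirroring the buffer computation above.
    """
    steps = [abs(b - a) for a, b in zip(lats, lats[1:]) if b != a]
    half = mode(steps) / 2
    return [(x - half, y - half, x + half, y + half) for x, y in zip(lons, lats)]


# Hypothetical 2 x 2 grid with 2.5-degree spacing, ordered by lat then lon.
lons = [0.0, 2.5, 0.0, 2.5]
lats = [50.0, 50.0, 52.5, 52.5]
for cell in grid_cells(lons, lats):
    print(cell)
```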

Answer

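I don't know whether aggregating PDSI this way (as either a mean or a weighted mean) is mathematically or scientifically sound. This does, however, show how to get the result your question asks for.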

# the solution... spatial join buffered polygons to countries
# plus compute the overlap fraction between each PDSI grid square and each country.
# Every grid square has the same area, (buffer * 2) ** 2, so divide by that.
gdf_c = (
    world_shp.sjoin(gdf.set_crs("EPSG:4326"))
    .merge(
        gdf.loc[:, "geometry"],
        left_on="index_right",
        right_index=True,
        suffixes=("", "_pdsi"),
    )
    .assign(
        overlap=lambda d: (
            d["geometry"]
            .intersection(gpd.GeoSeries(d["geometry_pdsi"], crs="EPSG:4326"))
            .area
            / (buffer * 2) ** 2
        ).round(3)
    )
)

# comma separate associated countries and a list of overlaps
gdf_pdsi = gdf.loc[:, ["geometry", "time", "pdsi"]].join(
    gdf_c.groupby("index_right").agg({"name": ",".join, "overlap": list})
)
gdf_pdsi["time_a"] = gdf_pdsi["time"].dt.strftime("%Y-%b-%d")

# simplest way to test is visualise...
fig_buf = px.choropleth_mapbox(
    gdf_pdsi,
    geojson=gdf_pdsi.geometry,
    locations=gdf_pdsi.index,
    color="pdsi",
    hover_data=["name", "overlap"],
    animation_frame="time_a",
    opacity=0.3,
).update_layout(
    mapbox={"style": "carto-positron", "zoom": 1},
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)

fig_buf
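The `overlap` column is the area of the intersection between a country and a grid square divided by the square's area `(buffer * 2) ** 2`, i.e. the fraction of the square that lies inside the country (measured in squared degrees, since both layers are in EPSG:4326). To make the arithmetic concrete, here is a pure-Python sketch for the simplified case where the country is an axis-aligned rectangle. Real borders need shapely's `intersection` as in the answer; `overlap_fraction` and the coordinates are hypothetical.

```python
def overlap_fraction(cell, region):
    """Fraction of `cell` covered by `region`.

    Both arguments are axis-aligned rectangles (minx, miny, maxx, maxy).
    """
    width = min(cell[2], region[2]) - max(cell[0], region[0])
    height = min(cell[3], region[3]) - max(cell[1], region[1])
    if width <= 0 or height <= 0:
        return 0.0  # no shared area
    cell_area = (cell[2] - cell[0]) * (cell[3] - cell[1])
    return (width * height) / cell_area


# Hypothetical 2.5-degree square and a rectangular "country".
cell = (2.5, 50.0, 5.0, 52.5)
country = (3.75, 49.0, 8.0, 53.0)
print(overlap_fraction(cell, country))  # half of the square is inside
```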

import pandas as pd

# prepare data to plot by country
df_pdsi = (
    gdf_c.groupby(["name", "time"])
    .apply(
        lambda d: pd.Series(
            {
                "weighted_pdsi": (d["pdsi"] * d["overlap"]).sum() / d["overlap"].sum(),
                "unweighted_pdsi": d["pdsi"].mean(),
                "min_pdsi": d["pdsi"].min(),
                "max_pdsi": d["pdsi"].max(),
                "min_overlap": d["overlap"].min(),
                "max_overlap": d["overlap"].max(),
                "size_pdsi": len(d["pdsi"]),
                # "pdsi_list":[round(v,2) for v in d["pdsi"]]
            }
        )
    )
    .reset_index()
)
df_pdsi["time_a"] = df_pdsi["time"].dt.strftime("%Y-%b-%d")
fig = px.choropleth_mapbox(
    df_pdsi,
    geojson=world_shp.set_index("name").loc[:, "geometry"],
    locations="name",
    color="weighted_pdsi",
    hover_data=df_pdsi.columns,
    animation_frame="time_a",
    opacity=0.3,
).update_layout(
    mapbox={"style": "carto-positron", "zoom": 1},
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)
fig
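`groupby(...).apply` with a lambda works but calls Python once per group. If only the weighted mean is needed, it can also be computed with vectorised group sums: sum `pdsi * overlap` and `overlap` per country and time, then divide. The sketch below uses a small made-up table shaped like `gdf_c` (columns `name`, `time`, `pdsi`, `overlap`); `joined` and its values are illustrative, not from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for gdf_c: one row per (country, time, grid square).
joined = pd.DataFrame(
    {
        "name": ["Belgium"] * 4 + ["France"] * 2,
        "time": pd.to_datetime(["2021-01-01"] * 6),
        "pdsi": [-2.0, -1.0, 3.0, 4.0, 1.0, 2.0],
        "overlap": [0.6, 0.4, 0.05, 0.05, 1.0, 1.0],
    }
)

# Sum pdsi * overlap and overlap per (country, time), then divide.
sums = (
    joined.assign(pdsi_x_overlap=joined["pdsi"] * joined["overlap"])
    .groupby(["name", "time"])[["pdsi_x_overlap", "overlap"]]
    .sum()
)
df_weighted = (
    (sums["pdsi_x_overlap"] / sums["overlap"])
    .rename("weighted_pdsi")
    .reset_index()
)
print(df_weighted)
```

This is the same formula as the answer's `weighted_pdsi`, so the values should agree; the `apply` version stays convenient when you also want the min/max/count columns. In both versions, a group whose overlaps all round to 0 (because of `.round(3)`) gets NaN.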
