Convert raster time series of multiple GeoTIFF images to NetCDF
Question
I have a raster time series stored in multiple GeoTIFF files (*.tif) that I'd like to convert to a single NetCDF file. The data is uint16.
I could probably use gdal_translate to convert each image to NetCDF using:
gdal_translate -of netcdf -co FORMAT=NC4 20150520_0164.tif foo.nc
and then do some scripting with NCO to extract the dates from the filenames and concatenate the results, but I was wondering whether I might do this more effectively in Python using xarray and its new rasterio backend.
I can read a file easily:
import glob
import xarray as xr
f = glob.glob('*.tif')
da = xr.open_rasterio(f[0])
da
which returns:
<xarray.DataArray (band: 1, y: 5490, x: 5490)>
[30140100 values with dtype=uint16]
Coordinates:
* band (band) int64 1
* y (y) float64 5e+05 5e+05 5e+05 5e+05 5e+05 4.999e+05 4.999e+05 ...
* x (x) float64 8e+05 8e+05 8e+05 8e+05 8.001e+05 8.001e+05 ...
Attributes:
crs: +init=epsg:32620
and I can write one of these to NetCDF:
da.to_netcdf('foo.nc')
but ideally I would be able to use something like xr.open_mfdataset, write the time values (extracted from the filenames), and then write the entire aggregation to NetCDF, with dask handling the out-of-core memory issues. :-)
Can something like this be done with xarray and dask?
Recommended answer
Xarray should be able to do the concat step for you. I have adapted your example a bit below. It will be up to you to parse the filenames into something useful.
import glob
import pandas as pd
import xarray as xr
def time_index_from_filenames(filenames):
    '''helper function to create a pandas DatetimeIndex
       Filename example: 20150520_0164.tif'''
    return pd.DatetimeIndex([pd.Timestamp(f[:8]) for f in filenames])
filenames = glob.glob('*.tif')
time = xr.Variable('time', time_index_from_filenames(filenames))
chunks = {'x': 5490, 'y': 5490, 'band': 1}
da = xr.concat([xr.open_rasterio(f, chunks=chunks) for f in filenames], dim=time)
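
The filename parsing that the answer leaves open could be sketched roughly as follows. This is only an illustration, assuming every file name starts with a YYYYMMDD stamp as in 20150520_0164.tif. The helper names (date_from_filename, sort_by_date) and the data/ directory are hypothetical and do not come from the original answer. Unlike slicing f[:8] directly, this version strips any directory prefix first, and it sorts the files chronologically, since glob.glob does not guarantee any particular order.

```python
from datetime import datetime
from pathlib import Path


def date_from_filename(filename):
    """Parse the leading YYYYMMDD stamp of a name such as 20150520_0164.tif."""
    name = Path(filename).name  # drop any directory prefix
    return datetime.strptime(name[:8], "%Y%m%d")


def sort_by_date(filenames):
    """Return (filenames, dates), both in chronological order."""
    pairs = sorted((date_from_filename(f), f) for f in filenames)
    dates = [d for d, _ in pairs]
    ordered = [f for _, f in pairs]
    return ordered, dates


# Hypothetical file names following the pattern given in the answer.
example = ["data/20150601_0170.tif", "data/20150520_0164.tif"]
ordered, dates = sort_by_date(example)
print(ordered)
print(dates)
```

The ordered list can then be used for the open_rasterio calls, and the dates list can be passed to pd.DatetimeIndex to build the time variable, so that each raster stays aligned with its timestamp.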