Combine multiple NetCDF files into a timeseries multidimensional array in Python

Question

I am using data from multiple NetCDF files (in a folder on my computer). Each file holds data for the entire USA for a time period of 5 years. Locations are referenced by the index of an x and y coordinate. I am trying to create a time series for multiple locations (grid cells), compiling the 5-year periods into a 20-year period (this would combine 4 files). Right now I am able to extract the data from all files for one location and compile it into an array using numpy append. However, I would like to extract the data for multiple locations and place it into a matrix where the rows are the locations and the columns contain the time series of precipitation data. I think I have to create a list or dictionary, but I am not sure how to allocate the data to the list/dictionary within a loop.

I am new to python and netCDF, so forgive me if this is an easy solution. I have been using this code as a guide, but haven't figured out how to format it for what I'd like to do: Python Reading Multiple NetCDF Rainfall files of variable size

Here is my code:

import glob
from netCDF4 import Dataset
import numpy as np

# Define x & y index for grid cell of interest 
    # Pittsburgh is 37,89
yindex = 37  #first number
xindex = 89  #second number

# Path
path = '/Users/LMC/Research Data/NARCCAP/'  
folder = 'MM5I_ccsm/'

## load data file names    
all_files = glob.glob(path + folder+'*.nc')
all_files.sort()

## initialize np arrays of timeperiods and locations
yindexlist = [yindex, 38, 39] # y indices (integers) for all grid cells of interest
xindexlist = [xindex,xindex,xindex] # x indices for all grid cells of interest
ngridcell = len(yindexlist)
ntimestep = 58400  # This is for 4 files of 14600 timesteps

## Initialize np array
timeseries_per_gridcell = np.empty(0)

## START LOOP FOR FILE IMPORT
for timestep, datafile in enumerate(all_files):    
    fh = Dataset(datafile,mode='r')  
    days = fh.variables['time'][:]
    lons = fh.variables['lon'][:]
    lats = fh.variables['lat'][:]
    precip = fh.variables['pr'][:]

    for i in range(1):
        timeseries_per_gridcell = np.append(timeseries_per_gridcell,precip[:,yindexlist[i],xindexlist[i]]*10800)

    fh.close()

print(timeseries_per_gridcell)

I put 3 files on Dropbox so you can access them, but I am only allowed to post 2 links. Here they are:

https://www.dropbox.com/s/rso0hce8bq7yi2h/pr_MM5I_ccsm_2041010103.nc?dl=0
https://www.dropbox.com/s/j56undjvv7iph0f/pr_MM5I_ccsm_2046010103.nc?dl=0

Recommended answer

Nice start, I would recommend the following to help solve your issues.

First, check out ncrcat to quickly concatenate your individual netCDF files into a single file. I highly recommend downloading NCO for netCDF manipulations, especially in this instance where it will ease your Python coding later on.

Let's say the files are named precip_1.nc, precip_2.nc, precip_3.nc, and precip_4.nc. You could concatenate them along the record dimension to form a new precip_all.nc, whose record dimension has length 58400, with the command below (-O overwrites any existing output file; the last file name is the output):

ncrcat -O precip_1.nc precip_2.nc precip_3.nc precip_4.nc precip_all.nc

In Python we now just need to read in that new single file and then extract and store the time series for the desired grid cells. Something like this:

import netCDF4
import numpy as np

yindexlist = [1,2,3]
xindexlist = [4,5,6]
ngridcell = len(xindexlist)
ntimestep = 58400

# Define an empty 2D array to store time series of precip for a set of grid cells
timeseries_per_grid_cell = np.zeros([ngridcell, ntimestep])

ncfile = netCDF4.Dataset('path/to/file/precip_all.nc', 'r')

# Note that precip is 3D (time, y, x), so need to read in all dimensions.
# 'precip' is a placeholder variable name; the files in the question call it 'pr'.
precip = ncfile.variables['precip'][:,:,:]

for i in range(ngridcell):
     timeseries_per_grid_cell[i,:] = precip[:, yindexlist[i], xindexlist[i]]

ncfile.close()
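
As a possible alternative to the explicit loop above, the same matrix can be built in one step with NumPy integer-array indexing once precip has been read into memory. This is only a sketch: the small synthetic array below stands in for the data read from precip_all.nc, and its shape is an assumption chosen for illustration. It applies to the in-memory NumPy array, not to the netCDF4 variable object itself, which treats integer-sequence indices differently from NumPy.

```python
import numpy as np

# Synthetic stand-in for the (time, y, x) array read from precip_all.nc
precip = np.arange(2 * 5 * 8, dtype=float).reshape(2, 5, 8)

yindexlist = [1, 2, 3]
xindexlist = [4, 5, 6]

# Paired index lists select the cells (y[i], x[i]) for every time step,
# giving shape (ntimestep, ngridcell); transpose so rows are grid cells.
timeseries_per_grid_cell = precip[:, yindexlist, xindexlist].T
```

Each row of timeseries_per_grid_cell then holds the full time series for one (y, x) pair, matching the layout produced by the loop.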

If you have to use Python only, you'll need to keep track of the chunks of time indices that the individual files form to make the full time series. 58400/4 = 14600 time steps per file. So you'll have another loop to read in each individual file and store the corresponding slice of times, i.e. the first file will populate 0-14599, the second 14600-29199, etc.
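
The Python-only bookkeeping described above could be sketched roughly as follows. The function names stack_timeseries and read_precip_chunks are hypothetical helpers, not part of netCDF4; the variable name 'pr' and the file pattern are taken from the question and may need adjusting. The running start/stop offset is what places the first file in 0-14599, the second in 14600-29199, and so on.

```python
import glob
import numpy as np


def stack_timeseries(chunks, yindexlist, xindexlist, ntimestep):
    """Fill a (ngridcell, ntimestep) array from per-file (time, y, x) arrays."""
    ngridcell = len(yindexlist)
    out = np.zeros((ngridcell, ntimestep))
    start = 0  # first time index covered by the current file
    for precip in chunks:
        stop = start + precip.shape[0]
        for i in range(ngridcell):
            out[i, start:stop] = precip[:, yindexlist[i], xindexlist[i]]
        start = stop  # the next file continues where this one ended
    if start != ntimestep:
        raise ValueError("files supplied %d time steps, expected %d" % (start, ntimestep))
    return out


def read_precip_chunks(pattern, varname="pr"):
    """Yield the precipitation array of each matching file, in sorted order."""
    # Imported here so stack_timeseries can be used without netCDF4 installed.
    from netCDF4 import Dataset
    for path in sorted(glob.glob(pattern)):
        with Dataset(path, mode="r") as fh:
            yield fh.variables[varname][:]


# Example call (path is illustrative):
# chunks = read_precip_chunks('/path/to/MM5I_ccsm/*.nc')
# timeseries = stack_timeseries(chunks, [37, 38, 39], [89, 89, 89], 58400)
```

Reading one file at a time keeps only a single file's data in memory, and the final length check catches a missing or extra file before the result is used.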
