Combining a large number of netCDF files
Problem description
I have a large folder of netCDF (.nc) files, each with a similar name. The data files contain the variables time, longitude, latitude, and monthly precipitation. The goal is to get, for each calendar month, the average monthly precipitation over X years. So in the end I would have 12 values (one per month) of average monthly precipitation over X years for each latitude/longitude point. All files cover the same location, and together they span many years. Each filename begins with a similar prefix and ends in "date.SUB.nc", for example:
'data1.somthing.somthing1.avg_2d_Ind_Nx.200109.SUB.nc'
'data1.somthing.somthing1.avg_2d_Ind_Nx.200509.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201104.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201004.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201003.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201103.SUB.nc'
'data1.somthing.somthing1.avg_2d_Ind_Nx.201203.SUB.nc'
The filenames end in YearMonth.SUB.nc (i.e. YYYYMM.SUB.nc). What I have so far is:
import netCDF4 as nc

array = []
f = nc.MFDataset('data*.nc')
precp = f.variables['prectot']
time = f.variables['time']
# This line raises the KeyError described below
array = f.variables['time', 'longitude', 'latitude', 'prectot']
I get a KeyError: ('time', 'longitude', 'latitude', 'prectot'). Is there a way to combine all this data so I am able to manipulate it?
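The KeyError itself comes from the last line of the code above: `f.variables` is a dictionary-like mapping from a single variable name to a variable, so `f.variables['time','longitude','latitude','prectot']` looks up one tuple key, which does not exist. A minimal sketch of that behavior, using a plain dict as a stand-in for `f.variables` (the string values are placeholders, not real netCDF variables), and of looking the names up one at a time instead:

```python
# Stand-in for netCDF4's f.variables: a mapping from ONE variable name to a
# variable. The string values are placeholders, not real netCDF variables.
variables = {
    'time': 'time variable',
    'longitude': 'longitude variable',
    'latitude': 'latitude variable',
    'prectot': 'precipitation variable',
}
names = ('time', 'longitude', 'latitude', 'prectot')

# Comma-separated keys inside [] form a single tuple key, which is not present.
try:
    variables['time', 'longitude', 'latitude', 'prectot']
except KeyError as err:
    print('KeyError:', err)

# Look each name up separately instead.
selected = {name: variables[name] for name in names}
print(sorted(selected))
```

With the real dataset the same pattern would be `{name: f.variables[name] for name in names}`, assuming those variable names actually exist in the files; each value can then be sliced (e.g. `[:]`) into an array.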
Recommended answer
As @CharlieZender mentioned, ncra is the way to go here; I'll provide some more details on integrating that NCO operator into a Python script. (PS: you can install NCO easily with Homebrew, see e.g. http://alejandrosoto.net/blog/2014/01/22/setting-up-my-mac-for-scientific-research/)
import glob
import subprocess

import netCDF4
import numpy as np

for month in range(1, 13):
    # Gather all the files for this month (filenames end in YYYYMM.SUB.nc)
    month_files = sorted(glob.glob('/path/to/files/*{0:0>2d}.SUB.nc'.format(month)))

    # Option 1: using NCO functions ---------------
    avg_file = './precip_avg_{0:0>2d}.nc'.format(month)
    # Concatenate the files along the record (time) dimension using ncrcat
    subprocess.check_call(['ncrcat', '-O'] + month_files + [avg_file])
    # Take the time (record) average using ncra, overwriting avg_file in place
    subprocess.check_call(['ncra', '-O', avg_file, avg_file])
    # Read in the monthly precip climatology file and do whatever now
    ncfile = netCDF4.Dataset(avg_file, 'r')
    pr = ncfile.variables['prectot'][:, :, :]  # record dimension remains, with size 1
    # ...
    ncfile.close()

    # Option 2: using only Python -------------
    # Get the lat and lon sizes (nlat, nlon) from the first file,
    # assuming prectot is dimensioned [time, lat, lon]
    with netCDF4.Dataset(month_files[0], 'r') as ncfile:
        nlat, nlon = ncfile.variables['prectot'].shape[1:]
    # Initialize an array to store monthly-mean precip for all years
    nyears = len(month_files)
    pr_arr = np.zeros([nyears, nlat, nlon], dtype='f4')
    # Populate pr_arr with each file's monthly-mean precip
    for idx, filename in enumerate(month_files):
        with netCDF4.Dataset(filename, 'r') as ncfile:
            pr = ncfile.variables['prectot'][:, :, :]
            pr_arr[idx, :, :] = np.mean(pr, axis=0)
    # Take the average along all years for a monthly climatology
    pr_clim = np.mean(pr_arr, axis=0)  # 2D now [lat, lon]
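Two possible refinements, sketched under the assumption that every filename ends in YYYYMM.SUB.nc: parsing the month with a regular expression rather than a glob pattern, and letting a single ncra call do the work, since ncra averages record variables across all of its input files (so the separate ncrcat step should not be needed). The helper names `group_by_month` and `ncra_command` are hypothetical, and the sketch only builds and prints the command lines; passing each list to `subprocess.check_call` would run it.

```python
import re
import shlex
from collections import defaultdict

# Assumption: every filename ends in YYYYMM.SUB.nc, as in the question.
DATE_RE = re.compile(r'\.(\d{4})(\d{2})\.SUB\.nc$')


def group_by_month(filenames):
    """Hypothetical helper: map month (1-12) to that month's files, oldest first."""
    groups = defaultdict(list)
    for name in filenames:
        match = DATE_RE.search(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        yyyymm = match.group(1) + match.group(2)
        groups[int(match.group(2))].append((yyyymm, name))
    # Sort each month's files by YYYYMM, not by the (differing) prefix
    return {month: [name for _, name in sorted(pairs)]
            for month, pairs in sorted(groups.items())}


def ncra_command(month_files, avg_file):
    """Hypothetical helper: one ncra call averaging the records of all inputs."""
    return ['ncra', '-O'] + list(month_files) + [avg_file]


files = [
    'data1.somthing.somthing1.avg_2d_Ind_Nx.200109.SUB.nc',
    'data1.somthing.somthing1.avg_2d_Ind_Nx.200509.SUB.nc',
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201104.SUB.nc',
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201004.SUB.nc',
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201003.SUB.nc',
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201103.SUB.nc',
    'data1.somthing.somthing1.avg_2d_Ind_Nx.201203.SUB.nc',
]

for month, month_files in group_by_month(files).items():
    avg_file = './precip_avg_{0:0>2d}.nc'.format(month)
    # Print the command; subprocess.check_call(cmd) would execute it instead.
    print(shlex.join(ncra_command(month_files, avg_file)))
```

Sorting by YYYYMM does not change the average; it only makes the file order predictable when the prefixes differ (data1 vs data2).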