Creating a new NetCDF from existing NetCDF file while preserving the compression of the original file
Problem description
I am trying to create a new NetCDF file from an existing NetCDF file. I am only interested in using 12 variables from a list of 177 variables. You can find the sample NetCDF file from this ftp site here.
I used the following code from a previous SO answer. You can find it here.
import netCDF4 as nc
file1 = '/media/sf_jason2/cycle_001/JA2_GPN_2PdP001_140_20080717_113355_20080717_123008.nc'
file2 = '/home/sandbox/test.nc'
toinclude = ['lat_20hz', 'lon_20hz', 'time_20hz', 'alt_20hz', 'ice_range_20hz_ku', 'ice_qual_flag_20hz_ku', 'model_dry_tropo_corr', 'model_wet_tropo_corr', 'iono_corr_gim_ku', 'solid_earth_tide', 'pole_tide', 'alt_state_flag_ku_band_status']
with nc.Dataset(file1) as src, nc.Dataset(file2, "w") as dst:
    # copy global attributes
    for name in src.ncattrs():
        dst.setncattr(name, src.getncattr(name))
    # copy dimensions (isunlimited is a method and must be called)
    for name, dimension in src.dimensions.items():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data for variables that are included in the toinclude list
    for name, variable in src.variables.items():
        if name in toinclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst.variables[name][:] = src.variables[name][:]
The issue I am having is that the original file is only 5.3 MB, but after I copy the selected variables over, the new file is around 17 MB. The whole point of stripping the variables is to decrease the file size, yet I end up with a larger file.
I have tried using xarray as well. But I am having issues when I am trying to merge multiple variables. The following is the code that I am trying to implement in xarray.
import xarray as xr
fName = '/media/sf_jason2/cycle_001/JA2_GPN_2PdP001_140_20080717_113355_20080717_123008.nc'
file2 = '/home/sandbox/test.nc'
toinclude = ['lat_20hz', 'lon_20hz', 'time_20hz', 'alt_20hz', 'ice_range_20hz_ku', 'ice_qual_flag_20hz_ku', 'model_dry_tropo_corr', 'model_wet_tropo_corr', 'iono_corr_gim_ku', 'solid_earth_tide', 'pole_tide', 'alt_state_flag_ku_band_status']
ds = xr.open_dataset(fName)
newds = xr.Dataset()
newds['lat_20hz'] = ds['lat_20hz']
newds.to_netcdf(file2)
Xarray works fine when I copy over a single variable, but it has issues when I try to copy multiple variables into an empty dataset. I couldn't find any good examples of copying multiple variables using xarray. I am fine achieving this workflow either way.
Ultimately, how can I decrease the file size of the new NetCDF that is being created using netCDF4? If that's not ideal, is there a way to add multiple variables to an empty dataset in xarray without merging issues?
Recommended answer
Would the following workflow suffice?
ds = xr.open_dataset(fName)
ds[toinclude].to_netcdf(file2)
Since you mentioned trying to decrease the file size, you should take a look at Xarray's documentation on "writing encoded data". You may want to do something like:
encoding = {v: {'zlib': True, 'complevel': 4} for v in toinclude}
ds[toinclude].to_netcdf(file2, encoding=encoding, engine='netcdf4')
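If you would rather stay with netCDF4, the size increase most likely comes from createVariable being called without any compression options, so the copied variables are written uncompressed. Below is a minimal sketch, assuming the source file is in NetCDF4 format and that you want each copied variable to reuse the compression and chunking settings reported by the source variable's filters() and chunking() methods. The helper names creation_kwargs and copy_subset are hypothetical and only for illustration.

```python
def creation_kwargs(filters, chunking):
    """Translate the output of Variable.filters() and Variable.chunking()
    into keyword arguments for Dataset.createVariable().

    Both inputs may be None (e.g. for a NetCDF3 source); in that case the
    variable is created without compression.
    """
    filters = filters or {}
    kwargs = {
        "zlib": bool(filters.get("zlib", False)),
        "shuffle": bool(filters.get("shuffle", False)),
        "complevel": int(filters.get("complevel", 0)),
        "fletcher32": bool(filters.get("fletcher32", False)),
    }
    if chunking == "contiguous":
        kwargs["contiguous"] = True
    elif chunking:
        kwargs["chunksizes"] = tuple(chunking)
    return kwargs


def copy_subset(src_path, dst_path, names):
    """Copy the variables listed in `names`, keeping the source's settings."""
    import netCDF4 as nc  # imported here so creation_kwargs works without it

    with nc.Dataset(src_path) as src, nc.Dataset(dst_path, "w") as dst:
        # global attributes
        dst.setncatts({k: src.getncattr(k) for k in src.ncattrs()})
        # dimensions (None means unlimited)
        for name, dim in src.dimensions.items():
            dst.createDimension(name, None if dim.isunlimited() else len(dim))
        for name in names:
            var = src.variables[name]
            attrs = {k: var.getncattr(k) for k in var.ncattrs()}
            # _FillValue can only be set when the variable is created
            fill = attrs.pop("_FillValue", None)
            out = dst.createVariable(
                name, var.datatype, var.dimensions, fill_value=fill,
                **creation_kwargs(var.filters(), var.chunking()))
            # set scale_factor/add_offset etc. before writing the data
            out.setncatts(attrs)
            out[:] = var[:]
```

With the names from the question this would be called as copy_subset(file1, file2, toinclude). Variable attributes are copied before the data is written so that packed variables are stored with the same scale_factor and add_offset as in the source.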