Creating a new NetCDF from existing NetCDF file while preserving the compression of the original file

Question

I am trying to create a new NetCDF file from an existing NetCDF file. I am only interested in using 12 variables from a list of 177 variables. You can find the sample NetCDF file from this ftp site here.

I used the following code from a previous SO answer. You can find it here.

import netCDF4 as nc

file1 = '/media/sf_jason2/cycle_001/JA2_GPN_2PdP001_140_20080717_113355_20080717_123008.nc'
file2 = '/home/sandbox/test.nc'

toinclude = ['lat_20hz', 'lon_20hz', 'time_20hz', 'alt_20hz', 'ice_range_20hz_ku', 'ice_qual_flag_20hz_ku', 'model_dry_tropo_corr', 'model_wet_tropo_corr', 'iono_corr_gim_ku', 'solid_earth_tide', 'pole_tide', 'alt_state_flag_ku_band_status']

with nc.Dataset(file1) as src, nc.Dataset(file2, "w") as dst:
    # copy global attributes
    for name in src.ncattrs():
        dst.setncattr(name, src.getncattr(name))
    # copy dimensions, keeping unlimited dimensions unlimited
    for name, dimension in src.dimensions.items():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy the data for every variable named in the toinclude list
    for name, variable in src.variables.items():
        if name in toinclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst.variables[name][:] = src.variables[name][:]

The issue that I am having is that the original file is only 5.3 MB, however when I copy the new variables over the new file size is around 17 MB. The whole point of stripping the variables is to decrease the file size, but I am ending up with a larger file size.

I have tried using xarray as well. But I am having issues when I am trying to merge multiple variables. The following is the code that I am trying to implement in xarray.

import xarray as xr

fName = '/media/sf_jason2/cycle_001/JA2_GPN_2PdP001_140_20080717_113355_20080717_123008.nc'
file2 = '/home/sandbox/test.nc'
toinclude = ['lat_20hz', 'lon_20hz', 'time_20hz', 'alt_20hz', 'ice_range_20hz_ku', 'ice_qual_flag_20hz_ku', 'model_dry_tropo_corr', 'model_wet_tropo_corr', 'iono_corr_gim_ku', 'solid_earth_tide', 'pole_tide', 'alt_state_flag_ku_band_status']

ds = xr.open_dataset(fName)
newds = xr.Dataset()
newds['lat_20hz'] = ds['lat_20hz']
newds.to_netcdf(file2)

Xarray works fine if I am trying to copy over one variable, however, it's having issues when I am trying to copy multiple variables to an empty dataset. I couldn't find any good examples of copying multiple variables using xarray. I am fine achieving this workflow either way.

Ultimately, how can I decrease the file size of the new NetCDF that is being created using netCDF4? If that's not ideal, is there a way to add multiple variables to an empty dataset in xarray without merging issues?

Recommended answer

Would the following workflow suffice:

ds = xr.open_dataset(fName)
ds[toinclude].to_netcdf(file2)
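
As a small illustration of why this works: indexing an xarray Dataset with a list of names returns a new Dataset holding only those variables, so no empty dataset or merge step is needed. The sketch below uses a tiny in-memory dataset with made-up variable names (`lat`, `lon`, `alt`) rather than the Jason-2 file.

```python
import numpy as np
import xarray as xr

# Toy dataset standing in for the real file; the names are illustrative only.
ds = xr.Dataset({
    "lat": ("time", np.arange(3.0)),
    "lon": ("time", np.arange(3.0) + 10.0),
    "alt": ("time", np.arange(3.0) + 100.0),
})

# Selecting with a list of names yields a Dataset, not a DataArray.
subset = ds[["lat", "lon"]]
print(type(subset).__name__, list(subset.data_vars))
```

The same selection applies to the `toinclude` list from the question, as long as every name in the list exists in the file.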

Since you mentioned trying to decrease the file size, you should take a look at Xarray's documentation on "writing encoded data". You may want to do something like:

encoding = {v: {'zlib': True, 'complevel': 4} for v in toinclude}
ds[toinclude].to_netcdf(file2, encoding=encoding, engine='netcdf4')
