R: create NetCDF from .CSV


Question

I am trying to create a NetCDF file from a .csv file. I have read several tutorials, here and elsewhere, and still have some doubts.

I have a table like this:

lat,long,time,rh,temp
41,-109,6,1,1
40,-107,18,2,2
39,-105,6,3,3
41,-103,18,4,4
40,-109,6,5,2
39,-107,18,6,4

I create the NetCDF file using the ncdf4 package in R:

xvals <- data$lon
yvals <- data$lat 
nx <- length(xvals)
ny <- length(yvals)
lon1 <- ncdim_def("longitude", "degrees_east", xvals)
lat2 <- ncdim_def("latitude", "degrees_north", yvals)
time <- data$time
mv <- -999 #missing value to use

var_temp <- ncvar_def("temperatura", "celsius", list(lon1, lat2, time), longname="Temp. da superfície", mv) 

var_rh <- ncvar_def("humidade", "%", list(lon1, lat2, time), longname = "humidade relativa", mv )

ncnew <- nc_create(filename, list(var_temp, var_rh))
ncvar_put(ncnew, var_temp, dadostemp, start=c(1,1,1), count=c(nx,ny,nt))

When I follow this procedure, it reports that the NC file expects 3 times the amount of data that I have. I understand why: one matrix for each dimension, since I declared that the variables depend on longitude, latitude and time.

So how would I import this kind of data, where I already have one lon, lat, time and the other variables for each data acquisition?

Could someone shed some light on this?

PS: The data used here is not my real data, just an example I was using for the tutorials.

Recommended answer

I think there is more than one problem in your code. Step by step:

Creating the dimensions

In an nc file, dimensions don't work as key-value pairs; they are just vectors of values defining what each position in a variable array means. This means you should create your dimensions from the unique, sorted coordinate values, like this:

xvals <- sort(unique(data$long))
yvals <- sort(unique(data$lat))
lon1 <- ncdim_def("longitude", "degrees_east", xvals)
lat2 <- ncdim_def("latitude", "degrees_north", yvals)
time <- data$time
time_d <- ncdim_def("time","h",unique(time))
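To see why the original call expected more data than the table holds, here is a minimal sketch that rebuilds the sample table from the question in base R and counts the grid cells the three dimensions span. The names `tvals` and `grid_dim` are introduced here for illustration only.

```r
# Sample table from the question (column names as in the CSV header).
data <- data.frame(
  lat  = c(41, 40, 39, 41, 40, 39),
  long = c(-109, -107, -105, -103, -109, -107),
  time = c(6, 18, 6, 18, 6, 18),
  rh   = c(1, 2, 3, 4, 5, 6),
  temp = c(1, 2, 3, 4, 2, 4)
)

# One entry per distinct coordinate value, sorted for lon and lat.
xvals <- sort(unique(data$long))
yvals <- sort(unique(data$lat))
tvals <- unique(data$time)   # illustrative name, not in the original answer

# The table has 6 rows, but the grid spanned by the dimensions
# has length(xvals) * length(yvals) * length(tvals) cells.
grid_dim <- c(length(xvals), length(yvals), length(tvals))
print(grid_dim)
print(prod(grid_dim))
```

Every cell of that grid that has no row in the table has to be filled with the missing value.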

Where I work, we use unlimited dimensions as mere indexes, while a 1-D variable with the same name as the dimension holds the values. I'm not sure how unlimited dimensions work in R. Since you didn't ask about them, I'll leave this out :-)

Defining the variables

mv <- -999  # missing value to use
var_temp <- ncvar_def("temperatura", "celsius",
                      list(lon1, lat2, time_d),
                      missval = mv,
                      longname = "Temp. da superfície")
var_rh <- ncvar_def("humidade", "%",
                    list(lon1, lat2, time_d),
                    missval = mv,
                    longname = "humidade relativa")

Adding the data

Create the nc file: ncnew <- nc_create(f, list(var_temp, var_rh))

When adding values, the object holding the data is flattened to a 1-D array and written sequentially, starting at the position specified by start. How many entries are written along each dimension is controlled by the values in count. If you have data like this:

long, lat, time, t
   1,   1,    1, 1
   2,   1,    1, 2
   1,   2,    1, 3
   2,   2,    1, 4

the command ncvar_put(ncnew, var_temp, data$t, count=c(2,2,1)) would give you what you (probably) expect.
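The ordering behind that small example can be checked in base R without touching a file. The sketch below assumes the variable's dimensions are declared in (long, lat, time) order, as in the definitions above; `tbl`, `a` and `ok` are names made up for the illustration.

```r
# The four-row example table, already sorted so that long varies fastest.
tbl <- data.frame(
  long = c(1, 2, 1, 2),
  lat  = c(1, 1, 2, 2),
  time = c(1, 1, 1, 1),
  t    = c(1, 2, 3, 4)
)

# R fills arrays with the first dimension varying fastest, which is the
# same order a sequential write over count = c(2, 2, 1) follows.
a <- array(tbl$t, dim = c(2, 2, 1))

# Each row of the table should land at position [long, lat, time].
ok <- all(a[cbind(tbl$long, tbl$lat, tbl$time)] == tbl$t)
print(ok)
```

If the rows were sorted differently (for example with lat varying fastest), the same flat vector would end up in the wrong cells, which is why the real data needs explicit indexes.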

For your data, the first step is to create the indexes for the dimensions:

data$idx_lon <- match(data$long,xvals)
data$idx_lat <- match(data$lat,yvals)
data$idx_time <- match(data$time,unique(time))

Then create an array with the dimensions appropriate for your data, in the same order as the dimensions in the variable definition (longitude, latitude, time):

m <- array(mv, dim = c(length(xvals), length(yvals), length(unique(time))))

Then fill the array with your values:

for (i in seq_len(NROW(data))) {
  m[data$idx_lon[i], data$idx_lat[i], data$idx_time[i]] <- data$temp[i]
}

If speed is a concern, you could calculate the linear index in a vectorised way and use it for the value assignment.
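A minimal sketch of that vectorised assignment, using the sample table from the question and an array laid out in the same (longitude, latitude, time) order as the variable definition. The names `i`, `j`, `k`, `lin`, `tvals` and `m2` are illustrative, not part of the original answer.

```r
data <- data.frame(
  lat  = c(41, 40, 39, 41, 40, 39),
  long = c(-109, -107, -105, -103, -109, -107),
  time = c(6, 18, 6, 18, 6, 18),
  temp = c(1, 2, 3, 4, 2, 4)
)
xvals <- sort(unique(data$long))
yvals <- sort(unique(data$lat))
tvals <- unique(data$time)
nx <- length(xvals); ny <- length(yvals); nt <- length(tvals)
mv <- -999

# Position of every row along each dimension.
i <- match(data$long, xvals)
j <- match(data$lat,  yvals)
k <- match(data$time, tvals)

# Linear index into an nx x ny x nt array (first dimension varies fastest).
lin <- i + (j - 1) * nx + (k - 1) * nx * ny

m <- array(mv, dim = c(nx, ny, nt))
m[lin] <- data$temp

# Equivalent alternative: index with a three-column matrix of subscripts.
m2 <- array(mv, dim = c(nx, ny, nt))
m2[cbind(i, j, k)] <- data$temp
print(identical(m, m2))
```

Both forms replace the row-by-row loop with a single assignment; the matrix-subscript form avoids computing the linear index by hand.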

Writing the data

ncvar_put(ncnew, var_temp,m)

Note that you don't need start and count here.

Finally, close the nc file with nc_close(ncnew) so that the data is written to disk. Optionally, I would recommend using the ncdump console command to check your file.
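As an alternative to ncdump, the file can also be read back from R. The following is a sketch of the whole round trip for the temperature variable only, assuming the ncdf4 package is installed. It writes to a temporary file, and the names `tvals`, `temp_back` and `lon_back` are illustrative.

```r
library(ncdf4)

data <- data.frame(
  lat  = c(41, 40, 39, 41, 40, 39),
  long = c(-109, -107, -105, -103, -109, -107),
  time = c(6, 18, 6, 18, 6, 18),
  temp = c(1, 2, 3, 4, 2, 4)
)
xvals <- sort(unique(data$long))
yvals <- sort(unique(data$lat))
tvals <- unique(data$time)
mv <- -999

lon1   <- ncdim_def("longitude", "degrees_east",  xvals)
lat2   <- ncdim_def("latitude",  "degrees_north", yvals)
time_d <- ncdim_def("time", "h", tvals)
var_temp <- ncvar_def("temperatura", "celsius",
                      list(lon1, lat2, time_d), missval = mv)

# Array in (longitude, latitude, time) order, pre-filled with the missing value.
m <- array(mv, dim = c(length(xvals), length(yvals), length(tvals)))
m[cbind(match(data$long, xvals),
        match(data$lat,  yvals),
        match(data$time, tvals))] <- data$temp

f <- tempfile(fileext = ".nc")   # temporary output file for this sketch
ncnew <- nc_create(f, list(var_temp))
ncvar_put(ncnew, var_temp, m)
nc_close(ncnew)

# Read the file back and inspect what was stored.
nc <- nc_open(f)
temp_back <- ncvar_get(nc, "temperatura")
lon_back  <- ncvar_get(nc, "longitude")
nc_close(nc)
print(dim(temp_back))
print(lon_back)
```

Cells that held the missing value should come back as NA, so counting the non-NA entries is a quick check that only the rows from the table were written.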

Edit

Regarding your question whether to write a complete array or use start and count: I believe both methods work reliably. Which one to prefer depends on your data and your personal preferences.

I think the method of building an array, adding the values and then writing it as a whole is easier to understand. However, which one is more efficient depends on the data. If your data is big and has many NA values, I believe using multiple writes with start and count could be faster. If NAs are rare, creating one array and doing a single write would be faster. If your data is so big that creating an extra array would exceed your available memory, you have to combine both methods.
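A sketch of the start/count variant, writing one time slice per call so that only one slice has to be held in memory at a time. It assumes the ncdf4 package is installed. The slice values are synthetic placeholders, and `slice` and `back` are illustrative names.

```r
library(ncdf4)

nx <- 4; ny <- 3; nt <- 2
lon1   <- ncdim_def("longitude", "degrees_east",  c(-109, -107, -105, -103))
lat2   <- ncdim_def("latitude",  "degrees_north", c(39, 40, 41))
time_d <- ncdim_def("time", "h", c(6, 18))
var_temp <- ncvar_def("temperatura", "celsius",
                      list(lon1, lat2, time_d), missval = -999)

f <- tempfile(fileext = ".nc")   # temporary output file for this sketch
nc <- nc_create(f, list(var_temp))

for (k in seq_len(nt)) {
  # Placeholder data for time step k: an nx x ny matrix, longitude fastest.
  slice <- matrix(k * 100 + seq_len(nx * ny), nrow = nx, ncol = ny)
  # Write the full lon/lat plane at time index k.
  ncvar_put(nc, var_temp, slice,
            start = c(1, 1, k), count = c(nx, ny, 1))
}
nc_close(nc)

# Read everything back to confirm each slice landed at its time index.
nc <- nc_open(f)
back <- ncvar_get(nc, "temperatura")
nc_close(nc)
print(dim(back))
```

The same pattern works for smaller blocks: start gives the first cell of the block and count its extent along each dimension.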
