Add 'constant' dimension to xarray Dataset


Question

I have a series of monthly gridded datasets in CSV form. I want to read them, add a few dimensions, and then write them to netCDF. I've had great experience using xarray (xray) in the past, so I thought I'd use it for this task.

I can easily get them into a 2D DataArray with something like:

data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
da = xr.DataArray(data, coords=coords)

But when I try to add another dimension to convey information about time (all of the data is from the same year/month), things start to go sour.

I've tried two ways to crack this:

1) Expand my input data to m x n x 1, something like:

data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
data = data[:,:,np.newaxis]

Then I follow the same steps as above, with coords updated to contain a third dimension.

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
coords['time'] = pd.datetime(year, month, day)
da = xr.DataArray(data, coords=coords)
da.to_dataset(name='variable_name')

This is fine for creating a DataArray, but when I try to convert it to a Dataset (so I can write to netCDF), I get the error 'ValueError: Coordinate objects must be 1-dimensional'.

2) The second approach I've tried is taking my DataArray, casting it to a DataFrame, setting the index to ['lat', 'lng', 'time'], and then going back to a Dataset with xr.Dataset.from_dataframe(). I've tried this, but it ran for more than 20 minutes before I killed the process.

Does anyone know how I can get a Dataset with a monthly 'time' dimension?

Recommended answer

Your first example is pretty close:

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng': lngs}
coords['time'] = [datetime.datetime(year, month, day)]
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng', 'time'])
da.to_dataset(name='variable_name')

You'll notice a few changes in my version:

  1. I'm passing in a list for the 'time' coordinate instead of a scalar. You need to pass in a list or 1D array to get a 1D coordinate variable, which is what you need if you also use 'time' as a dimension. That's what the error ValueError: Coordinate objects must be 1-dimensional is trying to tell you (by the way, if you have ideas for how to make that error message more helpful, I'm all ears!).
  2. I'm providing a dims argument to the DataArray constructor. Passing in a (non-ordered) dictionary is a little dangerous because the iteration order is not guaranteed.
  3. I also switched to datetime.datetime instead of pd.datetime. The latter is simply an alias for the former.
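
For reference, the three changes above can be put together into a self-contained sketch. The values assigned to year, month, and day are placeholders (the question does not say which month the grid covers), and the array of ones stands in for the real CSV data:

```python
import datetime

import numpy as np
import xarray as xr

# Placeholder date: an assumption, not taken from the original post.
year, month, day = 2000, 1, 1

# One 0.5-degree grid with a trailing length-1 axis reserved for time.
data = np.ones((360, 720))[:, :, np.newaxis]
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)

coords = {
    "lat": lats,
    "lng": lngs,
    # A one-element list gives a 1D coordinate, matching the 'time' dimension.
    "time": [datetime.datetime(year, month, day)],
}

# Explicit dims tie each axis of `data` to a coordinate by name.
da = xr.DataArray(data, coords=coords, dims=["lat", "lng", "time"])
ds = da.to_dataset(name="variable_name")
```

From here, ds.to_netcdf('some_file.nc') should write the result to disk, where the file name is only an example and a netCDF backend such as netCDF4 or scipy is assumed to be installed.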


Another sensible approach is to use concat with a list of one item once you've added 'time' as a scalar coordinate, e.g.,

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng': lngs, 'time': datetime.datetime(year, month, day)}
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng'])
expanded_da = xr.concat([da], 'time')

This version generalizes nicely to joining together data from a bunch of days: you simply make the list of DataArrays longer. In my experience, most of the time the reason you want the extra dimension in the first place is to be able to concat along it. Length-1 dimensions are not very useful otherwise.
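
As a rough illustration of that generalization, here is a sketch that concatenates three monthly grids along 'time'. The helper monthly_array, the chosen dates, and the constant-valued grids are all hypothetical stand-ins for data read from the CSV files:

```python
import datetime

import numpy as np
import xarray as xr

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)


def monthly_array(values, when):
    """Wrap one 2D grid in a DataArray carrying a scalar 'time' coordinate."""
    coords = {"lat": lats, "lng": lngs, "time": when}
    return xr.DataArray(values, coords=coords, dims=["lat", "lng"])


# Hypothetical inputs: three months, each filled with a different constant.
months = [datetime.datetime(2000, m, 1) for m in (1, 2, 3)]
arrays = [
    monthly_array(np.full((360, 720), float(i)), when)
    for i, when in enumerate(months)
]

# Concatenating along the scalar coordinate turns 'time' into a real dimension.
combined = xr.concat(arrays, "time")
ds = combined.to_dataset(name="variable_name")
```

With a single-element list this reduces to the one-month case shown in the answer, so the same code path can serve both situations.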
