Indexing xarray data with variable length DataArray
Question
I am trying to extract data from an xarray dataset using DataArray indexing. My goal is to obtain the data along different line segments that overlap the array. For that, I have obtained the indices of each line (their sizes differ, depending on the length of the line).
For example, line 1 is x = [1,2,3], y = [7,8,9], and similarly line 2 is x = [1,4,5,6,8], y = [0,2,7,9,6], and so on. Some of the lines are 100 x 2. I have tried the following:
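df = xarray_dataset
# The nested lists below are ragged (lengths 3, 5 and 2), which is the problem
indx = xr.DataArray([[1, 2, 3], [1, 4, 5, 6, 8], [2, 3]])
indy = xr.DataArray([[7, 9, 8], [0, 2, 7, 9, 6], [4, 5]])
dx_sel = df.isel(x=indx, y=indy)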
However, as I understand it, every row of a DataArray index needs to have the same length. Is there a way to handle this? These indices represent the x and y coordinates of different segments within the dataset, and I want the mean of each segment. With only a few segments I could loop over the segment indices, but I have hundreds of them, and looping over every segment is not computationally efficient.
The same issue arises with NumPy arrays. Is there a way to pass NaN or something similar in the index, so that the index arrays have equal shape but no data is extracted for the padded positions?
Recommended answer
You can use the set_index -> unstack mechanism, which is based on pd.MultiIndex.
In [1]: import numpy as np
In [2]: import xarray as xr
In [4]: df = xr.DataArray(np.arange(110).reshape(10, 11),
   ...:                   dims=['x', 'y'])
In [5]: indx = xr.DataArray([1,2,3, 1,4,5,6,8, 2,3],
   ...:                     dims=['index'],
   ...:                     coords={'i': ('index', [0,0,0, 1,1,1,1,1, 2,2]),
   ...:                             'j': ('index', [0,1,2, 0,1,2,3,4, 0,1])})
   ...:
   ...: indy = xr.DataArray([7,9,8, 0,2,7,9,6, 4,5], dims=['index'],
   ...:                     coords={'i': ('index', [0,0,0, 1,1,1,1,1, 2,2]),
   ...:                             'j': ('index', [0,1,2, 0,1,2,3,4, 0,1])})
In [8]: df.isel(x=indx, y=indy).set_index(index=['i', 'j']).unstack('index')
Out[8]:
<xarray.DataArray (i: 3, j: 5)>
array([[18., 31., 41., nan, nan],
       [11., 46., 62., 75., 94.],
       [26., 38., nan, nan, nan]])
Coordinates:
  * i        (i) int64 0 1 2
  * j        (j) int64 0 1 2 3 4
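Here, indx and indy have non-dimensional coordinates, i and j, which are essentially the original positions of the indices in the 2-dimensional space: i is the segment number and j is the position within that segment. Positions that a shorter segment does not fill come out as NaN after unstacking.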
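With hundreds of segments, writing the i and j coordinates by hand is impractical. The following is a minimal sketch, not part of the original answer, of how the flat indexers could be built from ragged lists and how the per-segment mean asked for in the question could then be taken. The helper name segment_indexers and the variables xs, ys, padded and segment_means are hypothetical names introduced for illustration.

```python
import numpy as np
import xarray as xr


def segment_indexers(xs, ys):
    """Flatten ragged per-segment index lists into 1-D indexers.

    Hypothetical helper: each point is labelled with its segment
    number (i) and its position inside that segment (j).
    """
    lengths = [len(x) for x in xs]
    i = np.repeat(np.arange(len(xs)), lengths)            # segment number per point
    j = np.concatenate([np.arange(n) for n in lengths])   # position within segment
    coords = {"i": ("index", i), "j": ("index", j)}
    indx = xr.DataArray(np.concatenate(xs), dims="index", coords=coords)
    indy = xr.DataArray(np.concatenate(ys), dims="index", coords=coords)
    return indx, indy


# Same toy data as in the answer above
df = xr.DataArray(np.arange(110).reshape(10, 11), dims=["x", "y"])
xs = [[1, 2, 3], [1, 4, 5, 6, 8], [2, 3]]
ys = [[7, 9, 8], [0, 2, 7, 9, 6], [4, 5]]

indx, indy = segment_indexers(xs, ys)

# Pointwise selection, then reshape to (segment, position) with NaN padding
padded = df.isel(x=indx, y=indy).set_index(index=["i", "j"]).unstack("index")

# Mean along each segment; NaN padding is skipped by default for float data
segment_means = padded.mean("j")
```

For the toy data this gives segment means of 30.0, 57.6 and 32.0. If the padded 2-D array itself is not needed, grouping the flat selection by the i coordinate (for example with groupby("i").mean()) should give the same means without the unstack step.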