Indexing xarray data with variable length DataArray

Question

I am trying to extract data from an xarray Dataset using DataArray indexing. My goal is to obtain the data along different line segments that overlap the array. To do this, I have computed the indices of each line; the index arrays differ in size depending on the length of the line.

For example, line 1 has x = [1,2,3], y = [7,8,9], line 2 has x = [1,4,5,6,8], y = [0,2,7,9,6], and so on; some of the lines are 100 x 2. I tried the following:

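import xarray as xr

df = xarray_dataset  # the Dataset to sample from
# The nested lists are ragged (lengths 3, 5 and 2), so they do not form a regular array.
indx = xr.DataArray([[1, 2, 3], [1, 4, 5, 6, 8], [2, 3]])
indy = xr.DataArray([[7, 9, 8], [0, 2, 7, 9, 6], [4, 5]])
dx_sel = df.isel(x=indx, y=indy)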

However, as I understand it, every row of a DataArray indexer must have the same length. Is there a way to handle this? These indices are the x and y coordinates of different segments within the dataset, and I want the mean of each segment. I have hundreds of such segments. With only a few I could loop over the segment indices, but looping over every segment is not computationally efficient.

The same issue arises with plain NumPy arrays. Is there a way to pass NaN or something similar in the index, so that the index arrays can be padded to equal shape but no data is extracted for the padded positions?

Recommended answer

You can use the set_index -> unstack mechanism, which is based on pd.MultiIndex.

In [4]: df = xr.DataArray(np.arange(110).reshape(10, 11),  
   ...:                   dims=['x', 'y'])  
In [5]: indx=xr.DataArray([1,2,3, 1,4,5,6,8, 2,3], 
   ...:                   dims=['index'],  
   ...:                   coords={'i': ('index', [0,0,0, 1,1,1,1,1, 2,2]), 
   ...:                           'j': ('index', [0,1,2, 0,1,2,3,4, 0,1])}) 
   ...:  
   ...: indy=xr.DataArray([7,9,8, 0,2,7,9,6, 4,5], dims=['index'], 
   ...:                   coords={'i': ('index', [0,0,0, 1,1,1,1,1, 2,2]), 
   ...:                           'j': ('index', [0,1,2, 0,1,2,3,4, 0,1])})       

In [8]: df.isel(x=indx, y=indy).set_index(index=['i', 'j']).unstack('index')                                         
Out[8]: 
<xarray.DataArray (i: 3, j: 5)>
array([[18., 31., 41., nan, nan],
       [11., 46., 62., 75., 94.],
       [26., 38., nan, nan, nan]])
Coordinates:
  * i        (i) int64 0 1 2
  * j        (j) int64 0 1 2 3 4

Here, indx and indy have non-dimension coordinates, i and j, which are essentially the original position of each index in the 2-dimensional space: i is the line number and j is the position within that line. After unstacking, lines shorter than the longest one are padded with NaN.
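The question also asks for the mean of each segment and starts from ragged Python lists rather than hand-written flat arrays. The following is a minimal sketch of how the flat index and the i/j coordinates could be built from such lists and then reduced per segment. The names seg_x, seg_y, padded and segment_means are hypothetical, and the toy array is the one used in the answer; this is one possible way to apply the set_index -> unstack approach, not necessarily what the answerer had in mind.

```python
import numpy as np
import xarray as xr

# Toy data matching the answer: the value at (x, y) is 11 * x + y.
da = xr.DataArray(np.arange(110).reshape(10, 11), dims=["x", "y"])

# Ragged per-segment indices from the question (hypothetical variable names).
seg_x = [[1, 2, 3], [1, 4, 5, 6, 8], [2, 3]]
seg_y = [[7, 9, 8], [0, 2, 7, 9, 6], [4, 5]]

# Flatten the ragged lists and record, for every point, which segment it
# belongs to (i) and its position within that segment (j).
lengths = [len(s) for s in seg_x]
flat_x = np.concatenate(seg_x)
flat_y = np.concatenate(seg_y)
i = np.repeat(np.arange(len(lengths)), lengths)
j = np.concatenate([np.arange(n) for n in lengths])

coords = {"i": ("index", i), "j": ("index", j)}
indx = xr.DataArray(flat_x, dims=["index"], coords=coords)
indy = xr.DataArray(flat_y, dims=["index"], coords=coords)

# One pointwise lookup for all segments, then reshape to (segment, position).
# Positions that do not exist in a shorter segment become NaN.
padded = da.isel(x=indx, y=indy).set_index(index=["i", "j"]).unstack("index")

# For float data, mean() skips NaN by default, so the padding is ignored.
segment_means = padded.mean("j")
print(segment_means.values)  # approximately [30.0, 57.6, 32.0]
```

No Python-level loop over data values is needed: the only per-segment work is building the small i and j arrays, and the lookup itself is a single vectorized isel call. Integer index arrays cannot hold NaN, which is why the padding appears only after the lookup rather than in the indexers themselves.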
