Is there a faster way to sum Xarray dataset variables?

Question

This is my first ever stack exchange question, so I hope I'm doing this correctly.

I am trying to sum together a few xarray variables in a dataset. Each variable has the same dimensions. The code looks essentially like this:

def add_variables(xarray_dataset, listofvars):
    data = 0
    for var in listofvars:
        data = data + dset[var][:,-1,:] # slice of each variable
    return data 

summed_variables = add_variables(dset, ['varname1', 'varname2'])

However, this takes forever to run. Does anyone have a suggestion for a faster way to go about this? Thank you!

Answer

You can use the to_array method to stack the variables along a new dimension (which is by default named "variable") and then take the sum over this dimension. You can select variables and slice them beforehand if necessary.

import numpy as np
import xarray as xr

# Create dummy dataset
ds = xr.Dataset(
    {var: (("x", "y", "z"), np.random.rand(5, 3, 2)) for var in "abcde"}
)

# Sum over (a slice of some of the) variables
vars_to_sum = ["a", "c", "d"]
summed_variables = ds[vars_to_sum].isel(y=-1).to_array().sum("variable")
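
To make the stacking step concrete, here is a small sketch that inspects the intermediate array returned by to_array() and checks that summing over "variable" matches adding the slices one by one. It rebuilds the same kind of dummy dataset; the fixed random seed and the names stacked, summed and manual are only for illustration.

```python
import numpy as np
import xarray as xr

# Dummy dataset like the one above; the seed only makes the run reproducible
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {var: (("x", "y", "z"), rng.random((5, 3, 2))) for var in "abcde"}
)
vars_to_sum = ["a", "c", "d"]

# to_array() stacks the selected variables along a new first dimension "variable"
stacked = ds[vars_to_sum].isel(y=-1).to_array()
print(stacked.dims)   # ('variable', 'x', 'z')
print(stacked.shape)  # (3, 5, 2)

# Summing over "variable" collapses the stack back to the (x, z) slice shape
summed = stacked.sum("variable")

# Reference result: add the same slices explicitly
manual = ds["a"].isel(y=-1) + ds["c"].isel(y=-1) + ds["d"].isel(y=-1)
xr.testing.assert_allclose(summed, manual)
```

Because the new dimension carries the variable names as a coordinate, the stacked array can also be inspected or filtered (for example with stacked.sel(variable="c")) before summing.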

I think this is much simpler than your custom function, although it is not faster in my comparison[1]:

%timeit add_variables(ds, vars_to_sum)
464 µs ± 591 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit ds[vars_to_sum].isel(y=-1).to_array().sum("variable")
660 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

However, for this small dataset both versions are fast, so the difference is not noticeable. I don't know what your dataset looks like; sharing more information about your data would probably help to diagnose the performance issue.
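
As a starting point for gathering that information, a few standard Dataset and DataArray attributes summarise the dimension sizes, the total size, the element types, and whether each variable is backed by a dask array. This is a sketch on the dummy dataset; with real data, ds would be whatever the question's dset is (for example something opened with xr.open_dataset(...), which is an assumption).

```python
import numpy as np
import xarray as xr

# Dummy dataset; replace with the real dataset when diagnosing
ds = xr.Dataset(
    {var: (("x", "y", "z"), np.random.rand(5, 3, 2)) for var in "abcde"}
)

print(ds)              # dimensions, coordinates and data variables
print(dict(ds.sizes))  # length of each dimension
print(f"total size of all arrays: {ds.nbytes / 1e6:.3f} MB")

for name, da in ds.data_vars.items():
    # chunks is None for NumPy-backed variables, a tuple of chunk sizes for dask-backed ones
    print(name, da.dtype, da.shape, "chunks:", da.chunks)
```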

[1] Note that I had to change your function slightly to make it run: the dataset name in the function signature (xarray_dataset) did not match the one used in the body (dset):

def add_variables(xarray_dataset, listofvars):
    data = 0
    for var in listofvars:
        # changed dset to xarray_dataset in the following line
        data = data + xarray_dataset[var][:,-1,:]
    return data 
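
%timeit only works in IPython or Jupyter. As a sketch, the same comparison can be run as a plain Python script with the standard-library timeit module, using the corrected function above. It reports the mean ± standard deviation per call over 7 runs, similar in spirit to %timeit. The helper names with_loop, with_to_array and N_CALLS are hypothetical, and the printed numbers will vary with hardware and xarray version.

```python
import statistics
import timeit

import numpy as np
import xarray as xr


def add_variables(xarray_dataset, listofvars):
    # Corrected loop from the footnote: add one slice per variable
    data = 0
    for var in listofvars:
        data = data + xarray_dataset[var][:, -1, :]
    return data


# Same dummy dataset and variable selection as in the answer
ds = xr.Dataset(
    {var: (("x", "y", "z"), np.random.rand(5, 3, 2)) for var in "abcde"}
)
vars_to_sum = ["a", "c", "d"]


def with_loop():
    return add_variables(ds, vars_to_sum)


def with_to_array():
    return ds[vars_to_sum].isel(y=-1).to_array().sum("variable")


N_CALLS = 100  # calls per run (hypothetical choice)
for label, func in [("loop", with_loop), ("to_array", with_to_array)]:
    # 7 runs of N_CALLS calls each, converted to seconds per call
    per_call = [t / N_CALLS for t in timeit.repeat(func, number=N_CALLS, repeat=7)]
    print(
        f"{label}: {statistics.mean(per_call) * 1e6:.0f} µs "
        f"± {statistics.stdev(per_call) * 1e6:.1f} µs per call"
    )
```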
