Plotting 2D data using Xarray takes a surprisingly long time?

Problem description

I am reading NetCDF files using xarray. Each variable has 4 dimensions (Times, lev, y, x). After reading a variable, I calculate the mean of the variable QVAPOR along the (Times, lev) dimensions. The result, QVAPOR_mean, is a 2D variable with shape (y: 699, x: 639).

Xarray took only 10 microseconds to read the data with shape (Times: 2918, lev: 36, y: 699, x: 639), but took more than 60 minutes to plot the filled contour of the data with shape (y: 699, x: 639).

I am wondering why Xarray takes such an extremely long time (more than 60 minutes) to plot the contourf of an array with size (y: 699, x: 639).

I use the following code to read the files and perform the computation.

import xarray as xr

flnm = xr.open_mfdataset('./WRF_3D_2007_*.nc', chunks={'Times': 100})
QVAPOR_mean = flnm.QVAPOR.mean(dim=('Times', 'lev'))
QVAPOR_mean.plot.imshow()

The last command takes more than 60 minutes to complete. Any help is appreciated. Thank you.

Recommended answer

When you open your dataset and provide the chunks argument, xarray returns a Dataset composed of dask arrays. These arrays are evaluated "lazily" (see the xarray/dask documentation), which is why opening the files appears to take only microseconds: no data has been read yet. The computation is not triggered until you plot your data, so the 60 minutes include reading the files and computing the mean, not just drawing the plot. To illustrate this, you can explicitly load your data after you take the mean:

flnm = xr.open_mfdataset('./WRF_3D_2007_*.nc', chunks={'Times': 100})
QVAPOR_mean = flnm.QVAPOR.mean(dim=('Times', 'lev')).load()

Now your QVAPOR_mean variable is backed by a numpy array instead of a dask array. Plotting this array will likely be much faster.
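
The difference between a lazy and a loaded result can be checked on a small example. The following is a minimal sketch, assuming xarray, dask, and numpy are installed; the tiny random array and its shape are hypothetical stand-ins for the real QVAPOR data. It uses .compute() rather than .load() only so that the lazy object stays unchanged for comparison; .load() triggers the same computation in place.

```python
import numpy as np
import xarray as xr

# Hypothetical miniature of the QVAPOR variable (the real array is far larger).
data = np.random.rand(8, 3, 5, 4)
qvapor = xr.DataArray(data, dims=("Times", "lev", "y", "x"), name="QVAPOR")

# .chunk() makes the array dask-backed, like open_mfdataset(..., chunks=...).
lazy_mean = qvapor.chunk({"Times": 4}).mean(dim=("Times", "lev"))
is_lazy = lazy_mean.chunks is not None  # only a task graph exists so far

# .compute() runs the graph and returns a new, numpy-backed DataArray.
loaded_mean = lazy_mean.compute()
is_loaded = isinstance(loaded_mean.data, np.ndarray)

print(is_lazy, is_loaded, loaded_mean.shape)
```

Timing the .compute() or .load() call separately from the plot call on the real data should show where the 60 minutes are actually spent.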

However, the computation of your mean is likely to still take quite a long time. There are ways to improve the throughput here as well.

  • Try using a larger chunk size. I often find that chunk sizes in the 10-100 MB range perform best.

  • Try a different scheduler. By default you are using dask's threaded scheduler. Because of limitations with netCDF/HDF, this does not allow parallel reads from disk. We have found that the distributed scheduler works well for these applications.
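
To judge whether a candidate chunking falls in the 10-100 MB range, the size of one chunk can be estimated from its shape. This is a rough sketch, not part of the original answer: the helper name chunk_megabytes is hypothetical, and float32 is an assumed dtype for QVAPOR (check flnm.QVAPOR.dtype on the real data).

```python
import numpy as np

def chunk_megabytes(chunk_shape, dtype="float32"):
    """Return the in-memory size of one chunk in MB (10**6 bytes)."""
    n_elements = int(np.prod(chunk_shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize / 1e6

# Hypothetical candidate: one time step, all 36 levels, the full 699 x 639 grid.
size_mb = chunk_megabytes((1, 36, 699, 639))
in_suggested_range = 10 <= size_mb <= 100
print(f"{size_mb:.1f} MB per chunk, within 10-100 MB: {in_suggested_range}")
```

For the scheduler, if the dask.distributed package is installed, creating a Client() from dask.distributed before opening the files makes subsequent dask computations use the distributed scheduler.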
