3D volume processing using dask


Question

I'm exploring 3D interactive volume convolution with some simple stencils using dask right now. Let me explain what I mean:


  • Let's say you want to process 3D data with a Sobel transform (for example, to get an L1 or L2 gradient).

  • You then divide the input 3D image into sub-volumes (with some overlapping borders – for a 3x3x3 Sobel stencil it would need +2 samples of overlap/padding).

  • Now let's assume you create a delayed computation of the Sobel 3D transform over the entire 3D volume, but don't execute it yet.

Now the most important part:


  • I would like to write a function which extracts a specific 2D slice of the transformed data.

  • And then, finally, have everything computed:

    • But what I need it to do is NOT to compute the entire transform for me and then serve up a slice.

      • I need it to execute only those tasks which are needed to compute that particular 2D slice of the transformed image.

What do you think? Is it possible?
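For concreteness, the intended workflow might look like the minimal sketch below (the shapes, chunk sizes, and the placeholder transform are illustrative, not from the question): build a lazy volume, apply a lazy transform, and pull out a single slice.

```python
import dask.array as da

# A lazy 256^3 volume split into 64^3 chunks; nothing is computed yet.
vol = da.random.random((256, 256, 256), chunks=(64, 64, 64))

# Stand-in for the 3D transform (a real Sobel stencil would need
# overlapping chunks); this expression also stays lazy.
transformed = 2 * vol

# Extract a single 2D slice of the transformed volume. Calling
# .compute() should run only the tasks this slice depends on.
plane = transformed[:, :, 128].compute()
print(plane.shape)  # (256, 256)
```

The question is whether dask really restricts the computation to the chunks that `plane` touches, which the answer below addresses.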

In order to explain it with an image – please consider this to be a 3D domain decomposition (this is from DWT – but good for illustration here):

[image: 3D domain decomposition into subband blocks]

And assume that there is a function which computes the 3D transform of the entire volume using dask. But what I would like to get – for example – is a 2D image of the transformed 3D data, consisting of the LLL1, LLH1, HLH1, HLL1 planes (essentially a single slice).

The important part is not to compute the whole subcubes – but to let dask somehow automatically track the dependencies in the compute graph and evaluate only those.

Please don't worry about compute vs. copy time. Assume that it has a perfect ratio.

Let me know if more clarification is needed! Thanks for your help!

Answer

I'm hearing a few questions. I'll answer each individually.


  • Can Dask track which tasks are required for a subset of outputs and only compute those?

Yes. Lazy Dask operations produce a dependency graph. In the case of dask.arrays this graph is per-chunk. If your output only depends on a subset of the graph then Dask will remove tasks that are not necessary. The in-depth docs for this are in the graph optimization documentation, the cull optimization in particular.

As an example, consider this 100,000 by 100,000 array:

>>> import dask.array as da
>>> x = da.random.random((100000, 100000), chunks=(1000, 1000))
      

And let's say that I add a couple of 1d slices from it:

      >>> y = x[5000, :] + x[:, 5000].T
      

The resulting optimized graph is only large enough to compute the output:

      >>> graph = y._optimize(y.dask, y._keys())  # you don't need to do this
      >>> len(graph)                              # it happens automatically
      301
      

And we can compute the result quite quickly:

      In [8]: %time y.compute()
      CPU times: user 3.18 s, sys: 120 ms, total: 3.3 s
      Wall time: 936 ms
      Out[8]: 
      array([ 1.59069994,  0.84731881,  1.86923216, ...,  0.45040813,
              0.86290539,  0.91143427])
      

Now, this wasn't perfect. It did have to produce all of the 1000x1000 chunks that our two slices touched. But you can control the granularity there.

Short answer: Dask will automatically inspect the graph and only run those tasks that are necessary to evaluate the output. You don't need to do anything special to do this.
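Note that `y._optimize`, `y.dask`, and `y._keys()` in the snippet above are private attributes that have changed in later dask versions. A rough equivalent with public APIs (a sketch; the array sizes are scaled down, and `dask.optimization.cull` is called directly here only to inspect the culled graph) is:

```python
import dask.array as da
from dask.optimization import cull

x = da.random.random((10000, 10000), chunks=(1000, 1000))
y = x[5000, :] + x[:, 5000].T

# Materialize the full task graph, then cull it down to the tasks
# that the output keys actually depend on.
full = dict(y.__dask_graph__())
culled, dependencies = cull(full, list(y.__dask_keys__()))

# Only the row/column of chunks the two slices touch survives the cull.
print(len(full), len(culled))
```

As before, this culling happens automatically inside `.compute()`; calling `cull` yourself is only useful for inspecting what would run.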


  • Is it a good idea to do overlapping array computations with dask.array?

Maybe. The relevant doc page is the one on Overlapping Blocks with Ghost Cells. Dask.array has convenience functions to make this easy to write down. However, it will create in-memory copies. Many people in your position find memcopy too slow. Dask generally doesn't support in-place computation, so we can't be as efficient as proper MPI code. I'll leave the performance question there to you, though.

