C++ array to Halide Image (and back)
Problem description
I'm getting started with Halide, and whilst I've grasped the basic tenets of its design, I'm struggling with the particulars (read: magic) required to efficiently schedule computations.
I've posted below a MWE of using Halide to copy an array from one location to another. I had assumed this would compile down to only a handful of instructions and take less than a microsecond to run. Instead, it produces 4000 lines of assembly and takes 40ms to run! Clearly, therefore, I have a significant hole in my understanding.
- What is the canonical way of wrapping an existing array in a Halide::Image?
- How should the function copy be scheduled to perform the copy efficiently?
Minimal working example
    #include <Halide.h>
    using namespace Halide;

    void _copy(uint8_t* in_ptr, uint8_t* out_ptr, const int M, const int N) {
        Image<uint8_t> in(Buffer(UInt(8), N, M, 0, 0, in_ptr));
        Image<uint8_t> out(Buffer(UInt(8), N, M, 0, 0, out_ptr));
        Var x, y;
        Func copy;
        copy(x, y) = in(x, y);
        copy.realize(out);
    }

    int main(void) {
        uint8_t in[10000], out[10000];
        _copy(in, out, 100, 100);
    }
Compilation Flags
clang++ -O3 -march=native -std=c++11 -Iinclude -Lbin -lHalide copy.cpp
Solution
Let me start with your second question: _copy takes a long time because it needs to compile the Halide code to x86 machine code. IIRC, Func caches the machine code, but since copy is local to _copy, that cache cannot be reused. Anyway, scheduling copy is pretty simple because it's a pointwise operation. First, it would probably make sense to vectorize it. Second, it might make sense to parallelize it (depending on how much data there is). For example:
    copy.vectorize(x, 32).parallel(y);
will vectorize along x with a vector size of 32 and parallelize along y. (I am making this up from memory, so there might be some confusion about the correct names.) Of course, doing all this might also increase compile times...
There is no recipe for good scheduling. I do it by looking at the output of compile_to_lowered_stmt and profiling the code. I also use the AOT compilation provided by Halide::Generator, which makes sure that I only measure the runtime of the code and not the compile time.
Your other question was how to wrap an existing array in a Halide::Image. I don't do that, mostly because I use AOT compilation. However, internally Halide uses a type called buffer_t for everything image related. There is also a C++ wrapper called Halide::Buffer that makes using buffer_t a little easier; I think it can also be used in Func::realize instead of Halide::Image. The point is: if you understand buffer_t, you can wrap almost anything into something digestible by Halide.
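To make the suggested schedule a bit more concrete, here is a rough plain C++ sketch of the loop structure that vectorizing along x by 32 and parallelizing along y describes. It does not use Halide at all: copy_rows_chunked and kVec are hypothetical names chosen for illustration, the outer loop runs serially here (it is the loop that would be spread across threads), and the shorter last chunk is a simplification of how Halide handles row widths that are not a multiple of the vector size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of Halide: copies an M-row by N-column
// row-major uint8_t image using the loop shape implied by
// "vectorize x by 32, parallelize y".
void copy_rows_chunked(const uint8_t* in, uint8_t* out, int M, int N) {
    assert(M >= 0 && N >= 0);
    constexpr int kVec = 32;  // chunk width, mirroring the factor 32

    // Outer loop over rows: this is the loop a parallel schedule would
    // distribute across threads. It runs serially in this sketch.
    for (int y = 0; y < M; ++y) {
        const uint8_t* src = in + static_cast<std::size_t>(y) * N;
        uint8_t* dst = out + static_cast<std::size_t>(y) * N;

        // Inner loop over chunks of up to kVec bytes: each iteration
        // stands in for one vector-wide load and store.
        for (int x0 = 0; x0 < N; x0 += kVec) {
            const int len = std::min(kVec, N - x0);
            std::copy(src + x0, src + x0 + len, dst + x0);
        }
    }
}
```

For the 100x100 image in the question, each row is handled as three full chunks of 32 bytes plus a tail of 4 bytes, which is one reason to try several vector widths and profile rather than assume 32 is best.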
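On the question of wrapping an existing array, the idea behind a buffer descriptor can be illustrated without Halide. The sketch below is not buffer_t and not any Halide API: View2D and wrap_row_major are hypothetical names, and the assumption is only that a descriptor of this kind carries a host pointer plus a per-dimension extent and stride, so that the original memory is described rather than copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical descriptor, NOT Halide's buffer_t: the minimum needed to
// describe an existing row-major array as a 2-D image without copying it.
struct View2D {
    uint8_t* host;   // pointer to the first element of the wrapped array
    int extent[2];   // extent[0] = width (x), extent[1] = height (y)
    int stride[2];   // elements to step per unit of x and per unit of y

    // Element access through the descriptor; writes go to the original array.
    uint8_t& at(int x, int y) const {
        assert(x >= 0 && x < extent[0] && y >= 0 && y < extent[1]);
        return host[static_cast<std::ptrdiff_t>(x) * stride[0] +
                    static_cast<std::ptrdiff_t>(y) * stride[1]];
    }
};

// Wrap an M-row by N-column row-major array: x is the fast dimension,
// matching the (N, M) argument order used in the question's code.
inline View2D wrap_row_major(uint8_t* data, int M, int N) {
    return View2D{data, {N, M}, {1, N}};
}
```

Because the view only stores a pointer and layout information, the "and back" direction is free: after a pipeline writes through an output view, the results are already in the caller's array.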