How to explicitly get linear indices from arrayfire?

Problem description

Suppose I have a std::array<float, 24> foo which is the linearized STL counterpart of a column-major ArrayFire array, e.g. af::array bar = af::array(4,3,2, 1, f32);. So I have an af::dim4 object dims with the dimensions of bar, up to four af::seq objects, and the linearized array foo.

How can I explicitly get the indices of foo (i.e. the linearized version of bar) that represent, e.g., the 2nd and 3rd rows, i.e. bar(af::seq(1,2), af::span, af::span, af::span)? The small code example below shows what I want. At the end I also explain why I want this.

af::dim4 bigDims = af::dim4(4,3,2);
std::array<float, 24> foo;   // Resides in RAM and is big
float* selBuffer_ptr;        // Necessary for AF correct type autodetection
std::vector<float> selBuffer;
// Load some data into foo
af::array selection;         // Resides in VRAM and is small

af::seq selRows = af::seq(1,2);
af::seq selCols = af::seq(bigDims[1]);   // Emulates af::span
af::seq selSlices = af::seq(bigDims[2]); // Emulates af::span
af::dim4 selDims = af::dim4(selRows.size, selCols.size, selSlices.size);    

dim_t* linIndices;
// Magic functionality getting linear indices of the selection
//  selRows x selCols x selSlices

// Assign all indexed elements to a consecutive memory region in selBuffer
// I know their positions within the full dataset, b/c I know the selection ranges.

selBuffer_ptr = selBuffer.data();   // Pointer to the first element of the contiguous buffer

selection = af::array(selDims, selBuffer_ptr);      // Copies just the selection to the device (e.g. GPU)

// Do sth. with selection and be happy
// I don't need to write back into the foo array.

ArrayFire must implement such logic internally in order to access elements, and I found several related classes/functions such as af::index, af::seqToDims, af::gen_indexing and af::array::operator(), but I haven't figured out an easy way to use them yet.

I thought about basically reimplementing operator() so that it works similarly but does not require a reference to an array object. But this might be wasted effort if there is an easy way within the ArrayFire framework.

Background: The reason I want to do this is that ArrayFire does not allow storing data only in main memory (CPU context) while being linked against a GPU backend. Since I have a big chunk of data that needs to be processed only piece by piece, and VRAM is quite limited, I'd like to instantiate af::array objects ad hoc from an STL container that always resides in main memory.

Of course, I know I could write some index magic to work around the problem, but I'd like to use fairly complicated af::seq objects, which could make an efficient implementation of the index logic complicated.

Recommended answer

After a discussion with Pavan Yalamanchili on Gitter, I managed to get a working piece of code that I want to share in case anybody else needs to hold their variables only in RAM and copy parts of them to VRAM on use, i.e. into the ArrayFire universe (when linked against OpenCL on a GPU, or on Nvidia hardware).

This solution should also help anybody who uses AF elsewhere in their project anyway and wants a convenient way of accessing a big linearized N-dimensional array with N <= 4.

//  Compile as: g++ malloc2.cpp -lafopencl && ./a.out
#include <stdio.h>
#include <arrayfire.h>
#include <af/util.h>

#include <cstdlib>
#include <iostream>

#define M 3
#define N 12
#define O 2
#define SIZE (M*N*O)


int main() {
    int _foo;                      // Dummy variable for pausing program
    double* a = new double[SIZE];  // Allocate double array on CPU (Big Dataset!)
    for(long i = 0; i < SIZE; i++) // Fill with entry numbers for easy debugging
        a[i] = 1. * i + 1;

    std::cin >> _foo; // Pause 

    std::cout << "Full array: ";
    // Display full array, out of convenience from GPU
    // Don't use this if "a" is really big, otherwise you'll still copy all the data to the VRAM.
    af::array ar = af::array(M, N, O, a);   // Copy a RAM -> VRAM


    af_print(ar);

    std::cin >> _foo; // Pause 


    // Select a subset of the full array in terms of af::seq
    af::seq seq0 = af::seq(1,2,1);     // Row 2-3
    af::seq seq1 = af::seq(2,6,2);     // Col 3:5:7
    af::seq seq2 = af::seq(1,1,1);     // Slice 2


    // BEGIN -- Getting linear indices
    af::array aidx0 = af::array(seq0);
    af::array aidx1 = af::array(seq1).T() * M;
    af::array aidx2 = af::reorder(af::array(seq2), 1, 2, 0) * M * N;

    af::gforSet(true);
    af::array aglobal_idx = aidx0 + aidx1 + aidx2;
    af::gforSet(false);

    aglobal_idx = af::flat(aglobal_idx).as(u64);
    // END -- Getting linear indices

    // Copy index list VRAM -> RAM (for easier/faster access)
    uintl* global_idx = new uintl[aglobal_idx.dims(0)];
    aglobal_idx.host(global_idx);

    // Copy all indices into a new RAM array
    double* a_sub = new double[aglobal_idx.dims(0)];
    for(long i = 0; i < aglobal_idx.dims(0); i++)
        a_sub[i] = a[global_idx[i]];

    // Generate the "subset" array on GPU & display it nicely formatted
    af::array ar_sub = af::array(seq0.size, seq1.size, seq2.size, a_sub);
    std::cout << "Subset array: ";  // living on seq0 x seq1 x seq2
    af_print(ar_sub);

    // Free the host-side buffers
    delete[] a_sub;
    delete[] global_idx;
    delete[] a;
    return 0;
}

/*
g++ malloc2.cpp -lafopencl && ./a.out 

Full array: ar
[3 12 2 1]
    1.0000     4.0000     7.0000    10.0000    13.0000    16.0000    19.0000    22.0000    25.0000    28.0000    31.0000    34.0000 
    2.0000     5.0000     8.0000    11.0000    14.0000    17.0000    20.0000    23.0000    26.0000    29.0000    32.0000    35.0000 
    3.0000     6.0000     9.0000    12.0000    15.0000    18.0000    21.0000    24.0000    27.0000    30.0000    33.0000    36.0000 

   37.0000    40.0000    43.0000    46.0000    49.0000    52.0000    55.0000    58.0000    61.0000    64.0000    67.0000    70.0000 
   38.0000    41.0000    44.0000    47.0000    50.0000    53.0000    56.0000    59.0000    62.0000    65.0000    68.0000    71.0000 
   39.0000    42.0000    45.0000    48.0000    51.0000    54.0000    57.0000    60.0000    63.0000    66.0000    69.0000    72.0000 

ar_sub
[2 3 1 1]
   44.0000    50.0000    56.0000 
   45.0000    51.0000    57.0000 
*/

The solution uses some undocumented AF functions and is presumably slow because of the host-side for loop over global_idx, but so far it is really the best one can do if one wants to hold the data exclusively in the CPU context and share only parts of it with AF's GPU context for processing.

If anybody knows a way to speed this code up, I'm still open to suggestions.