Instantiate function for all combinations of template params, choose instantiation at run time


Question




Sorry if this has been asked before, but I couldn't find this exact question.

I have a templated CUDA kernel that looks like this:

template<int firstTextureIndex, int secondTextureIndex, int thirdTextureIndex> __global__ void myKernel

The three texture index template parameters range from 0 to 7 and are not known until runtime. I need to instantiate all 512 combinations of this kernel and then call the correct instantiation based on the runtime values of the texture indices.

I've never written any preprocessor macros and am trying to avoid them. Another post shows how to instantiate many class templates for one template variable recursively by doing this:

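template<int i>
class loop {
    loop<i-1> x;  // instantiating loop<i> forces loop<i-1> to be instantiated
};

template<>
class loop<1> {  // explicit specialization ends the recursion
};

loop<10> l;  // instantiates loop<10> down to loop<1>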

I'm struggling to extend that to 3 variables and a function (instead of a class) for my situation. Even if I figure out how to instantiate all of them that way, how do I actually call 1 out of 512 possibilities at runtime without nested switch statements? To illustrate, the nested switch statements I'm trying to avoid would be like:

switch(firstTextureIndex)
{
    case 0:
        switch(secondTextureIndex)
        {
            case 1:
                switch(thirdTextureIndex)
                {
                    case 2:
                        myKernel<0, 1, 2><<<grid, block>>>(param1, param2, param3);
                        break;
                }
                break;
        }
        break;
}

If I figure out how to instantiate 0-7 for all of them, could I call it like:

myKernel<i, j, k><<<grid, block>>>(param1, param2); 

if I make i, j, and k enum types containing only 0-7? This way the compiler could know all the possible values and since I instantiate them all it would be OK with it?

Please note that there are good reasons for this triple template to pass in texture indices, but I'm omitting the explanation for conciseness. Any help on instantiating and/or calling this kernel would be greatly appreciated.

Edit: Jarod42 provided a valid solution that does exactly what I asked. Unfortunately, I now realize the C++ standard matters here. I'm using C++98/03 combined with the latest stable release of the Boost library, so a solution using these would be ideal. I could potentially use C++11, but C++14 is out due to limitations of our compiler.

Solution

The following code is an implementation for C++98/03 and boost.MPL. There is definitely room for improvement (for example hiding the global pointer array, checking for illegal combinations, ...).

The idea is to recursively run through all combinations of the integer lists and thereby fill an array of function pointers for each combination.
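As a rough illustration of that idea without Boost, the recursion can be written as three nested levels of class templates, one per index, in plain C++03. This is only a sketch: `launchVariant`, `Launcher`, `table`, `FillI`/`FillJ`/`FillK`, `fillTable` and `dispatch` are hypothetical names, and `launchVariant` is a host-side stand-in that records its indices where a real version would presumably launch `myKernel<A, B, C>`.

```cpp
#include <cassert>

// Hypothetical stand-in for a host function that would launch the kernel
// instantiation for (A, B, C); here it only records which one was called.
static int lastCall[3] = { -1, -1, -1 };

template<int A, int B, int C>
void launchVariant()
{
    lastCall[0] = A;
    lastCall[1] = B;
    lastCall[2] = C;
}

typedef void (*Launcher)();
static Launcher table[8 * 8 * 8];

// Innermost level: walk K from 7 down to 0 for fixed I and J.
template<int I, int J, int K>
struct FillK
{
    static void run()
    {
        table[I * 64 + J * 8 + K] = &launchVariant<I, J, K>;
        FillK<I, J, K - 1>::run();
    }
};

template<int I, int J>
struct FillK<I, J, -1>  // K exhausted: stop
{
    static void run() {}
};

// Middle level: walk J from 7 down to 0 for fixed I.
template<int I, int J>
struct FillJ
{
    static void run()
    {
        FillK<I, J, 7>::run();
        FillJ<I, J - 1>::run();
    }
};

template<int I>
struct FillJ<I, -1>  // J exhausted: stop
{
    static void run() {}
};

// Outer level: walk I from 7 down to 0.
template<int I>
struct FillI
{
    static void run()
    {
        FillJ<I, 7>::run();
        FillI<I - 1>::run();
    }
};

template<>
struct FillI<-1>  // I exhausted: stop
{
    static void run() {}
};

// Forces all 512 instantiations and stores a pointer to each.
void fillTable()
{
    FillI<7>::run();
}

// Runtime selection: one table lookup instead of nested switch statements.
void dispatch(int i, int j, int k)
{
    assert(i >= 0 && i < 8 && j >= 0 && j < 8 && k >= 0 && k < 8);
    table[i * 64 + j * 8 + k]();
}
```

Calling `fillTable()` once at startup and then `dispatch(i, j, k)` with the runtime indices selects the matching instantiation. Splitting the recursion per index keeps the template nesting shallow (at most nine levels per dimension), which should stay well inside typical compiler instantiation-depth limits.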

I previously used similar but more complex code to select the best combination of kernel parameters at run time (auto tuning), such as launch_bounds and other options: culgt/runtimechooser.

Here is a simplified version for your case:

#include <iostream>

#include <boost/mpl/vector.hpp>
#include <boost/mpl/vector_c.hpp>
#include <boost/mpl/for_each.hpp>
#include <boost/mpl/push_back.hpp>
#include <boost/mpl/at.hpp>
#include <boost/mpl/int.hpp>  // mpl::int_ is used below

namespace mpl = boost::mpl;

template<int index1, int index2, int index3> void execKernel()
{
    std::cout << "Kernel called with " << index1 << "/" << index2 << "/" << index3 << std::endl;
}

typedef void (*FPTR)();
FPTR ptr[512];

struct NIL
{
public:
    static const int value = 0;
};

template<typename Seq, typename T1, typename T2 = NIL> class MakeSequenceImpl
{
public:
    template<typename T> void operator()(T)
    {
        typedef MakeSequenceImpl<typename mpl::push_back<Seq,T>::type,T2> RunSeq;
        mpl::for_each<T1>( RunSeq() );
    }
};

template<typename Seq> class MakeSequenceImpl<Seq, NIL, NIL>
{
public:
    template<typename T> void operator()(T)
    {
        typedef typename mpl::push_back<Seq,T>::type FinalSeq;

        int index = mpl::at<FinalSeq,mpl::int_<0> >::type::value * 64
                + mpl::at<FinalSeq,mpl::int_<1> >::type::value * 8
                + mpl::at<FinalSeq,mpl::int_<2> >::type::value;

        ptr[index] = execKernel<mpl::at<FinalSeq,mpl::int_<0> >::type::value, mpl::at<FinalSeq,mpl::int_<1> >::type::value, mpl::at<FinalSeq,mpl::int_<2> >::type::value>;
    }
};


template<typename T0, typename T1, typename T2> class MakeSequence
{
public:
    typedef mpl::vector_c<int> Seq;

    MakeSequence()
    {
        typedef MakeSequenceImpl<Seq, T1, T2> RunSeq;
        mpl::for_each<T0>( RunSeq() );
    }
};


void callWrapper( int i, int j, int k )
{
    ptr[i*64+j*8+k]();
}

typedef mpl::vector_c< int, 0, 1, 2, 3, 4, 5, 6, 7 > list1;
typedef mpl::vector_c< int, 0, 1, 2, 3, 4, 5, 6, 7 > list2;
typedef mpl::vector_c< int, 0, 1, 2, 3, 4, 5, 6, 7 > list3;

int main()
{
    MakeSequence<list1,list2,list3> frontend;

    int i,j,k;

    std::cin >> i;
    std::cin >> j;
    std::cin >> k;

    callWrapper(i,j,k);
}
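The improvements mentioned at the top of this answer (hiding the pointer array and rejecting illegal combinations) might look roughly like the following. It is a sketch under assumptions, not part of the code above: `KernelArgs`, `runVariant`, `Variant`, `FillRange` and `findVariant` are hypothetical names, the table is filled by a Boost-free recursion that halves a flat index range instead of using `mpl::for_each`, and `runVariant` returns a number encoding its template arguments purely so the selection can be observed.

```cpp
#include <cassert>

// Hypothetical argument bundle standing in for param1, param2, ...
struct KernelArgs
{
    int value;
};

// Hypothetical stand-in for the real launch; the return value encodes the
// template arguments so the chosen instantiation is visible to the caller.
template<int A, int B, int C>
int runVariant(const KernelArgs& args)
{
    return A * 100 + B * 10 + C + args.value;
}

typedef int (*Variant)(const KernelArgs&);

// Fills table[First .. First + Count - 1] by splitting the range in half,
// so the template nesting depth grows with log2(Count), not Count.
template<int First, int Count>
struct FillRange
{
    static void run(Variant* table)
    {
        FillRange<First, Count / 2>::run(table);
        FillRange<First + Count / 2, Count - Count / 2>::run(table);
    }
};

// Single slot: decode the flat index back into the three template indices.
template<int First>
struct FillRange<First, 1>
{
    static void run(Variant* table)
    {
        table[First] = &runVariant<First / 64, (First / 8) % 8, First % 8>;
    }
};

// Returns the matching instantiation, or a null pointer for an illegal
// combination. The table is a function-local static, so it is not visible
// to other code. Note: this lazy fill is not thread-safe in C++03.
Variant findVariant(int i, int j, int k)
{
    static Variant table[512];
    static bool filled = false;
    if (!filled)
    {
        FillRange<0, 512>::run(table);
        assert(table[0] != 0 && table[511] != 0);  // both ends were filled
        filled = true;
    }
    if (i < 0 || i > 7 || j < 0 || j > 7 || k < 0 || k > 7)
        return 0;
    return table[i * 64 + j * 8 + k];
}
```

A caller would check the returned pointer before invoking it, for example by reporting an error when `findVariant(i, j, k)` is null and calling it with the argument bundle otherwise.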
