使非constexpr整数值适应非类型模板参数,并使代码膨胀 [英] adapting a non-constexpr integral value to a non-type template parameter, and code bloat

查看:107
本文介绍了使非constexpr整数值适应非类型模板参数,并使代码膨胀的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑带有constexpr size_t自变量I

的函数对象F

struct F
{
    template <size_t I>
    constexpr size_t operator()(size <I>) const { return I; }
};

包装在类型size <I>中,其中(为简洁起见)

template <size_t N>
using size = std::integral_constant <size_t, N>;

当然,我们可以直接传递I,但我想强调的是,将它用作模板参数是constexpr.函数F在这里是虚拟的,但实际上它可以做各种有用的事情,例如从元组的第I个元素中检索信息.假定F具有相同的返回类型,而与I无关. I可以是任何整数类型,但假定为非负数.

问题

给出constexpr size_tI,我们可以通过

struct F
{
    template <size_t I>
    constexpr size_t operator()(size <I>) const { return I; }
};

调用F

F()(size <I>());

现在,如果我们要使用非constepr size_ti调用F怎么办?请考虑以下内容:

constexpr size_t L = 10;
idx <F, L> f;
for (size_t i = 0; i < L; ++i)
    cout << f(i) << " ";

(为什么需要这样做?为了提供一些背景信息,我实际上是在尝试将复合迭代器构建到一个容器视图中,该容器视图表示一系列连接的"(连接的)异构容器.这将给可以说类似join(a, b) = c;的东西,其中数组join(a, b)c的长度相等.但是,i是迭代器状态,因此不能为constexpr,但是子迭代器存储在元组中,需要通过constexpr索引访问.各个value_type的位置大致一致,因此联接视图可以采用其common_type类型,但是子容器和因此子迭代器的类型不同.)

解决方案

在这里,我想出了结构idx <F, L>,为此目的,它会适应函数F,假设输入参数小于L.实际上,这可以编译出良好的输出结果

0 1 2 3 4 5 6 7 8 9 

,这是一个在线示例.

idx的工作方式是将输入i递归分解为二进制表示形式,并重建constexpr副本N:

template <typename F, size_t L, size_t N = 0, bool = (N < L)>
struct idx
{
    template <size_t R = 1>
    inline constexpr decltype(F()(size <0>()))
    operator()(size_t I, size <R> = size <R>()) const
    {
        return I%2 ? idx <F, L, N+R>()(--I/2, size <2*R>()) :
               I   ? idx <F, L, N  >()(  I/2, size <2*R>()) :
               F()(size <N>());
    }
};

其中,R表示当前迭代中的2的幂.为了避免无限的模板实例化,对N >= L进行了专门化处理,将F()(size <0>())作为虚拟值返回:

template <typename F, size_t L, size_t N>
struct idx <F, L, N, false>
{
    template <size_t R>
    inline constexpr decltype(F()(size <0>()))
    operator()(size_t I, size <R>) const { return F()(size <0>()); }
};

实际上,此方法是对带有布尔参数的更常见用语的概括:

bool b = true;
b ? f <true>() : f <false>();

其中,f是将bool作为模板参数的函数.在这种情况下,很明显,f的所有两个可能版本都已实例化.

问题

尽管这可行并且在i中它的运行时复杂度可能是对数的,但我担心编译时的影响,例如:

  • 实例化了idx及其template operator()的多少组合,以使其在运行时对编译时未知的任何输入i起作用? (我再次理解所有可能",但是有多少?)

  • 真的可以内联operator()吗?

  • 是否有更容易"编译的替代策略或变体?

  • 我应该把这个想法当作纯粹的代码膨胀的实例吗?

注释

以下是我针对L的不同值测量的编译时间(以秒为单位)和可执行文件大小(以KB为单位):

 L      Clang(s)    GCC(s)    Clang(KB)    GCC(KB)
 10       1.3       1.7          33         36
 20       2.1       3.0          48         65
 40       3.7       5.8          87        120
 80       7.0      11.2         160        222
160      13.9      23.4         306        430
320      28.1      47.8         578        850
640      56.5     103.0        1126       1753

因此,尽管在L中看起来大致呈线性,但它又长又令人沮丧地很大.

尝试强制内联operator()失败:可能被Clang忽略(可执行更大),而GCC报告recursive inlining.

在可执行文件上运行nm -C对于L = 160,显示511/1253不同版本的operator()(带有Clang/GCC).这些都是针对N < L的,因此似乎终止专业化N >= L确实内联了.

PS .我会添加标签code-bloat,但是系统不允许我这样做.

解决方案

我将此技术称为魔术开关.

我知道最有效的方法是建立自己的跳转表.

// first, index list boilerplate.  Does log-depth creation as well
// needed for >1000 magic switches:
template<unsigned...Is> struct indexes {typedef indexes<Is...> type;};
template<class lhs, class rhs> struct concat_indexes;
template<unsigned...Is, unsigned...Ys> struct concat_indexes<indexes<Is...>, indexes<Ys...>>{
    typedef indexes<Is...,Ys...> type;
};
template<class lhs, class rhs>
using ConcatIndexes = typename concat_indexes<lhs, rhs>::type;

template<unsigned min, unsigned count> struct make_indexes:
    ConcatIndexes<
        typename make_indexes<min, count/2>::type,
        typename make_indexes<min+count/2, (count+1)/2>::type
    >
{};
template<unsigned min> struct make_indexes<min, 0>:
    indexes<>
{};
template<unsigned min> struct make_indexes<min, 1>:
    indexes<min>
{};
template<unsigned max, unsigned min=0>
using MakeIndexes = typename make_indexes<min, max-min>::type;

// This class exists simply because [](blah){code}... `...` expansion
// support is lacking in many compilers:
template< typename L, typename R, unsigned I >
struct helper_helper {
    static R func( L&& l ) { return std::forward<L>(l)(size<I>()); }
};
// the workhorse.  Creates an "manual" jump table, then invokes it:
template<typename L, unsigned... Is>
auto
dispatch_helper(indexes<Is...>, L&& l, unsigned i)
-> decltype( std::declval<L>()(size<0>()) )
{
  // R is return type:
  typedef decltype( std::declval<L>()(size<0>()) ) R;
  // F is the function pointer type for our jump table:
  typedef R(*F)(L&&);
  // the jump table:
  static const F table[] = {
    helper_helper<L, R, Is>::func...
  };
  // invoke at the jump spot:
  return table[i]( std::forward<L>(l) );
};
// create an index pack for each index, and invoke the helper to
// create the jump table:
template<unsigned Max, typename L>
auto
dispatch(L&& l, unsigned i)
-> decltype( std::declval<L>()(size<0>()) )
{
  return dispatch_helper( MakeIndexes<Max>(), std::forward<L>(l), i );
};

这需要一些静态设置,但是运行时非常快.

断言i在范围之内也可能有用.

实时示例

Consider a function object F taking a constexpr size_t argument I

struct F
{
    template <size_t I>
    constexpr size_t operator()(size <I>) const { return I; }
};

wrapped within a type size <I>, where (for brevity)

template <size_t N>
using size = std::integral_constant <size_t, N>;

Of course, we could pass I directly but I want to emphasize that it is constexpr by using it as a template argument. Function F is dummy here but in reality it could do a variety of useful stuff like retrieving information from the Ith element of a tuple. F is assumed to have the same return type regardless of I. I could be of any integral type but is assumed non-negative.

Problem

Given a constexpr size_t value I, we can call F by

F()(size <I>());

Now, what if we want to call F with a non-constepr size_t value i? Consider the following:

constexpr size_t L = 10;
idx <F, L> f;
for (size_t i = 0; i < L; ++i)
    cout << f(i) << " ";

(Why would I need this? To give some context, I am in fact trying to build a composite iterator into a container view that represents a sequence of "joined" (concatenated) heterogeneous containers. This would give the ability to say something like join(a, b) = c; where arrays join(a, b) and c are of equal length. However, i is iterator state so cannot be constexpr, yet sub-iterators are stored in a tuple and need to be accessed by a constexpr index. Individual value_type's are roughly consistent so that the joined view can take on their common_type type, but sub-containers and consequently sub-iterators are of different types.)

Solution

Here, I have come up with struct idx <F, L>, which adapts function F for this purpose, assuming the input argument is less than L. This actually compiles fine giving output

0 1 2 3 4 5 6 7 8 9 

and here is a live example.

idx works by recursively decomposing input i into a binary representation and reconstructing a constexpr counterpart N:

template <typename F, size_t L, size_t N = 0, bool = (N < L)>
struct idx
{
    template <size_t R = 1>
    inline constexpr decltype(F()(size <0>()))
    operator()(size_t I, size <R> = size <R>()) const
    {
        return I%2 ? idx <F, L, N+R>()(--I/2, size <2*R>()) :
               I   ? idx <F, L, N  >()(  I/2, size <2*R>()) :
               F()(size <N>());
    }
};

where R represents a power of 2 at the current iteration. To avoid infinite template instantiation, a specialization is given for N >= L, returning F()(size <0>()) as a dummy value:

template <typename F, size_t L, size_t N>
struct idx <F, L, N, false>
{
    template <size_t R>
    inline constexpr decltype(F()(size <0>()))
    operator()(size_t I, size <R>) const { return F()(size <0>()); }
};

In fact, this method is a generalization of the more common idiom with a Boolean argument:

bool b = true;
b ? f <true>() : f <false>();

where f is a function taking a bool as a template argument. In this case it is evident that all two possible versions of f are instantiated.

Question

Although this works and its run-time complexity is presumably logarithmic in i, I am concerned with the compile-time implications, like:

  • how many combinations of idx and its template operator() are instantiated for this to work at run time for any input i that is not known at compile time? (I understand "all possible" again, but how many?)

  • is it really possible to inline operator()?

  • is there any alternative strategy or variant that is "easier" to compile?

  • should I forget about this idea as an instance of pure code bloat?

Notes

Here are the compile times (in seconds) and executable sizes (in KB) I have measured for different values of L:

 L      Clang(s)    GCC(s)    Clang(KB)    GCC(KB)
 10       1.3       1.7          33         36
 20       2.1       3.0          48         65
 40       3.7       5.8          87        120
 80       7.0      11.2         160        222
160      13.9      23.4         306        430
320      28.1      47.8         578        850
640      56.5     103.0        1126       1753

So, although it appears roughly linear in L, it is quite long and frustratingly large.

Attempting to force operator() inline fails: probably ignored by Clang (executable even larger), while GCC reports recursive inlining.

Running nm -C on the executable e.g. for L = 160, shows 511/1253 different versions of operator() (with Clang/GCC). These are all for N < L so it appears the terminating specialization N >= L does get inlined.

PS. I would add tag code-bloat but the system won't let me.

解决方案

I call this technique the magic switch.

The most efficient way I know of to do this is to build your own jump table.

// first, index list boilerplate.  Does log-depth creation as well
// needed for >1000 magic switches:
template<unsigned...Is> struct indexes {typedef indexes<Is...> type;};
template<class lhs, class rhs> struct concat_indexes;
template<unsigned...Is, unsigned...Ys> struct concat_indexes<indexes<Is...>, indexes<Ys...>>{
    typedef indexes<Is...,Ys...> type;
};
template<class lhs, class rhs>
using ConcatIndexes = typename concat_indexes<lhs, rhs>::type;

template<unsigned min, unsigned count> struct make_indexes:
    ConcatIndexes<
        typename make_indexes<min, count/2>::type,
        typename make_indexes<min+count/2, (count+1)/2>::type
    >
{};
template<unsigned min> struct make_indexes<min, 0>:
    indexes<>
{};
template<unsigned min> struct make_indexes<min, 1>:
    indexes<min>
{};
template<unsigned max, unsigned min=0>
using MakeIndexes = typename make_indexes<min, max-min>::type;

// This class exists simply because [](blah){code}... `...` expansion
// support is lacking in many compilers:
template< typename L, typename R, unsigned I >
struct helper_helper {
    static R func( L&& l ) { return std::forward<L>(l)(size<I>()); }
};
// the workhorse.  Creates an "manual" jump table, then invokes it:
template<typename L, unsigned... Is>
auto
dispatch_helper(indexes<Is...>, L&& l, unsigned i)
-> decltype( std::declval<L>()(size<0>()) )
{
  // R is return type:
  typedef decltype( std::declval<L>()(size<0>()) ) R;
  // F is the function pointer type for our jump table:
  typedef R(*F)(L&&);
  // the jump table:
  static const F table[] = {
    helper_helper<L, R, Is>::func...
  };
  // invoke at the jump spot:
  return table[i]( std::forward<L>(l) );
};
// create an index pack for each index, and invoke the helper to
// create the jump table:
template<unsigned Max, typename L>
auto
dispatch(L&& l, unsigned i)
-> decltype( std::declval<L>()(size<0>()) )
{
  return dispatch_helper( MakeIndexes<Max>(), std::forward<L>(l), i );
};

which requires some static setup, but is pretty fast when run.

An assert that i is in bounds may also be useful.

live example

这篇关于使非constexpr整数值适应非类型模板参数,并使代码膨胀的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆