Bit twiddling hack: most efficient way to remove one bit every n bits?

Question

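Here is my question: given an N-bit unsigned integer, I need to remove one bit every n bits (the bits at positions 0, n, 2n, ...) and pack the remaining bits toward the least significant end.

I need to do that very efficiently (I will need to do this operation several billion times on supercomputers) in C or C++11. N and n are known at compile time (template parameters). What is the most efficient algorithm to do that?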

Here is an example:

#include <iostream>
#include <climits>
#include <type_traits>
#include <bitset>

template <unsigned int Modulo,
          typename Type,
          unsigned int Size = sizeof(Type)*CHAR_BIT,
          class = typename std::enable_if<std::is_integral<Type>::value
                                       && std::is_unsigned<Type>::value>::type>
inline Type f(Type x)
{
    // The most inefficient algorithm ever
    std::bitset<Size> bx(x);
    std::bitset<Size> by(0);
    unsigned int j = 0;
    for (unsigned int i = 0; i < Size; ++i) {
        if (i%Modulo) {
            by[j++] = bx[i];
        }
    }
    return by.to_ullong();
}

int main()
{
    std::bitset<64> x = 823934823;
    std::cout<<x<<std::endl;
    std::cout<<(std::bitset<64>(f<2>(x.to_ullong())))<<std::endl;
    return 0;
}

Solution

Semantics first...

Semantically (and conceptually, because you can't actually use iterators here), you are doing a std::copy_if where your input and output ranges are a std::bitset<N> and your predicate, which looks at each bit's position rather than its value, is a lambda of the form (using C++14 generic lambda notation)

[](auto i) { return i % n != 0; }

This algorithm is O(N) in both the number of assignments and the number of predicate invocations. Because std::bitset<N> doesn't have iterators, you have to check bit by bit. This means that your loop with a handwritten predicate is doing exactly the same computation as a std::copy_if over a hypothetical iterable std::bitset<N>.

This means that, as far as asymptotic efficiency is concerned, your algorithm should not be considered inefficient.
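To make the equivalence concrete, here is a small sketch of that hypothetical copy_if written as the explicit loop it amounts to. The name copy_bits_if is made up for illustration. The predicate receives the bit position i, not the bit value, and the loop makes exactly N predicate calls and at most N assignments.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper (not a standard algorithm): a "copy_if over a bitset",
// written as a loop over bit positions because std::bitset has no iterators.
// Bits whose position satisfies pred are copied, in order, to the low end of out.
template <std::size_t N, typename Pred>
std::bitset<N> copy_bits_if(const std::bitset<N>& in, Pred pred) {
    std::bitset<N> out;                      // starts as all zeros
    std::size_t j = 0;                       // next output position
    for (std::size_t i = 0; i < N; ++i) {    // exactly N predicate calls
        if (pred(i)) out[j++] = in[i];       // at most N assignments
    }
    return out;
}
```

Calling it with [](std::size_t i) { return i % 2 != 0; } on a std::bitset<64> computes the same value as f<2> from the question: 64 predicate calls and 32 assignments.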

...optimization last

So given the conclusion that your algorithm isn't doing anything as bad as quadratic complexity, can its constant factor be optimized? The main source of efficiency of a std::bitset comes from the fact that your hardware can handle many (8, 16, 32 or 64) bits in parallel. If you had access to the implementation, you could write your own copy_if that takes advantage of that parallelism, e.g. by special hardware instructions, lookup tables, or some bit-twiddling algorithm.
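As one illustration of the bit-twiddling route (not taken from the answer; remove_every_nth_bit is a hypothetical name), a single unsigned word can already be processed a whole group at a time. Since bit 0 of each n-bit group is dropped, the remaining n - 1 bits of group k must move down by exactly k + 1 positions, so ceil(N/n) shift/AND/OR steps on the full word replace N single-bit steps. This sketch assumes 2 <= n <= N and sticks to C++11.

```cpp
#include <climits>
#include <type_traits>

// Hypothetical helper: drops bits 0, n, 2n, ... of x and packs the survivors
// toward bit 0, handling one n-bit group per iteration.
// Bits kn+1 .. kn+n-1 of group k belong at k(n-1) .. k(n-1)+n-2,
// i.e. the group is shifted right by k + 1 and masked.
template <unsigned n, typename T>
inline T remove_every_nth_bit(T x) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    constexpr unsigned N = sizeof(T) * CHAR_BIT;
    static_assert(n >= 2 && n <= N, "n must be in [2, N]");
    constexpr unsigned groups = (N + n - 1) / n;          // ceil(N / n) groups
    const T low = static_cast<T>((T(1) << (n - 1)) - 1);   // n-1 low one bits
    T result = 0;
    for (unsigned k = 0; k < groups; ++k) {
        const T mask = static_cast<T>(low << (k * (n - 1)));  // destination of group k
        result = static_cast<T>(result | ((x >> (k + 1)) & mask));
    }
    return result;
}
```

It is called like f from the question, e.g. remove_every_nth_bit<2>(x.to_ullong()). For n = 2 on a 64-bit word that is 32 word-wide steps; whether the compiler unrolls the constant-trip-count loop depends on the compiler and flags, so measure before relying on it.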

For example, this is how the member function count(), as well as the gcc and SGI extensions _Find_first() and _Find_next(), are implemented. The old SGI implementation uses 256-entry lookup tables to count bits and to quasi-iterate over the bits of each 8-bit char. Recent gcc versions use __builtin_popcountll() and __builtin_ctzll() to do the population count and bit lookup for each 64-bit word.
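For reference, here is roughly what the two strategies look like on a single 64-bit word. This is an illustrative sketch, not the SGI or libstdc++ source: popcount_lut, popcount_builtin and set_bit_positions are hypothetical names, and __builtin_popcountll / __builtin_ctzll are GCC/Clang builtins rather than standard C++.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// 256-entry table: number of set bits in every possible byte value.
inline const std::array<std::uint8_t, 256>& byte_popcount_table() {
    static const std::array<std::uint8_t, 256> table = [] {
        std::array<std::uint8_t, 256> t{};
        for (unsigned v = 1; v < 256; ++v)
            t[v] = static_cast<std::uint8_t>((v & 1u) + t[v >> 1]);  // reuse smaller entry
        return t;
    }();
    return table;
}

// SGI-style: eight table lookups per 64-bit word, one per byte.
inline unsigned popcount_lut(std::uint64_t w) {
    const auto& t = byte_popcount_table();
    unsigned c = 0;
    for (int i = 0; i < 8; ++i, w >>= 8) c += t[w & 0xFF];
    return c;
}

// libstdc++-style: one builtin call per 64-bit word (GCC/Clang only).
inline unsigned popcount_builtin(std::uint64_t w) {
    return static_cast<unsigned>(__builtin_popcountll(w));
}

// Visit set bits lowest first with count-trailing-zeros (GCC/Clang only).
inline std::vector<unsigned> set_bit_positions(std::uint64_t w) {
    std::vector<unsigned> pos;
    while (w != 0) {
        pos.push_back(static_cast<unsigned>(__builtin_ctzll(w)));
        w &= w - 1;  // clear the lowest set bit
    }
    return pos;
}
```

The builtin can compile to a single POPCNT instruction when the target supports it (for example with -mpopcnt on x86); otherwise the compiler falls back to a library call or a short instruction sequence.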

Unfortunately, std::bitset does not expose its underlying array of unsigned integers. So if you want to improve your posted algorithm, you need to write your own BitSet class template (possibly by adapting the source of your own Standard Library) and give it a member function copy_if (or similar) that takes advantage of your hardware. This could give speedups of roughly 8x to 64x over your current algorithm, depending on how many bits the hardware handles at once.
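A minimal sketch of that idea, under several assumptions: the class name BitSet, its public words array and the member drop_every_nth are hypothetical and not part of std::bitset or any library. The "hardware advantage" is taken to be the x86 BMI2 instruction PEXT via the _pext_u64 intrinsic, used only when the compiler defines __BMI2__, with a portable bit loop otherwise. Each 64-bit word gets a mask of the bits to keep; the kept bits are extracted in one step and appended to the output, spilling into the next word when needed.

```cpp
#include <algorithm>
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

// Parallel bit extract: gathers the bits of x selected by mask into the low bits.
inline std::uint64_t pext64(std::uint64_t x, std::uint64_t mask) {
#if defined(__BMI2__)
    return _pext_u64(x, mask);              // single PEXT instruction on x86-64
#else
    std::uint64_t r = 0;                    // portable fallback: one step per mask bit
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        if (x & mask & (~mask + 1)) r |= bit;  // ~mask + 1 isolates the lowest mask bit
        mask &= mask - 1;                      // move on to the next mask bit
    }
    return r;
#endif
}

// Hypothetical BitSet: like std::bitset<N>, but the 64-bit words are public.
template <std::size_t N>
struct BitSet {
    static constexpr std::size_t W = (N + 63) / 64;
    std::array<std::uint64_t, W> words{};

    bool test(std::size_t i) const { return (words[i / 64] >> (i % 64)) & 1u; }
    void set(std::size_t i) { words[i / 64] |= std::uint64_t{1} << (i % 64); }

    // Word-level "copy_if": drops bits 0, n, 2n, ... and packs the rest toward bit 0.
    template <std::size_t n>
    BitSet drop_every_nth() const {
        static_assert(n >= 1, "n must be positive");
        BitSet out;
        std::size_t j = 0;  // number of bits written to out so far
        for (std::size_t w = 0; w < W; ++w) {
            const std::size_t base = 64 * w;
            const std::size_t bits = std::min<std::size_t>(64, N - base);
            std::uint64_t keep = bits == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits) - 1;
            for (std::size_t p = (n - base % n) % n; p < bits; p += n)
                keep &= ~(std::uint64_t{1} << p);  // clear positions where (base + p) % n == 0
            const std::uint64_t packed = pext64(words[w], keep);
            const std::size_t cnt = std::bitset<64>(keep).count();
            const std::size_t off = j % 64;
            out.words[j / 64] |= packed << off;
            if (off != 0 && off + cnt > 64)     // high part spills into the next word
                out.words[j / 64 + 1] |= packed >> (64 - off);
            j += cnt;
        }
        return out;
    }
};
```

Since n is a compile-time constant, a real version would precompute the keep masks instead of rebuilding them per word. PEXT is also known to be slow on some AMD processors before Zen 3, so it is worth benchmarking on the target machine.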
