二进制字符串到十六进制c ++ [英] Binary String to Hex c++

查看:248
本文介绍了二进制字符串到十六进制c ++的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当将二进制字符串更改为十六进制时,我只能根据我找到的答案,将其设置为特定大小。但我想把MASSIVE二进制字符串转换成完整的十六进制字符串,这是一个比这更有效的方式,这是我遇到的唯一的方法是完全:

When changing a Binary string to Hex, I could only do it to a certain size based off of the answers I found. But I want to change MASSIVE Binary strings into their complete Hex counterpart in a more efficient way than this which is the only way I've come across that does it completely:

for(size_t i = 0; i < (binarySubVec.size() - 1); i++){
    string binToHex, tmp = "0000";
    for (size_t j = 0; j < binaryVecStr[i].size(); j += 4){
        tmp = binaryVecStr[i].substr(j, 4);
        if      (!tmp.compare("0000")) binToHex += "0";
        else if (!tmp.compare("0001")) binToHex += "1";
        else if (!tmp.compare("0010")) binToHex += "2";
        else if (!tmp.compare("0011")) binToHex += "3";
        else if (!tmp.compare("0100")) binToHex += "4";
        else if (!tmp.compare("0101")) binToHex += "5";
        else if (!tmp.compare("0110")) binToHex += "6";
        else if (!tmp.compare("0111")) binToHex += "7";
        else if (!tmp.compare("1000")) binToHex += "8";
        else if (!tmp.compare("1001")) binToHex += "9";
        else if (!tmp.compare("1010")) binToHex += "A";
        else if (!tmp.compare("1011")) binToHex += "B";
        else if (!tmp.compare("1100")) binToHex += "C";
        else if (!tmp.compare("1101")) binToHex += "D";
        else if (!tmp.compare("1110")) binToHex += "E";
        else if (!tmp.compare("1111")) binToHex += "F";
        else continue;
    }
    hexOStr << binToHex;
    hexOStr << " ";
}

其彻底和绝对,但速度缓慢。

Its thorough and absolute, but slow.

有更简单的方法吗?

推荐答案

UPDATE 比较和基准测试。

这里是另一个基于完美的哈希。使用 gperf 生成完美的散列(如下所述:)。

Here's another take, based on a perfect hash. The perfect hash was generated using gperf (like described here: Is it possible to map string to int faster than using hashmap?).

我通过移动函数local statics并标记 hexdigit() hash() as constexpr 。这消除了不必要的任何初始化开销,并给予编译器完全优化的空间/

I've further optimized by moving function local statics out of the way and marking hexdigit() and hash() as constexpr. This removes unnecessary any initialization overhead and gives the compiler full room for optimization/

可以尝试阅读例如如果可能,一次只能有1024个半字节,并且使编译器有机会使用AVX / SSE指令集对操作进行向量化。

You could try reading e.g. 1024 nibbles at a time if possible, and give the compiler a chance to vectorize the operations using AVX/SSE instruction sets. (I have not inspected the generated code to see whether this would happen.)

完整的示例代码在流模式下的std :: cin std :: cout 是:

#include <iostream>

int main()
{
    char buffer[4096];
    while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount())
    {
        size_t got = std::cin.gcount();
        char* out = buffer;

        for (auto it = buffer; it < buffer+got; it += 4)
            *out++ = Perfect_Hash::hexchar(it);

        std::cout.write(buffer, got/4);
    }
}



这里是 Perfect_Hash 类,稍微修改并用 hexchar 查找扩展。注意,它使用 assert 验证 DEBUG 生成中的输入:

Here's the Perfect_Hash class, slightly redacted and extended with the hexchar lookup. Note that it does validate input in DEBUG builds using the assert:

Live On Coliru

#include <array>
#include <algorithm>
#include <cassert>

class Perfect_Hash {
    /* C++ code produced by gperf version 3.0.4 */
    /* Command-line: gperf -L C++ -7 -C -E -m 100 table  */
    /* Computed positions: -k'1-4' */

    /* maximum key range = 16, duplicates = 0 */
  private:
      static constexpr unsigned char asso_values[] = {
          27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
          27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15, 7,  3,  1,  0,  27,
          27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
          27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
          27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27};
      template <typename It>
      static constexpr unsigned int hash(It str)
      {
          return 
              asso_values[(unsigned char)str[3] + 2] + asso_values[(unsigned char)str[2] + 1] +
              asso_values[(unsigned char)str[1] + 3] + asso_values[(unsigned char)str[0]];
      }

      static constexpr char hex_lut[] = "???????????fbead9c873625140";
  public:
#ifdef DEBUG
    template <typename It>
    static char hexchar(It binary_nibble)
    {
        assert(Perfect_Hash::validate(binary_nibble)); // for DEBUG only
        return hex_lut[hash(binary_nibble)]; // no validation!
    }
#else
    template <typename It>
    static constexpr char hexchar(It binary_nibble)
    {
        return hex_lut[hash(binary_nibble)]; // no validation!
    }
#endif
    template <typename It>
    static bool validate(It str)
    {
        static constexpr std::array<char, 4> vocab[] = {
            {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
            {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
            {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
            {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
            {{'1', '1', '1', '1'}}, {{'1', '0', '1', '1'}},
            {{'1', '1', '1', '0'}}, {{'1', '0', '1', '0'}},
            {{'1', '1', '0', '1'}}, {{'1', '0', '0', '1'}},
            {{'1', '1', '0', '0'}}, {{'1', '0', '0', '0'}},
            {{'0', '1', '1', '1'}}, {{'0', '0', '1', '1'}},
            {{'0', '1', '1', '0'}}, {{'0', '0', '1', '0'}},
            {{'0', '1', '0', '1'}}, {{'0', '0', '0', '1'}},
            {{'0', '1', '0', '0'}}, {{'0', '0', '0', '0'}},
        }; 
        int key = hash(str);

        if (key <= 26 && key >= 0)
            return std::equal(str, str+4, vocab[key].begin());
        else
            return false;
    }
};

constexpr unsigned char Perfect_Hash::asso_values[];
constexpr char Perfect_Hash::hex_lut[];

#include <iostream>

int main()
{
    char buffer[4096];
    while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount())
    {
        size_t got = std::cin.gcount();
        char* out = buffer;

        for (auto it = buffer; it < buffer+got; it += 4)
            *out++ = Perfect_Hash::hexchar(it);

        std::cout.write(buffer, got/4);
    }
}

od -A none -t o / dev / urandom | tr -cd'01'| dd bs = 1 count = 4096 | ./test


03bef5fb79c7da917e3ebffdd8c41488d2b841dac86572cf7672d22f1f727627a2c4a48b15ef27eb0854dd99756b24c678e3b50022d695cc5f5c8aefaced2a39241bfd5deedcfa0a89060598c6b056d934719eba9ccf29e430d2def5751640ff17860dcb287df8a94089ade0283ee3d76b9fefcce3f3006b8c71399119423e780cef81e9752657e97c7629a9644be1e7c96b5d0324ab16d20902b55bb142c0451e675973489ae4891ec170663823f9c1c9b2a11fcb1c39452aff76120b21421069af337d14e89e48ee802b1cecd8d0886a9a0e90dea5437198d8d0d7ef59c46f9a069a83835286a9a8292d2d7adb4e7fb0ef42ad4734467063d181745aaa6694215af7430f95e854b7cad813efbbae0d2eb099523f215cff6d9c45e3edcaf63f78a485af8f2bfc2e27d46d61561b155d619450623b7aa8ca085c6eedfcc19209066033180d8ce1715e8ec9086a7c28df6e4202ee29705802f0c2872fbf06323366cf64ecfc5ea6f15ba6467730a8856a1c9ebf8cc188e889e783c50b85824803ed7d7505152b891cb2ac2d6f4d1329e100a2e3b2bdd50809b48f0024af1b5092b35779c863cd9c6b0b8e278f5bec966dd0e5c4756064cca010130acf24071d02de39ef8ba8bd1b6e9681066be3804d36ca83e7032274e4c8e8cacf520e8078f8fa80eb8e70af40367f53e53a7d7f7afe8704c46f58339d660b8151c91bddf82b4096



BENCHMARKS



我想出了三种不同的方法:

BENCHMARKS

I came up with three different approaches:


  1. naive.cpp(没有黑客,没有图书馆) ;在 Godbolt 实时解体
  2. =http://coliru.stacked-crooked.com/a/f9e7dbde1fb1b978 =nofollow> spirit.cpp (Trie); 现场反汇编关于pastebin
  3. 此答案: perfect.cpp hash ;在 Godbolt 上实时解除

  1. naive.cpp (no hacks, no libraries); Live disassembly on Godbolt
  2. spirit.cpp (Trie); Live disassembly on pastebin
  3. and this answer: perfect.cpp hash based; Live disassembly on Godbolt

为了做一些比较,我有


  • 使用相同的编译器4.9)和标志( -O3 -march = native -g0 -DNDEBUG

  • 优化的输入/输出,

结果如下:


  • 令人惊讶的是,第一个答案的 naive >
  • Spirit在这里真的很糟糕;它的网络是3.4MB / s,这样整个文件将花费294秒(!!!)。

  • naive.cpp 的平均吞吐量为〜720MB / s,完美的平均吞吐量为〜1.14GB / s。 cpp

  • 这使得完美的散列方法比天真方法快大约50%。

  • Surprisingly, the naive approach from the first answer does rather well
  • Spirit does really badly here; it nets 3.4MB/s so that the whole file would take at 294 seconds (!!!). We've left it off the charts
  • The average throughputs are ~720MB/s for naive.cpp and ~1.14GB/s forperfect.cpp
  • This makes the perfect hash approach roughly 50% faster than the naive approach.

*摘要我想说的天真的方法是非常好的因为我发布了它在10小时前whim。如果你真的想要高吞吐量,完美的哈希是一个很好的开始,但考虑手动滚动基于SIMD的解决方案

*Summary I'd say the naive approach was plenty good as I posted it on whim 10 hours ago. If you really want high throughput, the perfect hash is a nice start, but consider hand-rolling a SIMD based solution

这篇关于二进制字符串到十六进制c ++的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆